
Comparison Module

The modules.comparison module provides utilities for comparing two text files character by character. It is used in the test suite to verify that processed payload output matches expected reference files exactly.

Overview

The comparison pipeline works in three stages:

  1. Read – both files are loaded as UTF-8 strings via read_text.
  2. Locate – find_first_mismatch scans character by character for the first divergence.
  3. Report – compare_text_content returns the mismatch index and both lengths; compare_files additionally returns the differing characters for easy diagnosis.

API Reference

modules.comparison

Utilities for comparing text file contents character by character.

Used primarily in tests to verify that processed payload output matches expected reference files.

compare_files(path1, path2)

Compare two text files and report the first point of divergence.

Both paths are resolved to absolute paths before reading so that relative paths are handled correctly regardless of the working directory.

Parameters:

Name Type Description Default
path1 str | Path

Path to the first file (actual output).

required
path2 str | Path

Path to the second file (expected reference).

required

Returns:

Type Description
tuple[int | None, int, int, str | None, str | None]

A five-tuple of (mismatch_index, len1, len2, char_from_file1, char_from_file2). mismatch_index is the zero-based character index of the first difference, or None if the files are identical. len1 and len2 are the total character counts of each file. char_from_file1 and char_from_file2 are the differing characters at mismatch_index, or None when the mismatch is a length difference.

Source code in modules/comparison.py
def compare_files(path1: str | Path, path2: str | Path) -> tuple[int | None, int, int, str | None, str | None]:
    """Compare two text files and report the first point of divergence.

    Both paths are resolved to absolute paths before reading so that relative
    paths are handled correctly regardless of the working directory.

    Args:
        path1: Path to the first file (actual output).
        path2: Path to the second file (expected reference).

    Returns:
        A five-tuple of
        ``(mismatch_index, len1, len2, char_from_file1, char_from_file2)``.

        ``mismatch_index`` is the zero-based character index of the first
        difference, or ``None`` if the files are identical.
        ``len1`` and ``len2`` are the total character counts of each file.
        ``char_from_file1`` and ``char_from_file2`` are the differing
        characters at ``mismatch_index``, or ``None`` when the mismatch is
        a length difference.
    """
    file1 = Path(path1).resolve()
    file2 = Path(path2).resolve()

    data1 = read_text(file1)
    data2 = read_text(file2)

    mismatch_index, len1, len2 = compare_text_content(data1, data2)

    if mismatch_index is None:
        return None, len1, len2, None, None

    min_len = min(len1, len2)

    if mismatch_index < min_len:
        return (
            mismatch_index,
            len1,
            len2,
            data1[mismatch_index],
            data2[mismatch_index],
        )

    return mismatch_index, len1, len2, None, None

compare_text_content(data1, data2)

Compare two strings and return mismatch details.

Parameters:

Name Type Description Default
data1 str

The first string to compare.

required
data2 str

The second string to compare.

required

Returns:

Type Description
tuple[int | None, int, int]

A three-tuple of (mismatch_index, len(data1), len(data2)). mismatch_index is None when the strings are identical.

Source code in modules/comparison.py
def compare_text_content(data1: str, data2: str) -> tuple[int | None, int, int]:
    """Compare two strings and return mismatch details.

    Args:
        data1: The first string to compare.
        data2: The second string to compare.

    Returns:
        A three-tuple of ``(mismatch_index, len(data1), len(data2))``.
        ``mismatch_index`` is ``None`` when the strings are identical.
    """
    mismatch_index = find_first_mismatch(data1, data2)
    return mismatch_index, len(data1), len(data2)

find_first_mismatch(data1, data2)

Find the index of the first differing character between two strings.

Parameters:

Name Type Description Default
data1 str

The first string to compare.

required
data2 str

The second string to compare.

required

Returns:

Type Description
int | None

The zero-based index of the first mismatch, or None if the strings are identical. If one string is a prefix of the other, the index is the length of the shorter string.

Source code in modules/comparison.py
def find_first_mismatch(data1: str, data2: str) -> int | None:
    """Find the index of the first differing character between two strings.

    Args:
        data1: The first string to compare.
        data2: The second string to compare.

    Returns:
        The zero-based index of the first mismatch, or ``None`` if the strings
        are identical.
    """
    min_len = min(len(data1), len(data2))
    for index in range(min_len):
        if data1[index] != data2[index]:
            return index
    if len(data1) != len(data2):
        return min_len
    return None

read_text(path)

Read a file's contents as a UTF-8 string, replacing undecodable bytes.

Parameters:

Name Type Description Default
path Path

Absolute or relative path to the file.

required

Returns:

Type Description
str

The file contents as a string.

Source code in modules/comparison.py
def read_text(path: Path) -> str:
    """Read a file's contents as a UTF-8 string, replacing undecodable bytes.

    Args:
        path: Absolute or relative path to the file.

    Returns:
        The file contents as a string.
    """
    return path.read_text(encoding="utf-8", errors="replace")
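
The effect of errors="replace" can be seen by reading files that contain bytes which are invalid in UTF-8. The file names and byte values below are made up for illustration; the read_text call uses the same arguments as the function above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "first.bin"
    second = Path(tmp) / "second.bin"
    # 0xFF and 0xFE never occur in valid UTF-8.
    first.write_bytes(b"ab\xffcd")
    second.write_bytes(b"ab\xfecd")

    # Each undecodable byte becomes U+FFFD (the replacement character)
    # instead of raising UnicodeDecodeError.
    text1 = first.read_text(encoding="utf-8", errors="replace")
    text2 = second.read_text(encoding="utf-8", errors="replace")

same_after_decoding = text1 == text2
```

Both files decode to "ab\ufffdcd", so same_after_decoding is True even though the bytes on disk differ. A comparison based on read_text therefore cannot see differences that are confined to undecodable bytes; if that distinction matters, comparing the raw bytes is one option.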