Homophones

airtable_apply_annotations.homophones

create_convert(*families)

Return a converter function that maps each word in a list to the main word of its homophone family and leaves words outside any family unchanged.

Parameters:

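| Name | Type | Description | Default |
| --- | --- | --- | --- |
| *families | Set[str] | Homophone families, one per positional argument. The first word of each family (in iteration order) is treated as its main word. | () |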

Returns:

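| Type | Description |
| --- | --- |
| Callable[[List[str]], List[str]] | Function that takes a list of words and returns a new list in which every alternative word is replaced by its family's main word. |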

Source code in src/airtable_apply_annotations/homophones.py
def create_convert(*families: Set[str]) -> Callable[[List[str]], List[str]]:
    """Return a converter function that maps each word in a list to the main word
    of its homophone family.

    Args:
        *families (Set[str]): Homophone families, one per positional argument.

    Returns:
        Callable[[List[str]], List[str]]: Function that replaces every alternative
            word with its family's main word and leaves other words unchanged.
    """
    d = {w: main for main, *alternatives in map(list, families) for w in alternatives}
    return lambda L: [d.get(w, w) for w in L]
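
As an illustration of the mapping built above, here is a minimal sketch that repeats the same two lines of logic and applies the converter to a short word list. The families and words are hypothetical. They are passed as ordered lists rather than sets, on the assumption that the caller wants a predictable main word: a `set` has no guaranteed iteration order, so which member ends up first is not defined.

```python
from typing import Callable, Iterable, List


def create_convert(*families: Iterable[str]) -> Callable[[List[str]], List[str]]:
    # Same logic as the source above: the first member of each family is the
    # main word, and every other member is mapped to it.
    d = {w: main for main, *alternatives in map(list, families) for w in alternatives}
    return lambda L: [d.get(w, w) for w in L]


# Hypothetical families; lists keep the first word as the main word.
convert = create_convert(["their", "there", "they're"], ["to", "too", "two"])

converted = convert(["they're", "going", "too", "far"])
print(converted)  # ['their', 'going', 'to', 'far']
```

Words that belong to no family ("going", "far") pass through unchanged because the lookup falls back to the word itself.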

match_sequence(list1, list2, homophones)

Finds the indices of matching words between two lists, treating words from the same homophone family as equal.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| list1 | List[str] | List of words in a sequence. | required |
| list2 | List[str] | List of words in another sequence for matching/comparison. | required |
| homophones | List[Set[str]] | List of homophone families. | required |

Returns:

| Type | Description |
| --- | --- |
| Tuple[List[int], List[int], List[Tuple[str, int, int, int, int]]] | Three items: the matched indices in list1, the corresponding matched indices in list2, and the SequenceMatcher opcodes as (tag, i1, i2, j1, j2) tuples. |

Source code in src/airtable_apply_annotations/homophones.py
def match_sequence(
    list1: List[str], list2: List[str], homophones: List[Set[str]]
) -> Tuple[List[int], List[int], List[Tuple[str, int, int, int, int]]]:
    """Finds the indices of matching words between two lists, treating words from
    the same homophone family as equal.

    Args:
        list1 (List[str]): List of words in a sequence.
        list2 (List[str]): List of words in another sequence for matching/comparison.
        homophones (List[Set[str]]): List of homophone families.

    Returns:
        Tuple[List[int], List[int], List[Tuple[str, int, int, int, int]]]:
            Matched indices in `list1`, the corresponding matched indices in
            `list2`, and the `SequenceMatcher` opcodes.
    """
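    # Normalize both sequences so that homophones compare as equal.
    convert = create_convert(*homophones)
    output1, output2 = [], []
    s = SequenceMatcher(None, convert(list1), convert(list2))
    opcodes = s.get_opcodes()
    # Expand each matching block into per-word index pairs; the final
    # zero-size sentinel block contributes nothing.
    for block in s.get_matching_blocks():
        for i in range(block.size):
            output1.append(block.a + i)
            output2.append(block.b + i)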

    assert len(output1) == len(output2)

    return output1, output2, opcodes
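
To make the three return values concrete, the following sketch reproduces both functions in a self-contained form and runs them on two short, hypothetical word lists. It assumes `SequenceMatcher` is `difflib.SequenceMatcher` from the standard library, which the excerpt above does not show being imported, and it passes the homophone family as an ordered list so that the main word is deterministic.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Sequence, Tuple

Opcode = Tuple[str, int, int, int, int]


def create_convert(*families: Iterable[str]) -> Callable[[List[str]], List[str]]:
    # First member of each family is the main word (same logic as the source).
    d = {w: main for main, *alternatives in map(list, families) for w in alternatives}
    return lambda L: [d.get(w, w) for w in L]


def match_sequence(
    list1: List[str], list2: List[str], homophones: Sequence[Iterable[str]]
) -> Tuple[List[int], List[int], List[Opcode]]:
    convert = create_convert(*homophones)
    output1, output2 = [], []
    s = SequenceMatcher(None, convert(list1), convert(list2))
    opcodes = s.get_opcodes()
    for block in s.get_matching_blocks():
        for i in range(block.size):
            output1.append(block.a + i)
            output2.append(block.b + i)
    return output1, output2, opcodes


# Hypothetical inputs: "two" in list1 is a homophone of "to" in list2,
# and "the" has no counterpart in list2.
list1 = ["i", "went", "two", "the", "store"]
list2 = ["i", "went", "to", "store"]
homophones = [["to", "too", "two"]]

idx1, idx2, opcodes = match_sequence(list1, list2, homophones)
print(idx1)  # [0, 1, 2, 4]
print(idx2)  # [0, 1, 2, 3]
for tag, i1, i2, j1, j2 in opcodes:
    print(tag, list1[i1:i2], list2[j1:j2])
```

Position 2 is reported as a match even though the spellings differ, because both words are normalized to "to" before comparison. The unmatched "the" shows up only in the opcodes, as a `delete` spanning `list1[3:4]`.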