Mispronunciation
transcribe.mispronunciation
Mispronunciation
A class to represent a mispronunciation. Contains attributes that hold the type and differences.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
job_name | str | Job name/id. | required |
audio_url | str | URL to audio file. | required |
language | str | Language of audio. | required |
type | MispronunciationType | Type of mispronunciation/disfluency present. | required |
lists | Tuple[List[str], List[str]] | Input list of strings taken for comparison. | required |
differences | Tuple[List[str], List[str]] | Differences of list of strings that resulted in the type verdict. | required |
Source code in src/transcribe/mispronunciation.py
__init__(type, lists, differences, opcodes)
Constructor for the Mispronunciation class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
type | MispronunciationType | Type of mispronunciation/disfluency present. | required |
lists | Tuple[List[str], List[str]] | Input list of strings taken for comparison. | required |
differences | Tuple[List[str], List[str]] | Differences of list of strings that resulted in the type verdict. | required |
opcodes | List[Tuple[str, int, int, int, int]] | Opcodes from | required |
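The opcodes type, a list of (tag, i1, i2, j1, j2) tuples, has the same shape as the output of Python's difflib.SequenceMatcher.get_opcodes(). The sketch below assumes that is where such opcodes could come from (this page does not confirm it) and shows what they look like for one ground truth/transcript pair.

```python
from difflib import SequenceMatcher

ground_truth = "skel is a skeleton".split()
transcript = "skel is not a zombie".split()

# Assumption: opcodes are produced by difflib; the page does not name the source.
# autojunk=False keeps difflib's popularity heuristic out of the alignment.
matcher = SequenceMatcher(a=ground_truth, b=transcript, autojunk=False)
opcodes = matcher.get_opcodes()

for tag, i1, i2, j1, j2 in opcodes:
    # Each opcode says how ground_truth[i1:i2] maps onto transcript[j1:j2].
    print(tag, ground_truth[i1:i2], transcript[j1:j2])
```

Each tuple reads as "turn ground_truth[i1:i2] into transcript[j1:j2]", with tag being one of equal, insert, delete or replace.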
Source code in src/transcribe/mispronunciation.py
detect_mispronunciation(ground_truth, transcript, homophones=None)
Detects whether a ground truth/transcript pair counts as a mispronunciation.
We define a mispronunciation to be an addition (A), a substitution (S), or both (A & S).
Deletions (D), 100% matches (M) and single-word ground truths (X) are ignored, and None is returned for them.
Also handles homophones given a pre-defined list.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
ground_truth | List[str] | List of ground truth words. | required |
transcript | List[str] | List of transcript words. | required |
homophones | List[Set[str]] | List of homophone families. Defaults to None. | None |
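One plausible way to make use of a List[Set[str]] of homophone families is to map every word to a single representative of its family before comparing the two word lists. The helper below is a hypothetical sketch (the name canonicalize and the choice of representative are assumptions, not the library's implementation).

```python
from typing import Dict, List, Optional, Set


def canonicalize(words: List[str], homophones: Optional[List[Set[str]]] = None) -> List[str]:
    """Replace each word with one fixed representative of its homophone family.

    Hypothetical helper: not part of the documented API.
    """
    if not homophones:
        return list(words)
    representative: Dict[str, str] = {}
    for family in homophones:
        head = min(family)  # deterministic pick; any fixed member would do
        for member in family:
            representative[member] = head
    # Words outside every family are left untouched.
    return [representative.get(word, word) for word in words]


families = [{"vain", "vein"}]
gt = canonicalize("vain is a skeleton".split(), families)
tr = canonicalize("vein is a skeleton".split(), families)
print(gt == tr)
```

After this step the two lists are identical, so a word-level comparison would treat the pair as a 100% match.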
Returns:
Name | Type | Description |
---|---|---|
Mispronunciation | Mispronunciation | Object of mispronunciation present. Otherwise, |
Examples
# | Ground Truth | Transcript | Verdict |
---|---|---|---|
1 | skel is a skeleton | skel is a skeleton | M |
2 | skel is a skeleton | skel is not a skeleton | A |
3 | skel is a skeleton | skel is a zombie | S |
4 | skel is a skeleton | skel is not a zombie | A & S |
5 | skel is a skeleton | skel is skeleton | D |
6 | skel is a skeleton | skel is zombie | D |
7 | vain is a skeleton | vein is a skeleton | M |
8 | skel | skel is a skeleton | X |
Algorithm
BASE CASES if:
- single-word ground truth
- empty transcript
- zero alignment
MATCH if:
- both residues are empty (100% match)
DELETION if:
- zero transcript residue, at least 1 ground truth residue
  - every transcribed word is correct, but some ground truth words are missing
- more residue in ground truth than in transcript
  - less strict condition than above
  - may contain substitutions, but these are likely minimal
ADDITION if:
- zero ground truth residue, at least 1 transcript residue
  - all words in ground truth are perfectly spoken, but additional words are present
SUBSTITUTION if:
- same amount of residue on both sides, at exactly the same positions
  - strict form of substitution, only 1-to-1 changes per position
ADDITION & SUBSTITUTION if:
- more residue in transcript than in ground truth
  - with at least 1 match
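The rules above can be sketched as a small classifier. This is an illustration, not the library's code, and it rests on several assumptions: the alignment comes from difflib.SequenceMatcher, "residue" means the words left outside any matching block, homophone handling is omitted, and the function name classify and its string verdicts are hypothetical. The page does not say what happens when both sides have equal residue at different positions, so the sketch arbitrarily reports "A & S" there.

```python
from difflib import SequenceMatcher
from typing import List, Optional


def classify(ground_truth: List[str], transcript: List[str]) -> Optional[str]:
    """Return a verdict letter for the pair (hypothetical sketch)."""
    if len(ground_truth) <= 1:
        return "X"  # base case: single-word ground truth
    if not transcript:
        return None  # base case: empty transcript (no verdict letter documented)

    opcodes = SequenceMatcher(a=ground_truth, b=transcript, autojunk=False).get_opcodes()
    matched = sum(i2 - i1 for tag, i1, i2, _, _ in opcodes if tag == "equal")
    if matched == 0:
        return None  # base case: zero alignment (no verdict letter documented)

    gt_residue = len(ground_truth) - matched  # ground truth words left unmatched
    tr_residue = len(transcript) - matched    # transcript words left unmatched

    if gt_residue == 0 and tr_residue == 0:
        return "M"
    if gt_residue > tr_residue:
        return "D"  # also covers zero transcript residue
    if gt_residue == 0:
        return "A"  # ground truth fully spoken, extra words present
    if gt_residue == tr_residue:
        # Strict substitution: every change is an equal-length replace.
        one_to_one = all(
            tag == "equal" or (tag == "replace" and i2 - i1 == j2 - j1)
            for tag, i1, i2, j1, j2 in opcodes
        )
        if one_to_one:
            return "S"
    # More transcript residue than ground truth residue, with at least 1 match.
    # Assumption: the unspecified "equal residue, different positions" case also lands here.
    return "A & S"


gt = "skel is a skeleton".split()
for spoken in ["skel is a skeleton", "skel is not a skeleton", "skel is a zombie",
               "skel is not a zombie", "skel is skeleton", "skel is zombie"]:
    print(spoken, "->", classify(gt, spoken.split()))
```

Under these assumptions the sketch reproduces the verdicts of rows 1 to 6 and row 8 of the Examples table; row 7 additionally needs homophone handling.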
Source code in src/transcribe/mispronunciation.py
remove_fillers(word)
Manually checks if a word is a filler word.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
word |
str
|
Any word (sequence of characters). |
required |
Returns:
Name | Type | Description |
---|---|---|
bool | bool | |
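A boolean filler check like this is typically used as a predicate when cleaning a word list. The sketch below is hypothetical: the filler set and the name is_filler are assumptions, since this page lists neither the words the function recognises nor its implementation.

```python
# Assumption: an illustrative filler list; the real list is not shown on this page.
FILLERS = frozenset({"um", "uh", "er", "hmm"})


def is_filler(word: str) -> bool:
    """Return True if the word is in the (hypothetical) filler set."""
    return word.strip().lower() in FILLERS


words = "um skel is uh a skeleton".split()
# Keep only the words the predicate does not flag.
cleaned = [w for w in words if not is_filler(w)]
print(cleaned)
```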