Aligner
transcribe.aligner
init_label_studio_annotation()
Initializes a pair of dictionaries in Label Studio annotation format.
Returns:
Type | Description |
---|---|
List[Dict[str, Any]] | List containing a pair of dictionaries in Label Studio JSON annotation format. |
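The sketch below shows what such a pair might look like. It assumes the pair is one `labels` region plus one per-region `textarea` result for an audio transcription project, and that the two share an `id` so Label Studio links the text to its region (an assumption about the intended layout). The names `labels`, `transcription`, and `audio` are placeholders that must match the `name` attributes in your own labeling config. `init_annotation_pair` is a hypothetical stand-in, not the module's implementation.

```python
import uuid
from typing import Any, Dict, List


def init_annotation_pair(
    from_label: str = "labels",
    from_text: str = "transcription",
    to_name: str = "audio",
) -> List[Dict[str, Any]]:
    """Return an empty (labels, textarea) pair of Label Studio result dicts.

    Hypothetical sketch: the control/object names are placeholder defaults.
    """
    region_id = uuid.uuid4().hex[:10]  # both results share one region id (assumption)
    labels = {
        "id": region_id,
        "from_name": from_label,
        "to_name": to_name,
        "type": "labels",
        "value": {"start": 0.0, "end": 0.0, "labels": []},
    }
    text = {
        "id": region_id,
        "from_name": from_text,
        "to_name": to_name,
        "type": "textarea",
        "value": {"start": 0.0, "end": 0.0, "text": []},
    }
    return [labels, text]
```

A caller would fill in `start`, `end`, and `text` for each segment before adding the pair to a task's `result` list.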
overlapping_segments(results, ground_truth, language, max_repeats=None)
Segments raw Amazon Transcribe output into individual sentences based on overlapping regions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
results | Dict[str, List] | Output received from Amazon Transcribe. | required |
ground_truth | str | Ground-truth text for the corresponding annotation. | required |
language | str | Language of the transcript/ground-truth pair. | required |
max_repeats | int | Maximum number of repeats when detecting overlaps. | None |
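Judging by its `Dict[str, List]` type, `results` is presumably the `results` object of an Amazon Transcribe job's JSON output (an assumption), whose `items` list carries per-word timestamps. The abbreviated example below illustrates that shape, and the hypothetical helper `timed_words` (not part of `transcribe.aligner`) extracts `(word, start, end)` triples from it, skipping punctuation items, which have no timestamps.

```python
from typing import Any, Dict, List, Tuple


def timed_words(results: Dict[str, List[Any]]) -> List[Tuple[str, float, float]]:
    """Extract (word, start_seconds, end_seconds) from a Transcribe `results` dict."""
    words = []
    for item in results["items"]:
        # Punctuation items carry no start_time/end_time, so skip them.
        if item["type"] != "pronunciation":
            continue
        best = item["alternatives"][0]  # top-ranked hypothesis
        # Transcribe encodes times as strings, e.g. "0.04".
        words.append((best["content"], float(item["start_time"]), float(item["end_time"])))
    return words


# Abbreviated, illustrative "results" object from a Transcribe job output.
results = {
    "transcripts": [{"transcript": "Hello world."}],
    "items": [
        {"type": "pronunciation", "start_time": "0.04", "end_time": "0.51",
         "alternatives": [{"confidence": "0.99", "content": "Hello"}]},
        {"type": "pronunciation", "start_time": "0.51", "end_time": "1.02",
         "alternatives": [{"confidence": "0.98", "content": "world"}]},
        {"type": "punctuation",
         "alternatives": [{"confidence": "0.0", "content": "."}]},
    ],
}
print(timed_words(results))  # [('Hello', 0.04, 0.51), ('world', 0.51, 1.02)]
```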
Returns:
Type | Description |
---|---|
List[Dict[str, Any]] | List of dictionaries with segment-wise annotations for Label Studio. |
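The internals of `overlapping_segments` are not documented here, so the following is only a rough sketch of the general idea of timing ground-truth sentences by their overlap with ASR words. It uses `difflib.SequenceMatcher` as a stand-in for whatever overlap detection the module performs, takes pre-extracted `(word, start_seconds, end_seconds)` triples rather than the raw `results` dict, uses a naive punctuation-based sentence split, and ignores `language` and `max_repeats`. The function name `segment_by_overlap` and the `transcription`/`audio` names are hypothetical.

```python
import difflib
import re
from typing import Any, Dict, List, Tuple

Word = Tuple[str, float, float]  # (token, start_seconds, end_seconds)


def _norm(token: str) -> str:
    # Lowercase and drop non-word characters so "World," matches "world".
    return re.sub(r"\W+", "", token.lower())


def segment_by_overlap(words: List[Word], ground_truth: str) -> List[Dict[str, Any]]:
    """Hypothetical sketch: time-stamp each ground-truth sentence via token overlap."""
    # Naive sentence split on terminal punctuation (not language-aware).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", ground_truth.strip()) if s]
    gt_tokens: List[str] = []
    sentence_of: List[int] = []  # sentence index of each ground-truth token
    for i, sentence in enumerate(sentences):
        for tok in sentence.split():
            gt_tokens.append(_norm(tok))
            sentence_of.append(i)
    asr_tokens = [_norm(w[0]) for w in words]

    # Matching blocks pair runs of ground-truth tokens with identical ASR tokens.
    matcher = difflib.SequenceMatcher(None, gt_tokens, asr_tokens, autojunk=False)
    spans: Dict[int, List[float]] = {}  # sentence index -> [start, end]
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start, end = words[block.b + k]
            span = spans.setdefault(sentence_of[block.a + k], [start, end])
            span[0], span[1] = min(span[0], start), max(span[1], end)

    annotations = []
    for i, sentence in enumerate(sentences):
        if i not in spans:
            continue  # no overlapping words found; leave the sentence unaligned
        start, end = spans[i]
        annotations.append({
            "id": f"seg{i}",
            "from_name": "transcription",
            "to_name": "audio",
            "type": "textarea",
            "value": {"start": start, "end": end, "text": [sentence]},
        })
    return annotations
```

For example, with words `Hello`, `world`, `good`, `morning` and the ground truth `"Hello world. Good morning!"`, each sentence receives the earliest start and latest end of the ASR words it matched. Sentences with no matching words are dropped rather than guessed.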