
Config

Example Config File

example_config.json
{
    "do_classify": true,
    "filter_empty_transcript": true,
    "classifier": {
        "model": "bookbot/distil-wav2vec2-adult-child-cls-52m",
        "max_duration_s": 3.0
    },
    "transcriber": {
        "type": "wav2vec2",
        "model": "bookbot/wav2vec2-bookbot-en-lm",
        "return_timestamps": "word",
        "chunk_length_s": 30
    },
    "do_noise_classify": true,
    "noise_classifier": {
        "model": "bookbot/distil-ast-audioset",
        "minimum_empty_duration": 0.3,
        "threshold": 0.2
    },
    "segmenter": {
        "type": "word_overlap",
        "minimum_chunk_duration": 1.0
    }
}
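
A config file like this is consumed by speechline.config.Config (documented below), which parses each block into the corresponding subconfig dataclass. A minimal sketch, assuming the file above is saved as example_config.json:

from speechline.config import Config

config = Config("example_config.json")
print(config.transcriber.model)  # bookbot/wav2vec2-bookbot-en-lm
print(config.segmenter.type)     # word_overlap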

speechline.config.NoiseClassifierConfig dataclass

Noise classifier config.

Parameters:

Name | Type | Description | Default
model | str | HuggingFace Hub model checkpoint. | required
minimum_empty_duration | float | Minimum non-transcribed segment duration (in seconds) to be segmented and passed to the noise classifier. Defaults to 1.0 seconds. | 1.0
threshold | float | Probability threshold for the multi-label classification. Defaults to 0.3. | 0.3
batch_size | int | Batch size during inference. Defaults to 1. | 1
Source code in speechline/config.py
@dataclass
class NoiseClassifierConfig:
    """
    Noise classifier config.

    Args:
        model (str):
            HuggingFace Hub model checkpoint.
        minimum_empty_duration (float, optional):
            Minimum non-transcribed segment duration (in seconds)
            to be segmented and passed to the noise classifier.
            Defaults to `1.0` seconds.
        threshold (float, optional):
            The probability threshold for the multi label classification.
            Defaults to `0.3`.
        batch_size (int, optional):
            Batch size during inference. Defaults to `1`.

    """

    model: str
    minimum_empty_duration: float = 1.0
    threshold: float = 0.3
    batch_size: int = 1
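
A short sketch of constructing the dataclass directly. Note the field is named minimum_empty_duration, matching the example config above; omitted fields fall back to their defaults:

from speechline.config import NoiseClassifierConfig

# Only `model` is required; the rest take their defaults.
noise_cfg = NoiseClassifierConfig(model="bookbot/distil-ast-audioset")
print(noise_cfg.minimum_empty_duration, noise_cfg.threshold, noise_cfg.batch_size)
# 1.0 0.3 1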

speechline.config.ClassifierConfig dataclass

Audio classifier config.

Parameters:

Name | Type | Description | Default
model | str | HuggingFace Hub model checkpoint. | required
max_duration_s | float | Maximum audio duration for padding. Defaults to 3.0 seconds. | 3.0
batch_size | int | Batch size during inference. Defaults to 1. | 1
Source code in speechline/config.py
@dataclass
class ClassifierConfig:
    """
    Audio classifier config.

    Args:
        model (str):
            HuggingFace Hub model checkpoint.
        max_duration_s (float, optional):
            Maximum audio duration for padding. Defaults to `3.0` seconds.
        batch_size (int, optional):
            Batch size during inference. Defaults to `1`.
    """

    model: str
    max_duration_s: float = 3.0
    batch_size: int = 1
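
Config (below) instantiates this class by unpacking the "classifier" block of the JSON file; the equivalent direct call, using values from the example config above:

from speechline.config import ClassifierConfig

clf_cfg = ClassifierConfig(**{
    "model": "bookbot/distil-wav2vec2-adult-child-cls-52m",
    "max_duration_s": 3.0,
})
print(clf_cfg.batch_size)  # 1, the default, since it was absent from the JSON block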

speechline.config.TranscriberConfig dataclass

Audio transcriber config.

Parameters:

Name | Type | Description | Default
type | str | Transcriber model architecture type. One of "wav2vec2" or "whisper". | required
model | str | HuggingFace Hub model checkpoint. | required
return_timestamps | Union[str, bool] | return_timestamps argument in AutomaticSpeechRecognitionPipeline's __call__ method. Use "word" or "char" for CTC-based (wav2vec2) models and True for Whisper-based models. | required
chunk_length_s | int | Audio chunk length in seconds. | required
Source code in speechline/config.py
@dataclass
class TranscriberConfig:
    """
    Audio transcriber config.

    Args:
        type (str):
            Transcriber model architecture type.
            One of `"wav2vec2"` or `"whisper"`.
        model (str):
            HuggingFace Hub model checkpoint.
        return_timestamps (Union[str, bool]):
            `return_timestamps` argument in `AutomaticSpeechRecognitionPipeline`'s
            `__call__` method. Use `"word"` or `"char"` for CTC-based models and
            `True` for Whisper-based models.
        chunk_length_s (int):
            Audio chunk length in seconds.
    """

    type: str
    model: str
    return_timestamps: Union[str, bool]
    chunk_length_s: int

    def __post_init__(self):
        SUPPORTED_MODELS = {"wav2vec2", "whisper"}
        WAV2VEC_TIMESTAMPS = {"word", "char"}

        if self.type not in SUPPORTED_MODELS:
            raise ValueError(f"Transcriber of type {self.type} is not yet supported!")

        if self.type == "wav2vec2" and self.return_timestamps not in WAV2VEC_TIMESTAMPS:
            raise ValueError("wav2vec2 only supports `'word'` or `'char'` timestamps!")
        elif self.type == "whisper" and self.return_timestamps is not True:
            raise ValueError("Whisper only supports `True` timestamps!")

speechline.config.SegmenterConfig dataclass

Audio segmenter config.

Parameters:

Name | Type | Description | Default
type | str | Segmenter type. One of "silence", "word_overlap", or "phoneme_overlap". | required
silence_duration | float | Minimum in-between silence duration (in seconds) to consider as gaps. Defaults to 0.0 seconds. | 0.0
minimum_chunk_duration | float | Minimum chunk duration (in seconds) to be exported. Defaults to 0.2 seconds. | 0.2
lexicon_path | str | Path to lexicon file. Defaults to None. | None
keep_whitespace | bool | Whether to keep whitespace in transcript. Defaults to False. | False
Source code in speechline/config.py
@dataclass
class SegmenterConfig:
    """
    Audio segmenter config.

    Args:
        type (str):
            Segmenter type. One of `"silence"`, `"word_overlap"`,
            or `"phoneme_overlap"`.
        silence_duration (float, optional):
            Minimum in-between silence duration (in seconds) to consider as gaps.
            Defaults to `0.0` seconds.
        minimum_chunk_duration (float, optional):
            Minimum chunk duration (in seconds) to be exported.
            Defaults to `0.2` seconds.
        lexicon_path (str, optional):
            Path to lexicon file. Defaults to `None`.
        keep_whitespace (bool, optional):
            Whether to keep whitespace in transcript. Defaults to `False`.
    """

    type: str
    silence_duration: float = 0.0
    minimum_chunk_duration: float = 0.2
    lexicon_path: Optional[str] = None
    keep_whitespace: bool = False

    def __post_init__(self):
        SUPPORTED_TYPES = {"silence", "word_overlap", "phoneme_overlap"}

        if self.type not in SUPPORTED_TYPES:
            raise ValueError(f"Segmenter of type {self.type} is not yet supported!")

speechline.config.Config dataclass

Main SpeechLine config, contains all other subconfigs.

Parameters:

Name | Type | Description | Default
path | str | Path to JSON config file. | required
Source code in speechline/config.py
@dataclass
class Config:
    """
    Main SpeechLine config, contains all other subconfigs.

    Args:
        path (str):
            Path to JSON config file.
    """

    path: str

    def __post_init__(self):
        with open(self.path) as f:
            config = json.load(f)
        self.do_classify = config.get("do_classify", False)
        self.do_noise_classify = config.get("do_noise_classify", False)
        self.filter_empty_transcript = config.get("filter_empty_transcript", False)

        if self.do_classify:
            self.classifier = ClassifierConfig(**config["classifier"])

        if self.do_noise_classify:
            self.noise_classifier = NoiseClassifierConfig(**config["noise_classifier"])

        self.transcriber = TranscriberConfig(**config["transcriber"])
        self.segmenter = SegmenterConfig(**config["segmenter"])
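
Note that the classifier and noise_classifier attributes are only set when their corresponding flags are enabled, so downstream code should guard on the flags (or hasattr) before touching them. A minimal sketch, assuming the example config above:

from speechline.config import Config

config = Config("example_config.json")
if config.do_classify:
    print(config.classifier.model)
if config.do_noise_classify:
    print(config.noise_classifier.threshold)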