SRT2TXT

transcribe.srt2txt

srt2txt(srt_string)

Converts a string of SRT subtitles to plain text.

Parameters:

Name        Type  Description                              Default
srt_string  str   String representation of SRT subtitles.  required

Returns:

Name  Type  Description
str   str   Cleaned subtitle text, with cues joined by single spaces.

Source code in src/transcribe/srt2txt.py
import pysrt


def srt2txt(srt_string: str) -> str:
    """Converts stream of srt subtitles to text format.

    Args:
        srt_string (str): String-representation of srt subtitles.

    Returns:
        str: Cleaned text format of subtitles concatenated with space.
    """
    subs = pysrt.from_string(srt_string)
    texts = [sub.text for sub in subs]
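    # drop empty or whitespace-only subtitles
    texts = [text.strip() for text in texts if text.strip()]
    # drop special tokens like [Music] and [Applause]: a cue is removed only
    # when it both starts with "[" and ends with "]", so ordinary text that
    # merely begins or ends with a bracket is kept
    texts = [t for t in texts if not (t.startswith("[") and t.endswith("]"))]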
    texts = " ".join(texts)
    texts = texts.replace("\n", " ")
    return texts
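
The behaviour described above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: `srt2txt_sketch`, `SAMPLE_SRT`, and `TIMING` are hypothetical names introduced here, and a simple regular expression stands in for pysrt's parser, assuming well-formed cues separated by blank lines. It treats a cue as a special token only when the whole cue is bracketed.

```python
import re

# Hypothetical sample input: three cues, one of them a bracketed sound tag.
SAMPLE_SRT = """\
1
00:00:00,000 --> 00:00:02,000
Hello and welcome

2
00:00:02,000 --> 00:00:04,000
[Music]

3
00:00:04,000 --> 00:00:06,500
to the first
lecture.
"""

# Matches a timing line such as "00:00:02,000 --> 00:00:04,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def srt2txt_sketch(srt_string: str) -> str:
    """Stdlib-only approximation of srt2txt (hypothetical helper, not pysrt)."""
    texts = []
    # Cues are assumed to be separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_string.strip()):
        lines = block.splitlines()
        # The cue text is everything after the timing line.
        for i, line in enumerate(lines):
            if TIMING.match(line.strip()):
                text = " ".join(part.strip() for part in lines[i + 1:]).strip()
                break
        else:
            continue  # no timing line found: not a cue
        # Drop empty cues and fully bracketed tags such as [Music].
        if text and not (text.startswith("[") and text.endswith("]")):
            texts.append(text)
    return " ".join(texts)


print(srt2txt_sketch(SAMPLE_SRT))
# prints: Hello and welcome to the first lecture.
```

The multi-line third cue is flattened into one line and the `[Music]` cue is dropped, which mirrors the newline replacement and special-token filter in the function above.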