Label Pipeline
This repository hosts the AWS Lambda scripts for an automated audio labeling pipeline. The main components of the pipeline are:
Component | Description |
---|---|
Audio Transcription using AWS Transcribe | Transcribe incoming audio files stored in S3 using AWS Transcribe. After transcription, align each audio file against its ground truth text and save the resulting annotations. |
Audio Splitting | Using the aligned transcriptions, segment each audio file and split it into separate files before saving them back to S3. |
Audio Adult/Child Classifier | Classify incoming audio files stored in S3 as either adult or child speech. |
Integration with AirTable Dashboards | Export AirTable audio annotations (transcripts and labels) to S3 by moving files according to their labels. |
Audio Recording Logger | Log daily audio recording data from S3 Inventory to AirTable. |
For more details on each component, see the README file in its subdirectory.
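Several of the components above act on audio files arriving in S3. As a rough illustration only, the sketch below shows how a Lambda function might read the bucket and object key from a standard S3 event notification. The names `extract_s3_objects` and `lambda_handler`, the returned dictionary, and the list of audio extensions are assumptions made for this example and are not taken from this repository's scripts.

```python
from urllib.parse import unquote_plus

# Assumed set of audio extensions; the real pipeline may accept others.
AUDIO_EXTENSIONS = (".wav", ".aac", ".m4a")


def extract_s3_objects(event):
    """Return (bucket, key) pairs for every S3 record in a Lambda event."""
    objects = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        bucket = s3["bucket"]["name"]
        # Object keys are URL-encoded in S3 event notifications
        # (for example, spaces arrive as "+"), so decode them first.
        key = unquote_plus(s3["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    # Hypothetical handler: it only lists the audio files that triggered it.
    # A real component would go on to transcribe, split, or classify them.
    audio_files = [
        {"bucket": bucket, "key": key}
        for bucket, key in extract_s3_objects(event)
        if key.lower().endswith(AUDIO_EXTENSIONS)
    ]
    return {"statusCode": 200, "audio_files": audio_files}
```

Keeping the event parsing in its own function makes it possible to test the handler locally with a hand-written event dictionary, without any AWS credentials.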
Pipeline Overview
The high-level overview of this pipeline is shown below.
Installation
git clone https://github.com/bookbot-kids/label-pipeline.git
cd label-pipeline
pip install -r requirements.txt
References
@misc{label-studio-no-date,
author = {{Label Studio}},
title = {{Improve Audio Transcriptions with Label Studio}},
url = {https://labelstud.io/blog/Improve-Audio-Transcriptions-with-Label-Studio.html},
}