Label Pipeline
This repository hosts the AWS Lambda scripts for an automated audio labeling pipeline. The main components of the pipeline are:
| Component | Description |
|---|---|
| Audio Transcription using AWS Transcribe | Transcribe incoming audio files stored in S3 using AWS Transcribe. After transcription, align the audio with its ground-truth values and save the annotations. |
| Audio Splitting | Segment audio files according to the aligned transcriptions, split the segments into separate files, and save them back to S3. |
| Audio Adult/Child Classifier | Classify incoming audio files stored in S3 as either adult or child audio. |
| Integration with AirTable Dashboards | Export AirTable audio annotations (transcripts and labels) to S3 by moving files according to their labels. |
| Audio Recording Logger | Log daily audio recording data from S3 Inventory to AirTable. |
For more details on each component, see the README file in its subdirectory.
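Several of the components above act on incoming audio files stored in S3. As a rough illustration only, the sketch below shows how a Lambda handler might pick audio objects out of an S3 event notification before passing them to a later step. It assumes the functions are triggered by S3 event notifications, which this README does not confirm. The names `extract_audio_objects` and `AUDIO_EXTENSIONS`, the list of extensions, and the return shape of `lambda_handler` are all hypothetical and are not taken from this repository.

```python
from urllib.parse import unquote_plus

# Assumption: which file extensions count as audio is illustrative only.
AUDIO_EXTENSIONS = (".wav", ".aac", ".m4a", ".mp3")


def extract_audio_objects(event):
    """Return (bucket, key) pairs for audio objects in an S3 event notification."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications (spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.lower().endswith(AUDIO_EXTENSIONS):
            objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    """Hypothetical entry point: list the audio files a later step would process."""
    audio_objects = extract_audio_objects(event)
    return {"audio_uris": [f"s3://{bucket}/{key}" for bucket, key in audio_objects]}
```

Keeping the event parsing in a plain function makes it possible to test the filtering logic locally with a hand-written event dictionary, without any AWS credentials.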
Pipeline Overview
The high-level overview of this pipeline is shown below.

Installation
git clone https://github.com/bookbot-kids/label-pipeline.git
cd label-pipeline
pip install -r requirements.txt
References
@misc{label-studio-no-date,
author = {{Label Studio}},
title = {{Improve Audio Transcriptions with Label Studio}},
url = {https://labelstud.io/blog/Improve-Audio-Transcriptions-with-Label-Studio.html},
}