# AWS Lambda Event Handler

`audio_recording_logger.lambda_function`
### `calculate_audio_duration(size_bytes, sample_rate, bit_depth=None, bit_rate=None, num_channels=1)`

Calculates audio duration based on audio file metadata.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `size_bytes` | `int` | Size of the file in bytes. | *required* |
| `sample_rate` | `int` | Sample rate of the audio. | *required* |
| `bit_depth` | `int` | Bit depth of the audio, e.g. 16. Defaults to `None`. | `None` |
| `bit_rate` | `int` | Bit rate of the audio if compressed, e.g. 95 kbps. Defaults to `None`. | `None` |
| `num_channels` | `int` | Number of channels in the audio. Defaults to 1 (mono). | `1` |

Returns:

| Name | Type | Description |
|---|---|---|
| `int` | `int` | Estimated audio duration, in seconds. |
Source code in src/audio_recording_logger/lambda_function.py
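The docstring gives the parameters but not the formula. A duration estimate from this metadata typically takes one of two paths: divide the file size by the compressed byte rate (from `bit_rate`), or by the uncompressed PCM byte rate (`sample_rate × bit_depth/8 × num_channels`). A minimal sketch, assuming `bit_rate` is given in kbps (the real implementation may differ):

```python
def calculate_audio_duration(size_bytes, sample_rate, bit_depth=None,
                             bit_rate=None, num_channels=1):
    """Estimate audio duration in seconds from file size and format metadata."""
    if bit_rate is not None:
        # Compressed audio: bit_rate in kbps, so bytes per second = kbps * 1000 / 8.
        return int(size_bytes / (bit_rate * 1000 / 8))
    # Uncompressed PCM: bytes per second = sample_rate * bytes-per-sample * channels.
    return int(size_bytes / (sample_rate * (bit_depth / 8) * num_channels))

# One minute of 16-bit mono PCM at 16 kHz is 1_920_000 bytes.
print(calculate_audio_duration(1_920_000, 16_000, bit_depth=16))  # → 60
```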
### `get_log_files(bucket, manifest_file_path)`

Gets all log files listed in the S3 inventory manifest file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `bucket` | `str` | Bucket name of the AWS S3 manifest file. | *required* |
| `manifest_file_path` | `str` | AWS S3 path to the manifest file. | *required* |

Returns:

| Type | Description |
|---|---|
| `pd.DataFrame` | DataFrame of all log files, concatenated. |
Source code in src/audio_recording_logger/lambda_function.py
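An S3 inventory manifest is a JSON document whose `files` array lists gzipped CSV data files. A sketch of the concatenation step, with a `read_bytes(key)` callable standing in for the S3 `GetObject` call so the example runs without AWS credentials (the real function presumably uses `boto3` against `bucket` directly):

```python
import gzip
import io
import json

import pandas as pd

def get_log_files(read_bytes, manifest_file_path):
    """Concatenate every inventory CSV listed in the manifest into one DataFrame."""
    manifest = json.loads(read_bytes(manifest_file_path))
    frames = []
    for entry in manifest["files"]:
        # Inventory data files are gzipped, headerless CSVs.
        raw = gzip.decompress(read_bytes(entry["key"]))
        frames.append(pd.read_csv(io.BytesIO(raw), header=None))
    return pd.concat(frames, ignore_index=True)
```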
### `groupby_language_total_bytes(df, folder)`

Filters the DataFrame by folder, groups by `date`, `folder`, `language`, and `language-code`, then sums the duration of each group.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `pd.DataFrame` | Preprocessed DataFrame to group. | *required* |
| `folder` | `str` | Name of the folder to filter for. | *required* |

Returns:

| Type | Description |
|---|---|
| `pd.DataFrame` | Filtered and grouped DataFrame. |
Source code in src/audio_recording_logger/lambda_function.py
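Under the column names described here (plus an assumed `duration` column produced earlier in the pipeline), the filter-and-group step can be sketched as:

```python
import pandas as pd

def groupby_language_total_bytes(df, folder):
    """Filter to one folder, then sum duration per (date, folder, language, language-code)."""
    filtered = df[df["folder"] == folder]
    return (
        filtered
        .groupby(["date", "folder", "language", "language-code"], as_index=False)
        ["duration"]
        .sum()
    )

df = pd.DataFrame({
    "date": ["2024-01-01"] * 3,
    "folder": ["training", "training", "archive"],
    "language": ["en", "en", "id"],
    "language-code": ["en-AU", "en-AU", "id-ID"],
    "duration": [30, 45, 60],
})
print(groupby_language_total_bytes(df, "training"))  # one row: en-AU with duration 75
```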
### `lambda_handler(event, context)`

Listens for the S3 event and calls the daily logger function.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `event` | `AWS Event` | A JSON-formatted document that contains data for a Lambda function to process. | *required* |
| `context` | `AWS Context` | An object that provides methods and properties with information about the invocation, function, and runtime environment. | *required* |
Source code in src/audio_recording_logger/lambda_function.py
### `main(bucket, manifest_file_path, query_date)`

Main function to be executed by `lambda_handler`.

- Gets all log files from the manifest file, then preprocesses them.
- Gets all audio in the `training` and `archive` folders.
- Groups audio files by language code.
- Calculates the total audio duration for each language code.
- Converts the `date` column to string for the AirTable upload.
- Drops the unused `size` column.
- Pushes both resulting DataFrames to AirTable.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `bucket` | `str` | Bucket name of the AWS S3 manifest file. | *required* |
| `manifest_file_path` | `str` | AWS S3 path to the manifest file. | *required* |
| `query_date` | `datetime.date` | Query date to filter with. | *required* |
Source code in src/audio_recording_logger/lambda_function.py
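The steps above can be sketched as a small pipeline. To keep the example runnable without AWS or AirTable, the helpers are injected as callables and given a hypothetical name `run_pipeline`; the real `main` takes `(bucket, manifest_file_path, query_date)` and calls the module's own functions instead:

```python
import datetime

import pandas as pd

def run_pipeline(load_and_preprocess, group_by_folder, push_to_airtable):
    """Orchestration sketch of `main`: group per folder, tidy, then upload."""
    df = load_and_preprocess()
    for folder in ("training", "archive"):
        grouped = group_by_folder(df, folder)
        # AirTable upload expects string dates and no raw byte counts.
        grouped["date"] = grouped["date"].astype(str)
        grouped = grouped.drop(columns=["size"], errors="ignore")
        push_to_airtable(grouped)
```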
### `preprocess_dataframe(df)`

Performs basic preprocessing on the concatenated S3 inventory log files:

- Renames columns to `bucket`, `key`, `size`, and `last_modified_date`.
- Converts the `date` column to `datetime` type.
- Gets the language code of each item based on `key`, e.g. `en-AU`.
- Gets the language of each item based on `language`, e.g. `en`.
- Gets the folder name of each item based on `key`, e.g. `training`, `archive`.
- Gets the filename suffix based on `key`, e.g. `aac`, `wav`.
- Keeps only audio files, based on the extensions listed in `AUDIO_EXTENSIONS`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `pd.DataFrame` | DataFrame of all log files, concatenated. | *required* |

Returns:

| Type | Description |
|---|---|
| `pd.DataFrame` | Preprocessed DataFrame, per the outline above. |
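A sketch of the outline above, assuming keys shaped like `training/en-AU/clip.wav` and a placeholder for `AUDIO_EXTENSIONS` (the real constant lives in the module and may list different extensions):

```python
import pandas as pd

AUDIO_EXTENSIONS = {"aac", "wav", "mp3", "flac"}  # placeholder contents

def preprocess_dataframe(df):
    """Rename columns, derive language/folder/suffix, and keep only audio files."""
    df = df.copy()
    df.columns = ["bucket", "key", "size", "last_modified_date"]
    df["date"] = pd.to_datetime(df["last_modified_date"])
    parts = df["key"].str.split("/")
    df["folder"] = parts.str[0]                                  # e.g. training
    df["language-code"] = parts.str[1]                           # e.g. en-AU
    df["language"] = df["language-code"].str.split("-").str[0]   # e.g. en
    df["suffix"] = df["key"].str.rsplit(".", n=1).str[-1]        # e.g. wav
    return df[df["suffix"].isin(AUDIO_EXTENSIONS)]
```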