# Children's Speech Recognizer Spanish
A cross-platform (Android/iOS/MacOS) Spanish children's speech recognizer library, written in Flutter and built on the Kaldi framework. The recognizer reads a buffer from a microphone device and converts spoken words into text with near-instant inference time and high accuracy. The library can also be extended with your own custom speech recognition model.
> **Note**
> Since the built-in default model was trained on children's speech, it may perform poorly on adult speech.
## Features
- Spanish speech-to-text through a Kaldi-based automatic speech recognition (ASR) model, trained on children's speech.
- Integration of the speech-to-text model into mobile and desktop applications.
## Installation / Setup
- Install Flutter SDK.
- Install Visual Studio Code.
- Open the project in Visual Studio Code and navigate to `lib/main.dart`.
- Launch an Android emulator or iOS simulator. Optionally, you can also connect a real device.
- Run the demo on Android/iOS/MacOS from the top navigation bar of VSCode: select Run, then Start Debugging.
### Android

On Android, you need to declare the microphone permission in `AndroidManifest.xml` like so:

```xml
<uses-feature android:name="android.hardware.microphone" android:required="false"/>
<uses-permission android:name="android.permission.RECORD_AUDIO"/>
```
### iOS

Similarly, on iOS/MacOS:

- Open Xcode.
- Navigate to `Info.plist`.
- Add the microphone permission `NSMicrophoneUsageDescription`. You can follow this guide.
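For reference, the resulting `Info.plist` entry looks like the following (the usage description string is only an example; write one appropriate for your app):

```xml
<key>NSMicrophoneUsageDescription</key>
<string>This app uses the microphone to recognize your speech.</string>
```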
## How to Use

### Flutter Sample App

- After setting up, run the app, press the `Load model` button, and then `Start listening`.
- Speak into the microphone and the corresponding output text will be displayed in the text field.
- Press `Stop listening` to stop the app from listening.
```dart
import 'package:speech_recognizer/speech_recognizer.dart';

class _MyHomePageState implements SpeechListener { // (1)
  final recognizer = SpeechController.shared;

  void _load() async {
    // ask for permission
    final permissions = await SpeechController.shared.permissions(); // (2)
    if (permissions == AudioSpeechPermission.undetermined) {
      await SpeechController.shared.authorize();
    }
    if (await SpeechController.shared.permissions() !=
        AudioSpeechPermission.authorized) {
      return;
    }
    if (!_isInitialized) {
      await SpeechController.shared.initSpeech('id'); // (3)
      setState(() {
        _isInitialized = true;
      });
      SpeechController.shared.addListener(this); // (4)
    }
  }

  /// listen to speech events and print result in UI
  void onResult(String transcript, bool wasEndpoint, bool resetEndPos,
      bool isVoiceActive, bool isNoSpeech) { // (5)
    if (transcript.isEmpty) {
      return;
    }
    print(transcript);
    setState(() {
      _decoded.insert(0, transcript); // (6)
    });
  }
}
```
1. Set up the listener by implementing `SpeechListener` in your class.
2. Ask for recording permission.
3. Initialize the Spanish recognizer model.
4. Register the listener in this class.
5. Output text listener, invoked while the user is speaking.
6. The normalized recognition result.
## Architecture
This library uses Flutter Platform Channels to enable communication between Dart (Flutter) and native code (Android/iOS). The architecture follows a three-layer design:
### 1. Flutter Layer (Dart)
The Flutter layer provides a high-level API through the SpeechController class, which communicates with native platforms using:
- **Method Channel** (`com.bookbot/control`): for sending commands to native code
- **Event Channel** (`com.bookbot/event`): for receiving continuous speech recognition results
```dart
// Example: Flutter sends a command to the native platform
await methodChannel.invokeMethod('initSpeech', [language, profileId, wordMode]);

// Example: Flutter receives events from the native platform
eventChannel.receiveBroadcastStream().listen((event) {
  final transcript = event['transcript'];
  final wasEndpoint = event['wasEndpoint'];
  // Process recognition results
});
```
### 2. Platform Channel Bridge
Platform channels act as a bridge between Flutter and native code:
| Channel Name | Type | Purpose |
|---|---|---|
| `com.bookbot/control` | MethodChannel | Send commands (init, listen, stop, etc.) |
| `com.bookbot/event` | EventChannel | Receive recognition results continuously |
| `com.bookbot/levels` | EventChannel | Receive audio level updates |
| `com.bookbot/recognizer` | EventChannel | Receive recognizer running status |
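The auxiliary channels in the table can be consumed the same way as the main event channel. As a hedged sketch (the channel name comes from the table above, but the payload shape of the level events is an assumption for illustration):

```dart
import 'package:flutter/services.dart';

// Audio level updates from the native side
// (payload shape is an assumption for illustration)
const levelsChannel = EventChannel('com.bookbot/levels');

void listenToLevels() {
  levelsChannel.receiveBroadcastStream().listen((event) {
    // e.g. drive a microphone level meter in the UI
    print('mic level: $event');
  });
}
```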
### 3. Native Layer (Android/iOS)
#### Android Implementation (Kotlin)
The Android native code in `SpeechController.kt` handles:

- **Microphone Permission Management**: requests and checks the `RECORD_AUDIO` permission
- **Speech Recognition Service**: integrates with the Sherpa-ONNX ASR engine
- **Audio Processing**: captures audio from the microphone using Android's audio APIs
- **Real-time Recognition**: processes audio buffers and sends results back to Flutter
```kotlin
// Android: Registering the plugin
class MainActivity : FlutterActivity() {
    override fun configureFlutterEngine(flutterEngine: FlutterEngine) {
        super.configureFlutterEngine(flutterEngine)
        speechController = SpeechController(this, lifecycle)
        flutterEngine.plugins.add(speechController)
    }
}

// Android: Handling method calls from Flutter
override fun onMethodCall(call: MethodCall, result: MethodChannel.Result) {
    when (call.method) {
        "initSpeech" -> initSpeech(call.arguments as List<String?>, result)
        "listen" -> startSpeech()
        "stopListening" -> stopSpeech()
        // ... other methods
    }
}

// Android: Sending results back to Flutter
override fun onSpeechResult(result: String, wasEndpoint: Boolean, ...) {
    eventSink?.success(hashMapOf(
        "transcript" to result,
        "wasEndpoint" to wasEndpoint,
        "isVoiceActive" to isVoiceActive
    ))
}
```
#### iOS Implementation (Swift)
The iOS native code in `SpeechController.swift` handles:

- **Audio Session Management**: configures `AVAudioSession` for recording
- **Audio Engine**: uses `AVAudioEngine` to capture microphone input
- **Voice Activity Detection (VAD)**: detects speech vs. silence using Sherpa-ONNX VAD
- **Speech Recognition**: processes audio with the Sherpa-ONNX ASR model
```swift
// iOS: Registering the plugin
public static func register(with registrar: FlutterPluginRegistrar) {
  let channel = FlutterMethodChannel(
    name: "com.bookbot/control",
    binaryMessenger: messenger
  )
  registrar.addMethodCallDelegate(instance, channel: channel)

  let eventChannel = FlutterEventChannel(
    name: "com.bookbot/event",
    binaryMessenger: messenger
  )
  eventChannel.setStreamHandler(instance)
}

// iOS: Handling method calls from Flutter
public func handle(_ call: FlutterMethodCall, result: @escaping FlutterResult) {
  switch call.method {
  case "initSpeech":
    initSpeech(profileId: profileId, language: language, ...)
  case "listen":
    startListening()
  case "stopListening":
    stopListening()
  // ... other methods
  }
}

// iOS: Processing audio buffers
engine.inputNode.installTap(onBus: 0, bufferSize: bufferSize, format: format) { buffer, _ in
  self.recognize(buffer: buffer)
}
```
### Speech Recognition Flow
```
┌─────────────────────────────────────────────────────────────────┐
│                     Flutter App (Dart)                          │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  SpeechController.shared.listen()                         │  │
│  └───────────────────────┬───────────────────────────────────┘  │
└────────────────────────────┼─────────────────────────────────────┘
                             │ Method Channel
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                 Native Platform (Android/iOS)                   │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  1. Request Microphone Permission                         │  │
│  │  2. Initialize AVAudioEngine/AudioRecord                  │  │
│  │  3. Load Sherpa-ONNX ASR Model                            │  │
│  │  4. Start Capturing Audio (100ms buffers)                 │  │
│  └───────────────────────┬───────────────────────────────────┘  │
│                          │                                      │
│  ┌───────────────────────▼───────────────────────────────────┐  │
│  │  Audio Buffer Processing:                                 │  │
│  │  • Convert to 16kHz PCM Float32                           │  │
│  │  • Run Voice Activity Detection (VAD)                     │  │
│  │  • Feed to Sherpa-ONNX Recognizer                         │  │
│  │  • Decode Speech → Text                                   │  │
│  └───────────────────────┬───────────────────────────────────┘  │
│                          │ Event Channel                        │
└────────────────────────────┼─────────────────────────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────────────┐
│                     Flutter App (Dart)                          │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │  Receive Results: { transcript, wasEndpoint, ... }        │  │
│  │  Update UI with recognized text                           │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```
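From the Dart side, the round trip in the diagram reduces to roughly the following sketch (channel and method names are taken from the tables and snippets in this README; the exact event payload keys beyond `transcript` and `wasEndpoint` are assumptions):

```dart
import 'package:flutter/services.dart';

const control = MethodChannel('com.bookbot/control');
const events = EventChannel('com.bookbot/event');

Future<void> startRecognition() async {
  // 1. Ask the native side to start capturing and decoding audio
  await control.invokeMethod('listen');

  // 2. Receive decoded results as they stream back over the event channel
  events.receiveBroadcastStream().listen((event) {
    final transcript = event['transcript'];
    final wasEndpoint = event['wasEndpoint'];
    // Update the UI with the recognized text
    print('$transcript (endpoint: $wasEndpoint)');
  });
}
```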
## Key Technical Details
- **Audio Processing**:
  - Microphone captures raw audio at the device's native sample rate (typically 48kHz)
  - Audio is resampled to 16kHz for ASR model compatibility
  - Buffer duration: 100ms for optimal latency
- **Voice Activity Detection (VAD)**:
  - Uses the Silero VAD model with a 25ms window size
  - Detects speech/silence patterns: `[silence][speech][silence]`
  - Patience counters prevent false endpoint detection
- **Recognition Modes**:
  - **Phoneme Mode**: returns phonetic tokens for pronunciation analysis
  - **Word Mode**: returns complete words for text transcription
- **Thread Safety**:
  - Android: uses coroutines and synchronized blocks
  - iOS: uses dedicated `DispatchQueue`s for recognition, audio, and level processing
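The 48kHz→16kHz conversion above can be illustrated with a naive linear-interpolation resampler (a sketch only, not the library's actual implementation, which may use a higher-quality filter):

```dart
/// Naive linear-interpolation resampler (illustration only).
List<double> resample(List<double> input, int srcRate, int dstRate) {
  final outLen = (input.length * dstRate / srcRate).floor();
  final out = List<double>.filled(outLen, 0);
  for (var i = 0; i < outLen; i++) {
    final pos = i * srcRate / dstRate; // position in the source signal
    final j = pos.floor();
    final frac = pos - j;
    final a = input[j];
    final b = j + 1 < input.length ? input[j + 1] : a;
    out[i] = a + (b - a) * frac; // linear interpolation between samples
  }
  return out;
}
```

For example, a 100ms buffer at 48kHz (4800 samples) resamples to 1600 samples at 16kHz.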
## File Structure
| Platform | Code | Function |
|---|---|---|
| Flutter | `speech_recognizer.dart` | Interface API to communicate with the native platform (Android/iOS/Mac). There are many speech recognizer methods; see `lib/main.dart` for how to use them. |
| All Platforms | `asr/es` | Speech model shared by all platforms. |
| iOS/MacOS | `SpeechController.swift` | Native platform channel for the speech recognizer on iOS/MacOS. Uses sherpa-onnx with a custom model. |
| Android | `SpeechController.kt` | Native platform channel for the speech recognizer on Android. Uses sherpa-onnx with a custom model. |
## UI Automation Testing

1. Follow the Installation / Setup guide.
2. Launch an Android emulator or iOS simulator.
3. Run `flutter test integration_test/app_test.dart`.
https://github.com/user-attachments/assets/46476c73-cfbb-442d-8e81-3199fe0f704d