Susuru
🌊 Live Demo: http://turpin.dev/susuru/
Overview
Susuru is an English ↔ Spanish speech recognition and translation tool. Recognition is truly offline, powered by Vosk WebAssembly with locally cached models, while translation of the recognised text uses the MyMemory API. It features live translation and a beautiful ocean wave FFT visualisation. Click the waves to start, speak naturally, and get translations as you talk, all without sending any audio to external servers.
Key Features
- 100% Offline Speech Recognition - Vosk WebAssembly with locally cached models (about 80MB total)
- Real-Time Translation - Automatic English ↔ Spanish translation once you pause speaking
- Manual Language Toggle - Click the banner to switch between en-GB and es-ES
- Ocean Wave FFT - Beautiful frequency spectrum visualisation with surf physics
- No Cloud Speech Processing - Models cached by a service worker so recognition works fully offline
- Clean Text Display - Two-line layout: transcription + translation
- Smart Auto-Translation - Short phrases (under 80 characters) auto-translate after a 2s pause
- Rate Limit Resilience - Degrades gracefully when the translation API's daily limit is reached
- Privacy First - Vosk processes all audio in the browser; audio is never sent to servers (only recognised text goes to the translation API)
- Mobile Optimised - Touch-friendly responsive design
Quick Start
Web Application
make # Build and serve at http://localhost:8001
Then open http://localhost:8001 in your browser and start speaking!
Available Commands
- make - Build and serve the web application
- make serve - Serve at http://localhost:8001
- make build - Install dependencies and generate test audio
- make test - Run the test suite
- make clean - Clean build artifacts
Technology Stack
Platform:
- HTML5 + JavaScript + Vosk WebAssembly
- Truly offline speech recognition with locally cached models (about 80MB)
- Web Audio API for microphone and FFT
- Service Worker for offline model caching
- Responsive design for mobile and desktop
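A service worker can make the model downloads available offline with a cache-first strategy. The sketch below illustrates the idea under stated assumptions and is not the project's actual service worker: the cache name MODEL_CACHE, the helpers isModelRequest and cacheFirst, and the rule that model URLs contain /models/ are all hypothetical. The cache and fetch function are passed in so the logic can run outside a service worker.

```javascript
// Hypothetical cache name for the Vosk model files.
const MODEL_CACHE = 'susuru-models-v1';

// Assumption: model files are served from a path containing "/models/".
function isModelRequest(url) {
  return new URL(url).pathname.includes('/models/');
}

// Cache-first: answer from the cache if possible, otherwise fetch and store.
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  if (cached) return cached; // works offline once the model has been cached
  const response = await fetchFn(request);
  if (response.ok) {
    // Store a copy; the original response body goes back to the page.
    await cache.put(request, response.clone());
  }
  return response;
}

// Inside a real service worker this could be wired up roughly as:
// self.addEventListener('fetch', (event) => {
//   if (!isModelRequest(event.request.url)) return;
//   event.respondWith(
//     caches.open(MODEL_CACHE).then((cache) => cacheFirst(event.request, cache, fetch))
//   );
// });
```

On the first visit the models come from the network and are stored; later requests, including offline ones, are answered from the cache.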
Speech Recognition:
- Vosk offline models: vosk-model-small-en-us-0.15, vosk-model-small-es-0.42
- Models cached locally via Service Worker for offline use
- Real-time audio processing via ScriptProcessorNode
- Float32 → Int16 conversion for WebAssembly
- Optional fallback to Web Speech API if Vosk unavailable
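The Float32 → Int16 step can be sketched as follows. Web Audio delivers samples as floats, nominally in [-1, 1], while Vosk's recogniser works with 16-bit PCM. The function name floatTo16BitPCM is hypothetical, and depending on the vosk-browser version the recogniser may accept an AudioBuffer directly and do this conversion itself, so treat this as an illustration rather than the project's code.

```javascript
// Convert Web Audio float samples (nominally -1.0..1.0) to 16-bit signed PCM.
function floatTo16BitPCM(float32) {
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Clamp first: samples can slightly exceed [-1, 1].
    const s = Math.max(-1, Math.min(1, float32[i]));
    // Negative values scale to -32768, positive values top out at 32767.
    // Int16Array truncates any fractional part on assignment.
    int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return int16;
}

// Typical use inside a ScriptProcessorNode callback (processor is hypothetical):
// processor.onaudioprocess = (event) => {
//   const pcm = floatTo16BitPCM(event.inputBuffer.getChannelData(0));
//   // ...hand pcm to the recogniser
// };
```

Scaling negatives by 32768 and positives by 32767 uses the full asymmetric Int16 range without overflow.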
Translation:
- MyMemory API (100 requests/day, no account required)
- Smart auto-translation with length-based timeout
- Rate limiting protection with graceful degradation
- Future: Multiple provider support (DeepL, Bergamot offline)
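A minimal sketch of a MyMemory request with graceful degradation, assuming MyMemory's public GET endpoint (https://api.mymemory.translated.net/get with q and langpair parameters) and its JSON fields responseStatus and responseData.translatedText. The helper names are hypothetical, and the fetch function is injectable so the fallback paths can be exercised without a network.

```javascript
const MYMEMORY_ENDPOINT = 'https://api.mymemory.translated.net/get';

// Build a request URL such as ...?q=Hola&langpair=es%7Cen
function buildMyMemoryUrl(text, from, to) {
  const url = new URL(MYMEMORY_ENDPOINT);
  url.searchParams.set('q', text);
  url.searchParams.set('langpair', `${from}|${to}`);
  return url.toString();
}

// Return the translation, or null when the request fails or the quota is hit.
async function translateOrNull(text, from, to, fetchFn = fetch) {
  try {
    const res = await fetchFn(buildMyMemoryUrl(text, from, to));
    if (!res.ok) return null; // e.g. HTTP 429 when rate limited
    const data = await res.json();
    // MyMemory also reports errors in the body via responseStatus.
    if (Number(data.responseStatus) !== 200) return null;
    return data.responseData.translatedText;
  } catch {
    return null; // offline or network error: degrade gracefully
  }
}
```

Returning null rather than throwing lets a caller keep showing the transcription line when the daily limit is reached or the device is offline.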
Audio Processing:
- Web Audio API createAnalyser() + ScriptProcessorNode
- 2048-point FFT for ocean wave visualisation
- 16kHz audio sampling optimised for Vosk
- Peak frequency detection with pink glow effects
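Peak detection on the 2048-point FFT can be done by scanning the analyser's bins for the maximum. With an AnalyserNode, fftSize = 2048 yields frequencyBinCount = 1024 bins, each sampleRate / fftSize wide, so at 16kHz every bin spans 7.8125 Hz and the spectrum tops out at 8kHz. The helper peakFrequency is a hypothetical name; this is a sketch, not the project's renderer.

```javascript
// Find the loudest FFT bin and convert its index to a frequency in Hz.
function peakFrequency(bins, sampleRate, fftSize) {
  let peakIndex = 0;
  for (let i = 1; i < bins.length; i++) {
    if (bins[i] > bins[peakIndex]) peakIndex = i;
  }
  // Bin i is centred near i * (sampleRate / fftSize) Hz.
  return { index: peakIndex, hz: (peakIndex * sampleRate) / fftSize };
}

// Browser usage with the Web Audio API:
// const analyser = audioContext.createAnalyser();
// analyser.fftSize = 2048;
// const bins = new Uint8Array(analyser.frequencyBinCount); // 1024 bins
// analyser.getByteFrequencyData(bins);
// const { hz } = peakFrequency(bins, audioContext.sampleRate, analyser.fftSize);
```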
Interface:
- Manual language toggle via banner clicks
- Clean two-line display format (transcription + translation)
- Canvas-based ocean wave FFT visualisation
- Touch-friendly responsive mobile interface
- Auto-refresh for development
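The banner toggle amounts to flipping between two language tags and loading the matching model. In this sketch the LANGUAGES table, toggleLanguage and the banner element id are hypothetical; the model directory names come from the project structure. Note that the English model is vosk-model-small-en-us-0.15 even though the UI label is en-GB.

```javascript
// Hypothetical table: UI language tag -> Vosk model directory and translation target.
const LANGUAGES = {
  'en-GB': { model: 'models/vosk-model-small-en-us-0.15', translateTo: 'es-ES' },
  'es-ES': { model: 'models/vosk-model-small-es-0.42', translateTo: 'en-GB' },
};

// Return the state for the other language after a banner click.
function toggleLanguage(current) {
  const next = current === 'en-GB' ? 'es-ES' : 'en-GB';
  return { lang: next, ...LANGUAGES[next] };
}

// Browser wiring (element id is hypothetical):
// let state = { lang: 'en-GB', ...LANGUAGES['en-GB'] };
// document.getElementById('banner').addEventListener('click', () => {
//   state = toggleLanguage(state.lang);
//   // ...stop recognition, load state.model, restart
// });
```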
Usage
- Ocean Demo - Beautiful waves animate automatically on load
- Start Recognition - Click the ocean waves to begin speech recognition
- Speak Naturally - Talk in English or Spanish for real-time transcription
- Auto Translation - Short phrases auto-translate after 2-second pause
- Manual Translation - Press Escape to force translation of longer text
- Switch Languages - Click the banner (V*SKI) to toggle en-GB ↔ es-ES
- Clean Display - See the transcription and translation in a clean two-line format
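The auto-translation rule (short phrases after a 2-second pause, Escape to force longer text) behaves like a debounce. This is a sketch under assumptions: createAutoTranslator and the constant names are hypothetical, the thresholds are the ones stated in this README, and the timer functions are injectable for testing.

```javascript
const AUTO_TRANSLATE_MAX_CHARS = 80;  // short phrases are under 80 characters
const AUTO_TRANSLATE_PAUSE_MS = 2000; // 2-second pause

function createAutoTranslator(onTranslate, setTimer = setTimeout, clearTimer = clearTimeout) {
  let timer = null;
  return {
    // Call on every new transcription result: each call restarts the pause timer.
    update(text) {
      clearTimer(timer);
      timer = null;
      const length = text.trim().length;
      // Empty or long text is not auto-translated; long text waits for Escape.
      if (length === 0 || length >= AUTO_TRANSLATE_MAX_CHARS) return;
      timer = setTimer(() => onTranslate(text), AUTO_TRANSLATE_PAUSE_MS);
    },
    // Escape key: translate immediately, whatever the length.
    force(text) {
      clearTimer(timer);
      timer = null;
      if (text.trim().length > 0) onTranslate(text);
    },
  };
}

// Browser wiring (autoTranslator and currentTranscript are hypothetical):
// document.addEventListener('keydown', (e) => {
//   if (e.key === 'Escape') autoTranslator.force(currentTranscript);
// });
```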
Visual Features
- Ocean waves with realistic surf break physics for FFT demo
- Green-to-blue frequency gradient matching audio spectrum
- Pink glow effects for speech activity and peak detection
- Two-line text display with original transcription + translated text below
- Language-specific colors (green for English, orange for Spanish)
- Italic translations to distinguish from original transcription
- Language indicator showing current recognition language (en-GB/es-ES)
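The green-to-blue gradient can be produced by interpolating the HSL hue across the FFT bins, from 120° (green) at the lowest bin to 240° (blue) at the highest. binColour and LANGUAGE_COLOURS are hypothetical names and the exact colour values Susuru uses are an assumption; the returned strings are valid CSS colours for a canvas fillStyle.

```javascript
// Interpolate hue from 120° (green) at bin 0 to 240° (blue) at the last bin.
function binColour(index, binCount, lightness = 50) {
  const t = binCount > 1 ? index / (binCount - 1) : 0;
  const hue = 120 + t * 120;
  return `hsl(${Math.round(hue)}, 100%, ${lightness}%)`;
}

// Hypothetical text colours per recognition language (green English, orange Spanish).
const LANGUAGE_COLOURS = { 'en-GB': 'green', 'es-ES': 'orange' };

// Canvas usage (ctx, bins, barWidth, height are hypothetical):
// bins.forEach((value, i) => {
//   ctx.fillStyle = binColour(i, bins.length);
//   ctx.fillRect(i * barWidth, height - value, barWidth, value);
// });
```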
Architecture
graph TD
A[🌊 Ocean Demo Animation] --> B[👆 Click to Start Recognition]
B --> C[🎤 Microphone Access 16kHz]
C --> D[🎵 Web Audio API]
D --> E[📊 FFT + Speech Processing]
E --> F[🗣️ Vosk Speech Recognition]
F --> G[📝 Live Transcription Display]
G --> H["🌐 Auto Translation (MyMemory API)"]
H --> I[📋 Two-line Display: Original + Translation]
I --> J[👆 Click Banner to Switch Language]
J --> K[🔄 Toggle en-GB ↔ es-ES]
subgraph "Speech Pipeline"
D --> D1[ScriptProcessorNode]
D --> D2[Float32 → Int16]
F --> F1[Vosk Models ~80MB]
F --> F2[Web Speech API Fallback]
G --> G1[Real-time Display]
G --> G2[Upward Fade Effect]
end
subgraph "Visual Pipeline"
E --> V1[Ocean Wave FFT]
E --> V2[Peak Detection]
E --> V3[Pink Glow Effects]
I --> V4[Language Indicator]
end
style A fill:#e1f5fe
style F fill:#f3e5f5
style G fill:#e8f5e8
Development
The codebase uses a clean OOP structure, with the main SusuruTranslator class handling:
- Vosk Integration - Offline speech recognition with WebAssembly models
- Language Switching - Manual toggle between English and Spanish
- Audio Processing - Dual-path FFT visualisation + speech recognition
- Canvas Rendering - Ocean wave demo with real-time spectrum analysis
- Translation Integration - Real-time English ↔ Spanish translation with MyMemory API
- Transcription Display - Clean two-line format with original text + translation
- Fallback System - Graceful degradation from Vosk → Web Speech API
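The Vosk → Web Speech API degradation can be sketched as a backend selector. Here loadVosk is a hypothetical async function that resolves with a ready recogniser or rejects if WebAssembly or model loading fails; selectRecognizer is likewise a hypothetical name. Note that in some browsers (Chrome, for example) the Web Speech API typically performs recognition on a remote service, so the fallback path may not keep audio on the device.

```javascript
// Try Vosk first, fall back to the Web Speech API, else report no backend.
async function selectRecognizer(loadVosk, env = globalThis) {
  try {
    return { kind: 'vosk', recognizer: await loadVosk() };
  } catch (error) {
    // Chrome exposes the constructor with a webkit prefix.
    const SpeechRecognition = env.SpeechRecognition || env.webkitSpeechRecognition;
    if (SpeechRecognition) {
      return { kind: 'web-speech', recognizer: new SpeechRecognition(), error };
    }
    return { kind: 'none', recognizer: null, error };
  }
}
```

Keeping the Vosk failure in error lets the interface explain why it switched backends.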
Project Structure
susuru/
├── index.html # Main application with speech recognition
├── susuru.js # Core SusuruTranslator class with Vosk integration
├── vosk.js # Vosk browser WebAssembly library
├── auto-refresh.js # Development auto-reload
├── models/ # Vosk speech recognition models
│ ├── vosk-model-small-en-us-0.15/ # English model (41MB)
│ └── vosk-model-small-es-0.42/ # Spanish model (39MB)
├── Makefile # Build and serve commands
├── package.json # Dependencies including vosk-browser
└── README.md # This documentation
Inspiration
Susuru combines offline speech recognition with the clean design philosophy of roomtone. The ocean wave FFT provides beautiful visualisation while Vosk WebAssembly models enable privacy-focused speech recognition without cloud dependencies.
Perfect for:
- Offline speech recognition and transcription applications
- Language learning and pronunciation practice
- Privacy-focused voice interfaces (no cloud processing)
- Educational demos of WebAssembly and speech processing
- Mobile speech app development and testing
- Voice-controlled applications with visual feedback
Live Demo: http://turpin.dev/susuru/ 🌊📱