voski

Susuru

🌊 Live Demo: http://turpin.dev/susuru/

Overview

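Susuru is an English ↔ Spanish speech recognition and translation tool. Speech recognition is truly offline, powered by Vosk WebAssembly with locally cached models, and is paired with live translation and a beautiful ocean wave FFT visualisation. Click the waves to start and speak naturally: audio is processed entirely in the browser and never sent to external servers. Translation uses the MyMemory API, so it needs a network connection and sends the transcribed text (not the audio) to that service.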

Key Features

100% Offline Speech Recognition - Vosk WebAssembly with locally cached models (67MB total)
Real-Time Translation - Automatic English ↔ Spanish translation with smart timeout
Manual Language Toggle - Click the banner to toggle en-GB ↔ es-ES
Ocean Wave FFT - Beautiful frequency spectrum visualisation with surf physics
No Cloud Speech Recognition - Models are cached by the service worker, so recognition works fully offline (translation still uses the MyMemory API)
Clean Text Display - Two-line layout: transcription + translation
Smart Auto-Translation - Short phrases (< 80 chars) auto-translate after 2s pause
Rate Limit Resilience - Degrades gracefully when the MyMemory translation API hits its rate limit
Privacy First - All audio processing in browser, never sent to servers
Mobile Optimised - Touch-friendly responsive design

Quick Start

Web Application

make        # Build and serve at http://localhost:8001

Then open http://localhost:8001 in your browser and start speaking!

Available Commands

make - Build and serve the web application
make serve - Serve at http://localhost:8001
make build - Install dependencies and generate test audio
make test - Run the test suite
make clean - Clean build artifacts

Technology Stack

Platform:

HTML5 + JavaScript + Vosk WebAssembly
True offline speech recognition with locally cached 67MB models
Web Audio API for microphone and FFT
Service Worker for offline model caching
Responsive design for mobile and desktop
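Below is a minimal sketch of the cache-first strategy a model-caching service worker could use. It assumes a hypothetical cache name `susuru-models-v1` and that model files are requested from paths containing `/models/`. The lookup is factored into `cacheFirst(request, cache, fetchFn)` so it can be exercised outside a browser. This is an illustration, not the project's actual service worker.

```javascript
// Hypothetical cache name; bumping the version invalidates old model files.
const MODEL_CACHE = "susuru-models-v1";

// Assumption: model files are served from a path containing /models/.
function isModelRequest(url) {
  return new URL(url).pathname.includes("/models/");
}

// Cache-first lookup: serve from the cache when possible, otherwise fetch
// from the network and store a copy for the next (possibly offline) visit.
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  if (cached) return cached;
  const response = await fetchFn(request);
  if (response.ok) {
    // Store a clone so the original body can still be returned to the page.
    await cache.put(request, response.clone());
  }
  return response;
}

// Wiring, only active when this script runs as a service worker.
if (typeof ServiceWorkerGlobalScope !== "undefined" && self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener("fetch", (event) => {
    if (!isModelRequest(event.request.url)) return;
    event.respondWith(
      caches.open(MODEL_CACHE).then((cache) => cacheFirst(event.request, cache, fetch))
    );
  });
}
```

The first successful fetch of each model file is stored. Later visits can then load the models with no network connection.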

Speech Recognition:

Vosk offline models: vosk-model-small-en-us-0.15, vosk-model-small-es-0.42
Models cached locally via Service Worker for offline use
Real-time audio processing via ScriptProcessorNode
Float32 → Int16 conversion for WebAssembly
Optional fallback to Web Speech API if Vosk unavailable
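The Float32 → Int16 step can be sketched as follows. The sketch assumes samples arrive as a `Float32Array` from `event.inputBuffer.getChannelData(0)` inside a ScriptProcessorNode callback. The name `floatTo16BitPCM` is illustrative and not taken from `susuru.js`.

```javascript
// Convert Web Audio Float32 samples (nominally in [-1, 1]) to 16-bit PCM.
function floatTo16BitPCM(float32) {
  const int16 = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping.
    const s = Math.max(-1, Math.min(1, float32[i]));
    // The Int16 range is asymmetric: negatives scale to -32768, positives to 32767.
    // Fractional results are truncated toward zero when stored in the Int16Array.
    int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return int16;
}
```

Clamping matters because Web Audio samples can slightly exceed ±1. Without it, large values would wrap around when stored in the `Int16Array`.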

Translation:

MyMemory API (100 requests/day, no account required)
Smart auto-translation with length-based timeout
Rate limiting protection with graceful degradation
Future: Multiple provider support (DeepL, Bergamot offline)
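Below is a sketch of a single MyMemory call with graceful degradation. It assumes the public `https://api.mymemory.translated.net/get?q=TEXT&langpair=en|es` endpoint and its `responseData.translatedText` / `responseStatus` JSON fields. Two choices are assumptions for illustration, not necessarily what `susuru.js` does:
- the injectable `fetchFn` parameter;
- returning `null` on any failure.

```javascript
const MYMEMORY_URL = "https://api.mymemory.translated.net/get";

// Map the recognition language (en-GB / es-ES) to a MyMemory language pair.
function langPairFor(recognitionLang) {
  return recognitionLang.startsWith("es") ? "es|en" : "en|es";
}

// Translate `text`; resolve to null instead of throwing when the API is
// unreachable, rate-limited (e.g. HTTP 429) or reports a non-200 status.
async function translate(text, recognitionLang, fetchFn = fetch) {
  const params = new URLSearchParams({ q: text, langpair: langPairFor(recognitionLang) });
  try {
    const res = await fetchFn(`${MYMEMORY_URL}?${params}`);
    if (!res.ok) return null;
    const data = await res.json();
    if (Number(data.responseStatus) !== 200) return null;
    return data.responseData?.translatedText ?? null;
  } catch {
    return null; // network error: keep showing the transcription only
  }
}
```

Returning `null` rather than throwing lets the display show the transcription line alone when the daily limit is reached or the device is offline.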

Audio Processing:

Web Audio API createAnalyser() + ScriptProcessorNode
2048-point FFT for ocean wave visualisation
16kHz audio sampling optimised for Vosk
Peak frequency detection with pink glow effects
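With a 2048-point FFT, an `AnalyserNode` exposes `frequencyBinCount` = 1024 bins. Each bin is sampleRate / fftSize wide. If the AudioContext runs at 16 kHz, that is 16000 / 2048 = 7.8125 Hz, so bin 64 corresponds to 500 Hz.

The sketch below shows one simple way to do peak detection over `getByteFrequencyData()` output. It assumes "peak" means the loudest bin above a hypothetical `threshold` of 32.

```javascript
// Find the loudest bin in an AnalyserNode byte spectrum and convert its index
// to Hz. Returns null when no bin exceeds the threshold (e.g. silence).
function detectPeak(spectrum, sampleRate, fftSize, threshold = 32) {
  let peakIndex = -1;
  let peakValue = threshold;
  for (let i = 0; i < spectrum.length; i++) {
    if (spectrum[i] > peakValue) {
      peakValue = spectrum[i];
      peakIndex = i;
    }
  }
  if (peakIndex < 0) return null;
  const binWidth = sampleRate / fftSize; // 16000 / 2048 = 7.8125 Hz
  return { index: peakIndex, frequency: peakIndex * binWidth, level: peakValue };
}

// Browser usage:
// const analyser = audioContext.createAnalyser();
// analyser.fftSize = 2048;
// const bins = new Uint8Array(analyser.frequencyBinCount); // 1024 bins
// analyser.getByteFrequencyData(bins);
// const peak = detectPeak(bins, audioContext.sampleRate, analyser.fftSize);
```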

Interface:

Manual language toggle via banner clicks
Clean two-line display format (transcription + translation)
Canvas-based ocean wave FFT visualisation
Touch-friendly responsive mobile interface
Auto-refresh for development

Usage

Ocean Demo - Beautiful waves animate automatically on load
Start Recognition - Click the ocean waves to begin speech recognition
Speak Naturally - Talk in English or Spanish for real-time transcription
Auto Translation - Short phrases auto-translate after 2-second pause
Manual Translation - Press Escape to force translation of longer text (80 characters or more)
Switch Languages - Click the banner (V*SKI) to toggle en-GB ↔ es-ES
Clean Display - See the transcription and translation in a clean two-line format
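The auto-translation rule in the steps above could be implemented as a small debouncer: phrases under 80 characters translate after a 2-second pause, and Escape forces translation. This is a sketch under those assumptions. The `AutoTranslator` class and its `update`/`flush` methods are hypothetical, and `translateFn` stands in for whatever performs the MyMemory request.

```javascript
// Hypothetical helper: wait for a pause in speech before translating.
class AutoTranslator {
  constructor(translateFn, { pauseMs = 2000, maxAutoLength = 80 } = {}) {
    this.translateFn = translateFn;
    this.pauseMs = pauseMs;
    this.maxAutoLength = maxAutoLength;
    this.timer = null;
    this.pendingText = "";
  }

  // Call on every transcription update; each update restarts the pause timer.
  // Only short phrases are scheduled; longer text waits for flush().
  update(text) {
    this.pendingText = text;
    clearTimeout(this.timer);
    this.timer = null;
    if (text.length > 0 && text.length < this.maxAutoLength) {
      this.timer = setTimeout(() => this.flush(), this.pauseMs);
    }
  }

  // Translate immediately, regardless of length (e.g. when Escape is pressed).
  flush() {
    clearTimeout(this.timer);
    this.timer = null;
    const text = this.pendingText;
    this.pendingText = "";
    if (text) this.translateFn(text);
  }
}

// Browser wiring:
// document.addEventListener("keydown", (e) => {
//   if (e.key === "Escape") autoTranslator.flush();
// });
```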

Visual Features

Ocean waves with realistic surf break physics for FFT demo
Green-to-blue frequency gradient matching audio spectrum
Pink glow effects for speech activity and peak detection
Two-line text display with original transcription + translated text below
Language-specific colours (green for English, orange for Spanish)
Italic translations to distinguish from original transcription
Language indicator showing current recognition language (en-GB/es-ES)
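One way to get a green-to-blue gradient across the spectrum is to interpolate the HSL hue by bin index. The sketch assumes hue 120 (green) for the lowest bin and 240 (blue) for the highest, with a hypothetical fixed saturation and lightness. The actual colours in Susuru may differ.

```javascript
// Hypothetical colour mapping: lowest bin green (hue 120), highest bin blue
// (hue 240), linearly interpolated; alpha can encode the bin's level.
function binColour(index, binCount, alpha = 1) {
  const t = binCount > 1 ? index / (binCount - 1) : 0;
  const hue = Math.round(120 + t * 120);
  return `hsla(${hue}, 80%, 50%, ${alpha})`;
}
```

In a canvas draw loop, `ctx.fillStyle = binColour(i, bins.length, bins[i] / 255)` would also fade quiet bins via the alpha channel.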

Architecture

graph TD
    A[🌊 Ocean Demo Animation] --> B[👆 Click to Start Recognition]
    B --> C[🎤 Microphone Access 16kHz]
    C --> D[🎵 Web Audio API]
    D --> E[📊 FFT + Speech Processing]
    E --> F[🗣️ Vosk Speech Recognition]
    F --> G[📝 Live Transcription Display]
    G --> H["🌐 Auto Translation (MyMemory API)"]
    H --> I[📋 Two-line Display: Original + Translation]
    I --> J[👆 Click Banner to Switch Language]
    J --> K[🔄 Toggle en-GB ↔ es-ES]

    subgraph "Speech Pipeline"
        D --> D1[ScriptProcessorNode]
        D --> D2[Float32 → Int16]
        F --> F1[Vosk Models 67MB]
        F --> F2[Web Speech API Fallback]
        G --> G1[Real-time Display]
        G --> G2[Upward Fade Effect]
    end

    subgraph "Visual Pipeline"
        E --> V1[Ocean Wave FFT]
        E --> V2[Peak Detection]
        E --> V3[Pink Glow Effects]
        I --> V4[Language Indicator]
    end

    style A fill:#e1f5fe
    style F fill:#f3e5f5
    style G fill:#e8f5e8

Development

The codebase uses a clean OOP structure with the main SusuruTranslator class handling:

Vosk Integration - Offline speech recognition with WebAssembly models
Language Switching - Manual toggle between English and Spanish
Audio Processing - Dual-path FFT visualisation + speech recognition
Canvas Rendering - Ocean wave demo with real-time spectrum analysis
Translation Integration - Real-time English ↔ Spanish translation with MyMemory API
Transcription Display - Clean two-line format with original text + translation
Fallback System - Graceful degradation from Vosk → Web Speech API
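The Vosk → Web Speech API degradation can be sketched as an engine selector:
- `loadVosk` is a hypothetical async loader that should reject if the WebAssembly module or model fails to load.
- `globalScope` would be `window` in the browser.
- `SpeechRecognition` and `webkitSpeechRecognition` are the standard and Chrome-prefixed Web Speech API constructors.

```javascript
// Hypothetical engine selection mirroring the Vosk → Web Speech API fallback.
async function selectRecognitionEngine(loadVosk, globalScope) {
  try {
    // Preferred path: fully offline recognition via Vosk WebAssembly.
    const recognizer = await loadVosk();
    return { kind: "vosk", recognizer };
  } catch (err) {
    // Fallback: the browser's Web Speech API, if this browser provides it.
    const SpeechRecognition =
      globalScope.SpeechRecognition || globalScope.webkitSpeechRecognition;
    if (SpeechRecognition) {
      return { kind: "web-speech", recognizer: new SpeechRecognition(), error: err };
    }
    // No recognition available; the caller can show an error state.
    return { kind: "none", recognizer: null, error: err };
  }
}
```

The fallback may not be offline. Some browsers, Chrome among them, send Web Speech API audio to a server for recognition.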

Project Structure

susuru/
├── index.html          # Main application with speech recognition
├── susuru.js           # Core SusuruTranslator class with Vosk integration
├── vosk.js             # Vosk browser WebAssembly library
├── auto-refresh.js     # Development auto-reload
├── models/             # Vosk speech recognition models
│   ├── vosk-model-small-en-us-0.15/  # English model (41MB)
│   └── vosk-model-small-es-0.42/     # Spanish model (39MB)
├── Makefile            # Build and serve commands
├── package.json        # Dependencies including vosk-browser
└── README.md           # This documentation

Inspiration

Susuru combines offline speech recognition with the clean design philosophy of roomtone. The ocean wave FFT provides beautiful visualisation while Vosk WebAssembly models enable privacy-focused speech recognition without cloud dependencies.

Perfect for:

Offline speech recognition and transcription applications
Language learning and pronunciation practice
Privacy-focused voice interfaces (no cloud processing)
Educational demos of WebAssembly and speech processing
Development and testing of mobile speech apps
Voice-controlled applications with visual feedback

Live Demo: http://turpin.dev/susuru/ 🌊📱