voski

Susuru

🌊 Live Demo: http://turpin.dev/susuru/

Overview

Susuru is an English ↔ Spanish speech recognition and translation tool powered by Vosk WebAssembly with locally cached models. Speech recognition runs entirely in the browser, so no audio is sent to external servers; per the architecture below, only the transcribed text goes to the MyMemory API for translation. It features live translation and an ocean wave FFT visualisation. Click the waves to start, speak naturally, and get a translation shortly after you pause.

Key Features

Quick Start

Web Application

make        # Build and serve at http://localhost:8001

Then open http://localhost:8001 in your browser and start speaking!

Available Commands

Technology Stack

Platform:

Speech Recognition:

Translation:

Audio Processing:

Interface:

Usage

  1. Ocean Demo - Beautiful waves animate automatically on load
  2. Start Recognition - Click the ocean waves to begin speech recognition
  3. Speak Naturally - Talk in English or Spanish for real-time transcription
  4. Auto Translation - Short phrases auto-translate after a 2-second pause
  5. Manual Translation - Press Escape to force translation of longer text
  6. Switch Languages - Click the banner (V*SKI) to toggle en-GB ↔ es-ES
  7. Clean Display - See transcription and translation in a clean two-line format
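
Steps 4 and 5 above amount to a debounce with a manual flush. The following is a minimal sketch of that idea, not code taken from susuru.js: the `PauseTrigger` class, its method names, and the injectable `timers` argument are all hypothetical, and the 2000 ms default simply mirrors the 2-second pause in step 4. It does not try to distinguish short phrases from longer text, since the README does not say where that threshold lies.

```javascript
// Hypothetical helper: illustrates pause-triggered translation, not Susuru's actual code.
class PauseTrigger {
  constructor(
    onFire,
    delayMs = 2000,
    // Wrapped so the browser timer functions are never called with a foreign `this`.
    timers = {
      set: (fn, ms) => setTimeout(fn, ms),
      clear: (handle) => clearTimeout(handle),
    }
  ) {
    this.onFire = onFire;   // callback receiving the text to translate
    this.delayMs = delayMs; // length of silence that counts as a pause
    this.timers = timers;
    this.pending = null;    // most recent transcript not yet translated
    this.handle = null;     // handle of the running countdown, if any
  }

  // Call on every new transcription result; restarts the pause countdown.
  update(text) {
    this.pending = text;
    if (this.handle !== null) this.timers.clear(this.handle);
    this.handle = this.timers.set(() => this.flush(), this.delayMs);
  }

  // Fire immediately, e.g. from an Escape key handler. Does nothing if no text is pending.
  flush() {
    if (this.handle !== null) this.timers.clear(this.handle);
    this.handle = null;
    if (this.pending === null) return;
    const text = this.pending;
    this.pending = null;
    this.onFire(text);
  }
}
```

In a page, `update` would be called from the recogniser's result handler and `flush` from a `keydown` listener that checks for `event.key === "Escape"`. Passing the timer functions in makes the class easy to exercise without waiting two real seconds.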

Visual Features

Architecture

graph TD
    A[🌊 Ocean Demo Animation] --> B[👆 Click to Start Recognition]
    B --> C[🎤 Microphone Access 16kHz]
    C --> D[🎵 Web Audio API]
    D --> E[📊 FFT + Speech Processing]
    E --> F[🗣️ Vosk Speech Recognition]
    F --> G[📝 Live Transcription Display]
    G --> H["🌐 Auto Translation (MyMemory API)"]
    H --> I[📋 Two-line Display: Original + Translation]
    I --> J[👆 Click Banner to Switch Language]
    J --> K[🔄 Toggle en-GB ↔ es-ES]

    subgraph "Speech Pipeline"
        D --> D1[ScriptProcessorNode]
        D --> D2[Float32 → Int16]
        F --> F1[Vosk Models 67MB]
        F --> F2[Web Speech API Fallback]
        G --> G1[Real-time Display]
        G --> G2[Upward Fade Effect]
    end

    subgraph "Visual Pipeline"
        E --> V1[Ocean Wave FFT]
        E --> V2[Peak Detection]
        E --> V3[Pink Glow Effects]
        I --> V4[Language Indicator]
    end

    style A fill:#e1f5fe
    style F fill:#f3e5f5
    style G fill:#e8f5e8
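
Two stages of the diagram above lend themselves to a short illustration: the `Float32 → Int16` conversion in the speech pipeline and the MyMemory translation request. The helpers below are a sketch under stated assumptions rather than the code in susuru.js. The function names are hypothetical, and passing the locale codes `en-GB` and `es-ES` straight through as the MyMemory `langpair` is an assumption about how the toggle maps onto the request.

```javascript
// Hypothetical helpers illustrating two stages of the diagram; not taken from susuru.js.

// Convert Web Audio samples (floats nominally in [-1, 1]) to 16-bit signed PCM.
function floatToInt16(samples) {
  const out = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples saturate instead of wrapping around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale by 32768, positive by 32767; the typed array
    // drops any fractional part on assignment.
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}

// Build a MyMemory GET URL for the given text and language pair.
// URLSearchParams takes care of percent-encoding the text and the "|" separator.
function myMemoryUrl(text, from, to) {
  const params = new URLSearchParams({ q: text, langpair: `${from}|${to}` });
  return `https://api.mymemory.translated.net/get?${params}`;
}
```

A caller would typically `fetch` the URL and read the translation from `responseData.translatedText` in the JSON reply. Only the transcribed text appears in that request; the PCM samples stay in the browser.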

Development

The codebase uses a clean OOP structure with the main SusuruTranslator class handling:

Project Structure

susuru/
├── index.html          # Main application with speech recognition
├── susuru.js           # Core SusuruTranslator class with Vosk integration
├── vosk.js             # Vosk browser WebAssembly library
├── auto-refresh.js     # Development auto-reload
├── models/             # Vosk speech recognition models
│   ├── vosk-model-small-en-us-0.15/  # English model (41MB)
│   └── vosk-model-small-es-0.42/     # Spanish model (39MB)
├── Makefile            # Build and serve commands
├── package.json        # Dependencies including vosk-browser
└── README.md           # This documentation

Inspiration

Susuru combines offline speech recognition with the clean design philosophy of roomtone. The ocean wave FFT provides beautiful visualisation while Vosk WebAssembly models enable privacy-focused speech recognition without cloud dependencies.

Perfect for:


Live Demo: http://turpin.dev/susuru/ 🌊📱