
Project

Kitten TTS

A locally executed neural text-to-speech system, delivered as a Chrome extension backed by a local FastAPI server.

Overview

This project implements a text-to-speech solution that runs entirely on the user's machine. It uses Kitten TTS, an open-source neural TTS engine, served through a local FastAPI microservice. Because all inference happens on-device, the architecture removes third-party API dependencies, cloud round-trip latency, and the privacy exposure of sending text to a remote service.

Architecture

The system follows a client-server pattern in which both sides run on the same machine. The Chrome extension is the client and handles the frontend - DOM traversal, text extraction, audio playback, and user controls - while the FastAPI backend acts as a local inference server that manages neural model inference and real-time audio streaming.

background.js - Service worker for command routing and lifecycle management

content.js - DOM traversal, text extraction, and audio playback orchestration

popup - User interface for voice selection and playback controls

main.py - Async REST API server handling TTS requests, with eight selectable voices
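
As a rough illustration of what the backend has to check before running inference, here is a minimal sketch of request validation for a speech request. It uses only the standard library so it can run on its own; in a FastAPI app this role would typically be filled by a Pydantic model. The names `SpeakRequest`, `parse_speak_request`, `VOICES`, and `MAX_CHARS`, the placeholder voice identifiers, and the speed limits are all assumptions for illustration, not taken from the project's code.

```python
from dataclasses import dataclass

# Hypothetical voice identifiers; the real names come from the TTS model.
VOICES = tuple(f"voice-{i}" for i in range(1, 9))  # eight voices
MAX_CHARS = 5000  # assumed upper bound on text per request


@dataclass(frozen=True)
class SpeakRequest:
    text: str
    voice: str = VOICES[0]
    speed: float = 1.0


def parse_speak_request(payload: dict) -> SpeakRequest:
    """Validate a decoded JSON body and return a typed request."""
    text = str(payload.get("text", "")).strip()
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters")
    voice = payload.get("voice", VOICES[0])
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    speed = float(payload.get("speed", 1.0))
    if not 0.5 <= speed <= 2.0:  # assumed playback-speed range
        raise ValueError("speed must be between 0.5 and 2.0")
    return SpeakRequest(text=text, voice=voice, speed=speed)
```

Rejecting bad input before the model is invoked keeps malformed requests from consuming inference time, and gives the extension a clear error to show in the popup.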

Local Inference Pipeline

Text Input → Tokenization → Neural Forward Pass → Audio Buffer → Browser Playback

Zero API costs and rate limits

Sub-50 ms latency on modern hardware

Complete data sovereignty - no text leaves your machine

Offline operation after initial model download
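
The pipeline stages above can be sketched end to end with the standard library. This is an illustration under stated assumptions, not the project's implementation: the tokenization and neural forward pass are replaced by a placeholder tone generator (`fake_synthesize`), the 24 kHz sample rate is assumed, and all function names are hypothetical. What it does show concretely is the text-splitting step and how raw samples become a WAV buffer that a browser can play.

```python
import io
import math
import re
import struct
import wave

SAMPLE_RATE = 24_000  # assumed output rate; check the model's documentation


def split_sentences(text: str) -> list[str]:
    """Split text into sentence-sized chunks so playback can start early."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def fake_synthesize(sentence: str) -> list[float]:
    """Placeholder for the model: a 440 Hz tone, 50 ms per character."""
    n = int(SAMPLE_RATE * 0.05 * len(sentence))
    return [0.3 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]


def to_wav_bytes(samples: list[float]) -> bytes:
    """Pack float samples in [-1, 1] into a 16-bit mono WAV buffer."""
    ints = (int(max(-1.0, min(1.0, s)) * 32767) for s in samples)
    pcm = struct.pack(f"<{len(samples)}h", *ints)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


def speak(text: str) -> list[bytes]:
    """Run the sketch pipeline: one WAV buffer per sentence."""
    return [to_wav_bytes(fake_synthesize(s)) for s in split_sentences(text)]
```

Producing one buffer per sentence, rather than one for the whole page, lets the extension begin playback as soon as the first sentence is ready while later sentences are still being synthesized.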

Use Cases

Accessibility - Screen reader alternative with neural voice quality

Content consumption - Listen to articles, documentation, or research papers

Multitasking - Absorb written content while performing other tasks

Language learning - Hear proper pronunciation with adjustable playback speed

Chrome Extension, FastAPI, ONNX, Transformers.js, Python
GitHub