supertonic serve¶
Run a thin local HTTP server around the same TTS engine. It exposes a native /v1/* namespace plus an OpenAI Audio Speech-compatible alias, so any client that already speaks the OpenAI API only needs to swap its base URL.
Requires fastapi + uvicorn
Install with: pip install 'supertonic[serve]'
Usage¶
The server binds to 127.0.0.1:7788 by default. Binding to any other interface is opt-in (via --host) and emits a one-line warning on stderr. Put the server behind a reverse proxy when exposing it beyond loopback.
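As a minimal sketch of a launcher script, the snippet below assembles a `supertonic serve` command line from the flags documented under Arguments and notes when the chosen host is not loopback. The helper names `is_loopback` and `build_serve_argv` are hypothetical and not part of Supertonic, and the check only approximates the server's own warning.

```python
import ipaddress
import sys


def is_loopback(host: str) -> bool:
    """Return True when host names a loopback address."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): treat as non-loopback.
        return False


def build_serve_argv(host: str = "127.0.0.1", port: int = 7788,
                     model: str = "supertonic-3") -> list[str]:
    """Assemble the command line using only --host, --port and --model."""
    if not is_loopback(host):
        # Hypothetical client-side reminder; the server prints its own warning.
        print(f"note: {host} is not loopback; consider a reverse proxy",
              file=sys.stderr)
    return ["supertonic", "serve",
            "--host", host, "--port", str(port), "--model", model]


# The defaults reproduce the documented default bind and model.
print(" ".join(build_serve_argv()))
# supertonic serve --host 127.0.0.1 --port 7788 --model supertonic-3
```

The returned list can be passed to `subprocess.run(..., check=True)` once the `supertonic[serve]` extra is installed.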
Endpoints¶
| Method | Path | Description |
|---|---|---|
| GET | /v1/health | Liveness/readiness, returns {status, model, sample_rate, version, voices_loaded} |
| GET | /v1/styles | List built-in voices + imported custom voices |
| POST | /v1/styles/import | Upload a Voice Builder JSON (multipart or JSON body); persisted per-model under ~/.cache/<model>/custom_styles/ |
| POST | /v1/tts | Native synthesis: full Supertonic parameter set |
| POST | /v1/audio/speech | OpenAI-compatible alias for /v1/tts |
| POST | /v1/tts/batch | Synthesize up to 64 items in one request (JSON + base64) |
Interactive OpenAPI docs are served at /docs when the process is running.
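Because /v1/tts/batch accepts at most 64 items per request, a client with a longer list has to split it first. The sketch below shows only that splitting step. The request body schema is not described on this page, so it is deliberately left out (check /docs for it), and `chunk_for_batch` is a hypothetical helper name.

```python
from typing import Iterator, Sequence

MAX_BATCH_ITEMS = 64  # per-request limit stated in the endpoint table


def chunk_for_batch(items: Sequence[str],
                    size: int = MAX_BATCH_ITEMS) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if not 1 <= size <= MAX_BATCH_ITEMS:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_ITEMS}")
    for start in range(0, len(items), size):
        yield items[start:start + size]


sentences = [f"Sentence {i}." for i in range(150)]
print([len(chunk) for chunk in chunk_for_batch(sentences)])  # [64, 64, 22]
```

Each slice would then be sent as one batch request.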
Quick examples¶
```bash
# Native endpoint
curl -X POST http://127.0.0.1:7788/v1/tts \
  -H 'content-type: application/json' \
  -d '{"text":"Supertonic is a lightning fast, on-device TTS system.","voice":"M1","lang":"en"}' \
  -o output.wav

# OpenAI-compatible alias: a base-URL swap is enough for OpenAI SDK clients
curl -X POST http://127.0.0.1:7788/v1/audio/speech \
  -H 'content-type: application/json' \
  -d '{"model":"supertonic-3","input":"Hello in my own cloned voice.","voice":"M1","response_format":"wav"}' \
  -o output.wav

# Import a Voice Builder export, then synthesize with it
curl -X POST http://127.0.0.1:7788/v1/styles/import -F "file=@voices/my_voice.json"
curl -X POST http://127.0.0.1:7788/v1/tts \
  -H 'content-type: application/json' \
  -d '{"text":"Hello in my own cloned voice.","voice":"my_voice","lang":"en"}' \
  -o output_own_voice.wav
```
See the Local Server section in Quick Start for the full walkthrough (Voice Builder import, batch, response formats).
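For callers that prefer Python over curl, the first two requests above can be sketched with the standard library alone. The JSON fields mirror the curl payloads. `build_json_post` and `save_audio` are hypothetical helper names, and the sketch assumes the server is running on the default bind.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7788"  # default bind from this page


def build_json_post(path: str, payload: dict,
                    base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the given path."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_audio(request: urllib.request.Request, out_path: str) -> None:
    """Send the request and write the audio bytes to out_path."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())


# Native endpoint, same body as the first curl example.
native = build_json_post("/v1/tts", {
    "text": "Supertonic is a lightning fast, on-device TTS system.",
    "voice": "M1",
    "lang": "en",
})

# OpenAI-compatible alias, same body as the second curl example.
openai_style = build_json_post("/v1/audio/speech", {
    "model": "supertonic-3",
    "input": "Hello in my own cloned voice.",
    "voice": "M1",
    "response_format": "wav",
})

# Uncomment once the server is running:
# save_audio(native, "output.wav")
```

Keeping request construction separate from sending makes the payloads easy to inspect or unit-test without a live server.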
Audio output formats¶
Supported response_format values: wav (default), flac, ogg (Vorbis). MP3, AAC, and Opus are intentionally not supported in v1. Opus is excluded because libsndfile's Opus encoder only accepts 8/12/16/24/48 kHz input while the model outputs 44.1 kHz; MP3 and AAC are excluded because they would add encoder dependencies. Clients should pick one of the supported formats or transcode externally.
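A client that wants an unsupported format can request a supported one and transcode afterwards. The sketch below encodes that decision, along with the sample-rate mismatch behind the Opus exclusion. `choose_response_format` is a hypothetical helper, and falling back to wav is an assumption made here because wav is the documented default.

```python
SUPPORTED_FORMATS = ("wav", "flac", "ogg")
MODEL_SAMPLE_RATE_HZ = 44_100
OPUS_SAMPLE_RATES_HZ = (8_000, 12_000, 16_000, 24_000, 48_000)


def choose_response_format(wanted: str) -> tuple[str, bool]:
    """Return (format_to_request, needs_external_transcode)."""
    wanted = wanted.lower()
    if wanted in SUPPORTED_FORMATS:
        return wanted, False
    # Assumption: request the default format and convert it client-side.
    return "wav", True


print(choose_response_format("flac"))                # ('flac', False)
print(choose_response_format("mp3"))                 # ('wav', True)
print(MODEL_SAMPLE_RATE_HZ in OPUS_SAMPLE_RATES_HZ)  # False
```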
Errors¶
Every error response uses the OpenAI-shaped envelope so existing error parsers in OpenAI SDK clients continue to work:
```json
{
  "error": {
    "message": "unsupported response_format 'mp3'; set response_format to one of: wav, flac, ogg",
    "type": "invalid_request_error",
    "code": "unsupported_response_format"
  }
}
```
Common codes:
- synthesis: `unknown_voice`, `unsupported_lang`, `unsupported_response_format`, `unknown_model`, `model_not_loaded`, `synthesis_failed`, `not_ready`
- style import: `style_name_conflict`, `invalid_style_name`, `invalid_style_payload`, `missing_file`, `missing_name`, `invalid_json`, `invalid_body`
- request size: `payload_too_large`, `invalid_content_length` (from the `Content-Length` pre-flight middleware on `POST /v1/styles/import`)
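Client code can unpack the envelope and branch on `code` rather than matching message text. The sketch below parses the example envelope from this section and maps a code to the three groups listed above. `ApiError`, `parse_error`, and `category` are hypothetical names, not part of Supertonic.

```python
import json
from dataclasses import dataclass

# Code groups copied from the "Common codes" list.
CODE_GROUPS = {
    "synthesis": {
        "unknown_voice", "unsupported_lang", "unsupported_response_format",
        "unknown_model", "model_not_loaded", "synthesis_failed", "not_ready",
    },
    "style import": {
        "style_name_conflict", "invalid_style_name", "invalid_style_payload",
        "missing_file", "missing_name", "invalid_json", "invalid_body",
    },
    "request size": {"payload_too_large", "invalid_content_length"},
}


@dataclass(frozen=True)
class ApiError:
    message: str
    type: str
    code: str


def parse_error(body) -> ApiError:
    """Parse an error envelope given as str or bytes."""
    err = json.loads(body)["error"]
    return ApiError(
        message=err.get("message", ""),
        type=err.get("type", ""),
        code=err.get("code", ""),
    )


def category(code: str) -> str:
    """Return the documented group for a code, or 'unknown'."""
    for name, codes in CODE_GROUPS.items():
        if code in codes:
            return name
    return "unknown"


EXAMPLE = """{"error": {"message": "unsupported response_format 'mp3'; set response_format to one of: wav, flac, ogg", "type": "invalid_request_error", "code": "unsupported_response_format"}}"""

error = parse_error(EXAMPLE)
print(error.code, "->", category(error.code))
# unsupported_response_format -> synthesis
```

With `urllib.request`, a non-2xx response raises `urllib.error.HTTPError`; its `read()` method returns the body that `parse_error` expects.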
Arguments¶
--host¶
Interface to bind (default: 127.0.0.1; loopback only)
Default: 127.0.0.1
--port¶
Port to listen on (default: 7788)
Default: 7788
--model¶
Possible choices: supertonic, supertonic-2, supertonic-3
Model to load on startup (default: supertonic-3)
Default: supertonic-3
--cors¶
Comma-separated CORS origins to allow (e.g. 'http://localhost:,chrome-extension://'). Omit to disable CORS entirely.
--log-level¶
Possible choices: critical, error, warning, info, debug, trace
uvicorn log level (default: info)
Default: info
-v, --verbose¶
Enable verbose output with detailed logging
Default: False