supertonic serve

Run a thin local HTTP server around the same TTS engine. Exposes a native /v1/* namespace plus an OpenAI Audio Speech-compatible alias so any client that already speaks the OpenAI API can swap the base URL.

Requires fastapi + uvicorn

Install with: pip install 'supertonic[serve]'

Usage

supertonic serve [--host HOST] [--port PORT] [OPTIONS]

The server binds to 127.0.0.1:7788 by default. Binding to any other interface is opt-in and emits a one-line warning on stderr. When exposing the server beyond loopback, put it behind a reverse proxy.
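A wrapper script that launches the server may want to draw the same loopback/non-loopback distinction before choosing a `--host` value. The following is a minimal sketch under that assumption, not the server's own check; `is_loopback_host` is a hypothetical helper name.

```python
import ipaddress


def is_loopback_host(host: str) -> bool:
    """Return True if `host` refers to the loopback interface.

    Hypothetical helper for wrapper scripts; the server's own check may differ.
    """
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a DNS name): treat it as
        # non-loopback so the caller errs on the side of caution.
        return False
```

A caller could refuse to start, or require a reverse proxy to be configured, whenever this returns False.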

Endpoints

| Method | Path | Description |
| --- | --- | --- |
| `GET` | `/v1/health` | Liveness/readiness check; returns `{status, model, sample_rate, version, voices_loaded}` |
| `GET` | `/v1/styles` | List built-in voices and imported custom voices |
| `POST` | `/v1/styles/import` | Upload a Voice Builder JSON (multipart or JSON body); persisted per model under `~/.cache/<model>/custom_styles/` |
| `POST` | `/v1/tts` | Native synthesis with the full Supertonic parameter set |
| `POST` | `/v1/audio/speech` | OpenAI-compatible alias for `/v1/tts` |
| `POST` | `/v1/tts/batch` | Synthesize up to 64 items in one request (JSON + base64) |

Interactive OpenAPI docs are served at /docs when the process is running.
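Before sending synthesis requests, a client can poll `/v1/health`. The sketch below uses only the Python standard library and checks that the documented fields are present. It does not assume any particular value for `status`, since the page does not list the possible values; the helper names are illustrative.

```python
import json
import urllib.request

# Field names taken from the /v1/health description in the endpoint table.
EXPECTED_HEALTH_FIELDS = {"status", "model", "sample_rate", "version", "voices_loaded"}


def missing_health_fields(payload: dict) -> set:
    """Return the documented health fields that are absent from `payload`."""
    return EXPECTED_HEALTH_FIELDS - set(payload)


def fetch_health(base_url: str = "http://127.0.0.1:7788", timeout: float = 5.0) -> dict:
    """GET /v1/health and decode the JSON body (requires a running server)."""
    with urllib.request.urlopen(f"{base_url}/v1/health", timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `missing_health_fields(fetch_health())` should come back empty if the response matches the documented shape.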

Quick examples

# Native endpoint
curl -X POST http://127.0.0.1:7788/v1/tts \
  -H 'content-type: application/json' \
  -d '{"text":"Supertonic is a lightning fast, on-device TTS system.","voice":"M1","lang":"en"}' \
  -o output.wav

# OpenAI-compatible alias — base-URL swap is enough for OpenAI SDK clients
curl -X POST http://127.0.0.1:7788/v1/audio/speech \
  -H 'content-type: application/json' \
  -d '{"model":"supertonic-3","input":"Hello in my own cloned voice.","voice":"M1","response_format":"wav"}' \
  -o output.wav

# Import a Voice Builder export, then synthesize with it
curl -X POST http://127.0.0.1:7788/v1/styles/import -F "file=@voices/my_voice.json"
curl -X POST http://127.0.0.1:7788/v1/tts \
  -H 'content-type: application/json' \
  -d '{"text":"Hello in my own cloned voice.","voice":"my_voice","lang":"en"}' \
  -o output_own_voice.wav

See the Local Server section in Quick Start for the full walkthrough (Voice Builder import, batch, response formats).
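For clients built on the official `openai` Python package, the base-URL swap might look like the sketch below. Two things here are assumptions rather than documented behavior: that the local server ignores the API key (the SDK requires a non-empty one, so a placeholder is passed), and the helper names, which are purely illustrative. Note that the SDK appends `/audio/speech` to the base URL, so the base URL ends in `/v1`.

```python
def openai_base_url(host: str = "127.0.0.1", port: int = 7788) -> str:
    """Base URL for OpenAI clients; the SDK appends /audio/speech to it."""
    return f"http://{host}:{port}/v1"


def speak_with_openai_sdk(text: str, out_path: str = "output.wav") -> None:
    """Synthesize `text` via the OpenAI-compatible alias (needs a running server)."""
    # Imported lazily so this module loads even without the `openai` package.
    from openai import OpenAI

    # Assumption: the local server does not validate the key, but the SDK
    # refuses to construct a client without one, so a placeholder is passed.
    client = OpenAI(base_url=openai_base_url(), api_key="unused")
    with client.audio.speech.with_streaming_response.create(
        model="supertonic-3",
        voice="M1",
        input=text,
        response_format="wav",
    ) as response:
        # Stream the audio bytes straight to disk.
        response.stream_to_file(out_path)
```

Calling `speak_with_openai_sdk("Hello from the local server.")` with the server running would write `output.wav`, mirroring the second curl example.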

Audio output formats

Supported response_format values: wav (default), flac, ogg (Vorbis). MP3, AAC, and Opus are intentionally not supported in v1. Opus is excluded because libsndfile's Opus encoder accepts only 8, 12, 16, 24, or 48 kHz input, while the model outputs 44.1 kHz audio; MP3 and AAC are excluded because they would add encoder dependencies. Clients should pick one of the supported formats or transcode externally.
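A client can reject unsupported formats before making a request, rather than waiting for the server to return an error. This is a small client-side sketch; `output_filename` is a hypothetical helper, and using the format name as the file extension is a convention, not something the server mandates.

```python
# Formats listed in the paragraph above; wav is the documented default.
SUPPORTED_FORMATS = ("wav", "flac", "ogg")


def output_filename(stem: str, response_format: str = "wav") -> str:
    """Return `<stem>.<format>`, rejecting formats the server does not list."""
    fmt = response_format.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(
            f"unsupported response_format {response_format!r}; "
            f"choose one of: {', '.join(SUPPORTED_FORMATS)}"
        )
    return f"{stem}.{fmt}"
```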

Errors

Every error response uses the OpenAI-shaped envelope so existing error parsers in OpenAI SDK clients continue to work:

{
  "error": {
    "message": "unsupported response_format 'mp3'; set response_format to one of: wav, flac, ogg",
    "type": "invalid_request_error",
    "code": "unsupported_response_format"
  }
}

Common codes:

synthesis: unknown_voice, unsupported_lang, unsupported_response_format, unknown_model, model_not_loaded, synthesis_failed, not_ready
style import: style_name_conflict, invalid_style_name, invalid_style_payload, missing_file, missing_name, invalid_json, invalid_body
request size: payload_too_large, invalid_content_length (from the Content-Length pre-flight middleware on POST /v1/styles/import)
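A standard-library client can surface the `code` and `message` fields from this envelope when a request fails. The sketch below posts to the native `/v1/tts` endpoint and converts HTTP errors into a Python exception; the function names are illustrative, and it assumes error bodies are always JSON in the shape shown above.

```python
import json
import urllib.error
import urllib.request


def parse_error_envelope(body: bytes) -> tuple:
    """Extract (code, message) from an OpenAI-shaped error body."""
    error = json.loads(body)["error"]
    return error.get("code"), error.get("message")


def synthesize(payload: dict, base_url: str = "http://127.0.0.1:7788") -> bytes:
    """POST to /v1/tts and return the audio bytes (requires a running server)."""
    request = urllib.request.Request(
        f"{base_url}/v1/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as resp:
            return resp.read()
    except urllib.error.HTTPError as exc:
        # Non-2xx responses carry the error envelope in the body.
        code, message = parse_error_envelope(exc.read())
        raise RuntimeError(f"{code}: {message}") from exc
```

Callers that need to branch on specific codes (for example, retrying on `not_ready`) could raise a custom exception carrying `code` instead of a plain `RuntimeError`.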

Arguments

`--host`

Interface to bind. The default is loopback only.

Default: 127.0.0.1

`--port`

Port to listen on.

Default: 7788

`--model`

Possible choices: supertonic, supertonic-2, supertonic-3

Model to load on startup.

Default: supertonic-3

`--cors`

Comma-separated CORS origins to allow (e.g. 'http://localhost:,chrome-extension://'). Omit to disable CORS entirely.

`--log-level`

Possible choices: critical, error, warning, info, debug, trace

uvicorn log level.

Default: info

`-v`, `--verbose`

Enable verbose output with detailed logging

Default: False
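When the server is started from another program (a test harness, for instance), the flags above can be assembled programmatically. This is a sketch that validates the documented choices before building the argument list; `build_serve_command` and `start_server` are hypothetical helpers, and it assumes the `supertonic` executable is on `PATH`.

```python
import subprocess

# Choices copied from the argument reference above.
MODELS = ("supertonic", "supertonic-2", "supertonic-3")
LOG_LEVELS = ("critical", "error", "warning", "info", "debug", "trace")


def build_serve_command(host="127.0.0.1", port=7788, model="supertonic-3",
                        log_level="info", cors=None):
    """Return the argv list for `supertonic serve` with the given options."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; choose one of {MODELS}")
    if log_level not in LOG_LEVELS:
        raise ValueError(f"unknown log level {log_level!r}; choose one of {LOG_LEVELS}")
    cmd = ["supertonic", "serve", "--host", host, "--port", str(port),
           "--model", model, "--log-level", log_level]
    if cors:
        # --cors takes a single comma-separated value.
        cmd += ["--cors", ",".join(cors)]
    return cmd


def start_server(**options) -> subprocess.Popen:
    """Launch the server as a child process; the caller must terminate it."""
    return subprocess.Popen(build_serve_command(**options))
```

Passing the arguments as a list avoids shell quoting issues with the `--cors` value.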