supertonic.server

Local HTTP server for Supertonic TTS.

This subpackage is optional. It depends on fastapi, uvicorn, and python-multipart, which are installed via the [serve] extra:

pip install supertonic[serve]

It exposes a thin FastAPI wrapper around :class:supertonic.pipeline.TTS designed for local-only integration with n8n, browser extensions, Electron, Unity, Home Assistant, robotics devices, and any client that already speaks the OpenAI Audio Speech API.

Public surface:

  • :func:create_app — build a FastAPI ASGI app (model loads in lifespan).
  • :class:ServerState — shared runtime state if you need to inject a pre-loaded TTS (e.g. tests).
  • :data:__all__ listed below.

Modules:

Name Description
app

FastAPI application factory for supertonic serve.

audio

Audio encoding helpers for the local TTS server.

routes

HTTP route handlers for supertonic serve.

schemas

Pydantic request/response schemas for the local TTS server.

styles_store

On-disk store for user-imported voice styles.

Classes:

Name Description
ServerState

Mutable shared state used by every request handler.

Functions:

Name Description
create_app

Build a configured FastAPI app.

ServerState

ServerState(
    model: str = DEFAULT_MODEL,
    *,
    tts: Optional["TTS"] = None,
    custom_styles_dir: Optional[Path] = None,
    custom_styles: Optional[Dict[str, Path]] = None
)

Mutable shared state used by every request handler.

Attributes:

Name Type Description
model

Model name to load (e.g. "supertonic-3").

tts

Loaded :class:supertonic.TTS instance, None until the lifespan finishes.

custom_styles

{stem: path} for user-imported style JSONs.

custom_styles_dir

Directory on disk that backs custom_styles.

synth_lock

Serializes ONNX Runtime inference across threads (FastAPI executes sync handlers in a threadpool).

is_ready

True once the lifespan has finished initialization.

Source code in supertonic/server/app.py
def __init__(
    self,
    model: str = DEFAULT_MODEL,
    *,
    tts: Optional["TTS"] = None,
    custom_styles_dir: Optional[Path] = None,
    custom_styles: Optional[Dict[str, Path]] = None,
) -> None:
    self.model = model
    self.tts = tts
    # Custom styles default to the *model's* cache dir, so the same name
    # cannot collide across model versions.
    self.custom_styles_dir = (
        Path(custom_styles_dir)
        if custom_styles_dir
        else styles_store.default_custom_styles_dir(model)
    )
    self.custom_styles = dict(custom_styles or {})
    self.synth_lock = threading.Lock()
    self.is_ready = False

model instance-attribute

model = model

tts instance-attribute

tts = tts

custom_styles_dir instance-attribute

custom_styles_dir = (
    Path(custom_styles_dir)
    if custom_styles_dir
    else default_custom_styles_dir(model)
)

custom_styles instance-attribute

custom_styles = dict(custom_styles or {})

synth_lock instance-attribute

synth_lock = Lock()

is_ready instance-attribute

is_ready = False
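
The synth_lock attribute matters because sync handlers run on several threadpool workers at once. The following is a minimal, stdlib-only sketch of that serialization pattern. FakeEngine and handle_request are hypothetical stand-ins for illustration and are not part of supertonic.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeEngine:
    """Hypothetical stand-in for an inference session that must not overlap."""

    def __init__(self) -> None:
        self._active = 0
        self.max_active = 0
        self._counter_lock = threading.Lock()

    def synthesize(self, text: str) -> str:
        # Track how many callers are inside synthesize() at the same time.
        with self._counter_lock:
            self._active += 1
            self.max_active = max(self.max_active, self._active)
        time.sleep(0.01)  # pretend to do inference work
        with self._counter_lock:
            self._active -= 1
        return text.upper()


engine = FakeEngine()
synth_lock = threading.Lock()  # plays the role of ServerState.synth_lock


def handle_request(text: str) -> str:
    # Only one worker thread may run inference at a time.
    with synth_lock:
        return engine.synthesize(text)


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(handle_request, ["a", "b", "c", "d"]))
```

With the lock in place, engine.max_active stays at 1 even though four workers are available; removing the `with synth_lock:` line would typically let it climb higher.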

create_app

create_app(
    *,
    state: Optional[ServerState] = None,
    model: str = DEFAULT_MODEL,
    custom_styles_dir: Optional[Path] = None,
    cors_origins: Optional[Iterable[str]] = None
) -> FastAPI

Build a configured FastAPI app.

Parameters:

Name Type Description Default
state Optional[ServerState]

Pre-built state to reuse. When its tts is already set, the lifespan does not instantiate :class:supertonic.TTS, which is useful for tests that inject a fake. Pass None for normal use.

None
model str

Model name to load if state.tts is None.

DEFAULT_MODEL
custom_styles_dir Optional[Path]

Override the on-disk location of user-imported voice styles. Defaults to :func:supertonic.server.styles_store.default_custom_styles_dir.

None
cors_origins Optional[Iterable[str]]

If non-empty, install CORSMiddleware for these origins. Browser-extension or Electron clients need this; n8n and curl do not.

None
Source code in supertonic/server/app.py
def create_app(
    *,
    state: Optional[ServerState] = None,
    model: str = DEFAULT_MODEL,
    custom_styles_dir: Optional[Path] = None,
    cors_origins: Optional[Iterable[str]] = None,
) -> FastAPI:
    """Build a configured FastAPI app.

    Args:
        state: Pre-built state to reuse. When its ``tts`` is already set, the
            lifespan does *not* instantiate :class:`supertonic.TTS`, which is
            useful for tests that inject a fake. Pass ``None`` for normal use.
        model: Model name to load if ``state.tts`` is ``None``.
        custom_styles_dir: Override the on-disk location of user-imported
            voice styles. Defaults to
            :func:`supertonic.server.styles_store.default_custom_styles_dir`.
        cors_origins: If non-empty, install ``CORSMiddleware`` for these
            origins. Browser-extension or Electron clients need this; n8n and
            curl do not.
    """
    if state is None:
        state = ServerState(model=model, custom_styles_dir=custom_styles_dir)
    elif custom_styles_dir is not None:
        state.custom_styles_dir = Path(custom_styles_dir)

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        if state.tts is None:
            # Import here so that ``supertonic.server`` import does not pull
            # the model loader into hot paths or test harnesses that mock it.
            from ..pipeline import TTS

            logger.info("Loading TTS model %r ...", state.model)
            state.tts = TTS(model=state.model)
        state.custom_styles = styles_store.scan(state.custom_styles_dir)
        state.is_ready = True
        logger.info(
            "supertonic serve ready: model=%s builtin=%d custom=%d",
            state.model,
            len(state.tts.voice_style_names) if state.tts else 0,
            len(state.custom_styles),
        )
        try:
            yield
        finally:
            state.is_ready = False

    app = FastAPI(
        title="Supertonic TTS",
        description=(
            "Local HTTP server for Supertonic TTS. Exposes a native /v1/* "
            "namespace plus an OpenAI Audio Speech-compatible alias at "
            "POST /v1/audio/speech so existing clients work with just a "
            "base-URL change."
        ),
        version=__version__,
        lifespan=lifespan,
    )
    app.state.server_state = state

    # Note: middlewares execute in reverse order of addition (the *last*
    # added wraps everything below it). Add the size limit last so it
    # short-circuits before FastAPI's routing/dependency layers start
    # buffering the multipart body.
    if cors_origins:
        from fastapi.middleware.cors import CORSMiddleware

        app.add_middleware(
            CORSMiddleware,
            allow_origins=list(cors_origins),
            allow_credentials=False,
            allow_methods=["*"],
            allow_headers=["*"],
        )
    app.add_middleware(StyleImportSizeLimit, max_bytes=MAX_STYLE_IMPORT_BYTES)

    register_routes(app)
    return app

app

FastAPI application factory for supertonic serve.

Designed so that:

  • cmd_serve builds the app, uvicorn drives it.
  • Tests can inject a pre-built :class:ServerState (with a fake TTS) so no real ONNX session is created.
  • Anyone embedding the server inside a larger ASGI app can mount the FastAPI returned by :func:create_app.

Classes:

Name Description
StyleImportSizeLimit

ASGI middleware: reject POST /v1/styles/import when the request Content-Length exceeds MAX_STYLE_IMPORT_BYTES.

ServerState

Mutable shared state used by every request handler.

Functions:

Name Description
create_app

Build a configured FastAPI app.

Attributes:

Name Type Description
logger

logger module-attribute

logger = getLogger(__name__)

StyleImportSizeLimit

StyleImportSizeLimit(app, max_bytes: int)

ASGI middleware: reject POST /v1/styles/import when the request Content-Length exceeds :data:MAX_STYLE_IMPORT_BYTES.

The check runs before FastAPI's dependency machinery starts buffering the multipart body, so a malicious or accidental oversized upload is rejected at the headers stage. Requests without Content-Length (chunked transfer encoding) fall through; the handler's read(MAX+1) enforces the same cap there.

Attributes:

Name Type Description
app
max_bytes
Source code in supertonic/server/app.py
def __init__(self, app, max_bytes: int) -> None:
    self.app = app
    self.max_bytes = max_bytes
app instance-attribute
app = app
max_bytes instance-attribute
max_bytes = max_bytes
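
The __call__ body of StyleImportSizeLimit is not shown on this page, so the following is only a sketch of how a Content-Length pre-flight check can be written as a plain ASGI middleware. The class name ContentLengthLimit, the path parameter, the 413 status, and the response body are assumptions made for illustration. The helper status_for drives the middleware with a hand-built ASGI scope, so no web framework is needed.

```python
import asyncio
import json


class ContentLengthLimit:
    """Reject POSTs to one path when Content-Length exceeds max_bytes."""

    def __init__(self, app, path: str, max_bytes: int) -> None:
        self.app = app
        self.path = path
        self.max_bytes = max_bytes

    async def __call__(self, scope, receive, send):
        if (
            scope["type"] == "http"
            and scope["method"] == "POST"
            and scope["path"] == self.path
        ):
            headers = dict(scope["headers"])  # ASGI headers are (bytes, bytes) pairs
            raw = headers.get(b"content-length")
            # A missing Content-Length (chunked upload) falls through to the handler.
            if raw is not None and raw.isdigit() and int(raw) > self.max_bytes:
                body = json.dumps({"error": {"message": "payload too large"}}).encode()
                await send({
                    "type": "http.response.start",
                    "status": 413,
                    "headers": [(b"content-type", b"application/json")],
                })
                await send({"type": "http.response.body", "body": body})
                return  # short-circuit: the wrapped app never sees the request
        await self.app(scope, receive, send)


async def ok_app(scope, receive, send):
    # Trivial downstream app that always answers 200.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


def status_for(content_length):
    """Run one fake request through the middleware and return the status code."""
    headers = []
    if content_length is not None:
        headers.append((b"content-length", str(content_length).encode()))
    scope = {"type": "http", "method": "POST", "path": "/v1/styles/import", "headers": headers}
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    app = ContentLengthLimit(ok_app, "/v1/styles/import", max_bytes=1024)
    asyncio.run(app(scope, receive, send))
    return sent[0]["status"]
```

Calling status_for(2048) yields the rejection status, while status_for(10) and status_for(None) reach the wrapped app, which mirrors the fall-through behaviour described for chunked uploads.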

styles_store

On-disk store for user-imported voice styles.

Imported voice styles live alongside the bundled built-ins, scoped per model so that a voice imported while serving supertonic-3 is not silently used by supertonic-2:

~/.cache/supertonic3/custom_styles/<name>.json   # supertonic-3
~/.cache/supertonic2/custom_styles/<name>.json   # supertonic-2
~/.cache/supertonic/custom_styles/<name>.json    # supertonic v1

This matches how the bundled voices are organized (each model's voice_styles/ lives under its own cache dir) and keeps custom JSONs out of voice_styles/ so the SDK's :func:list_available_voice_style_names remains unchanged.

This module deliberately stays small: it never loads the styles itself — that work belongs to :func:supertonic.loader.load_voice_style_from_json_file, which already enforces the JSON schema via :func:supertonic.utils.validate_voice_style_format. We just decide where files live and how their names are sanitized.

Classes:

Name Description
InvalidStyleName

Raised when an imported style name fails sanitization.

StyleNameConflict

Raised when an imported style would overwrite an existing one.

Functions:

Name Description
default_custom_styles_dir

Resolve the on-disk directory for user-imported voice styles.

sanitize_name
scan

Return {stem: path} for every well-formed JSON in directory.

save

Persist a validated style payload to directory / f"{name}.json".

Attributes:

Name Type Description
logger

logger module-attribute

logger = getLogger(__name__)

default_custom_styles_dir

default_custom_styles_dir(
    model: str = DEFAULT_MODEL,
) -> Path

Resolve the on-disk directory for user-imported voice styles.

Priority:

  1. $SUPERTONIC_CUSTOM_STYLES_DIR: explicit override, applies to every model (the user opted into a single shared location).
  2. <model cache dir>/custom_styles/: e.g. ~/.cache/supertonic3/custom_styles/ for supertonic-3. Respects $SUPERTONIC_CACHE_DIR through :func:supertonic.loader.get_cache_dir.
Source code in supertonic/server/styles_store.py
def default_custom_styles_dir(model: str = DEFAULT_MODEL) -> Path:
    """Resolve the on-disk directory for user-imported voice styles.

    Priority:

    1. ``$SUPERTONIC_CUSTOM_STYLES_DIR`` — explicit override, applies to every
       model (the user opted into a single shared location).
    2. ``<model cache dir>/custom_styles/`` — e.g. ``~/.cache/supertonic3/
       custom_styles/`` for ``supertonic-3``. Respects ``$SUPERTONIC_CACHE_DIR``
       through :func:`supertonic.loader.get_cache_dir`.
    """
    env = os.getenv("SUPERTONIC_CUSTOM_STYLES_DIR")
    if env:
        return Path(env).expanduser()
    return get_cache_dir(model) / "custom_styles"
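
The priority order can be checked without touching the real environment. The sketch below re-implements the same two-step lookup with the environment passed in as a mapping; resolve_styles_dir and the example paths are hypothetical, and the model cache directory is supplied by the caller instead of coming from get_cache_dir.

```python
import os
from pathlib import Path
from typing import Mapping


def resolve_styles_dir(model_cache_dir: Path, environ: Mapping[str, str] = os.environ) -> Path:
    """Return the override directory if set, else <model cache dir>/custom_styles."""
    override = environ.get("SUPERTONIC_CUSTOM_STYLES_DIR")
    if override:  # an empty string counts as "not set"
        return Path(override).expanduser()
    return model_cache_dir / "custom_styles"


cache = Path("/tmp/cache/supertonic3")  # illustrative path only
shared = resolve_styles_dir(cache, {"SUPERTONIC_CUSTOM_STYLES_DIR": "/srv/voices"})
per_model = resolve_styles_dir(cache, {})
empty_override = resolve_styles_dir(cache, {"SUPERTONIC_CUSTOM_STYLES_DIR": ""})
```

Here shared points at the override, while per_model and empty_override both fall back to the per-model directory.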

InvalidStyleName

Bases: ValueError

Raised when an imported style name fails sanitization.

StyleNameConflict

Bases: ValueError

Raised when an imported style would overwrite an existing one.

sanitize_name

sanitize_name(name: str) -> str
Source code in supertonic/server/styles_store.py
def sanitize_name(name: str) -> str:
    name = (name or "").strip()
    if not _NAME_RE.fullmatch(name):
        raise InvalidStyleName(f"Invalid style name {name!r}: must match [A-Za-z0-9_-]{{1,64}}")
    return name
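
The definition of _NAME_RE is not shown here. Assuming it matches the pattern quoted in the error message, [A-Za-z0-9_-]{1,64}, the accept/reject behaviour can be sketched as follows. NAME_RE and is_valid_style_name are illustrative names, not part of the module.

```python
import re

# Assumed pattern, taken from the error message text above.
NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_style_name(name: str) -> bool:
    """Mirror sanitize_name's check: strip whitespace, then require a full match."""
    return NAME_RE.fullmatch((name or "").strip()) is not None


candidates = ["narrator_01", "  warm-voice  ", "../etc/passwd", "", "a" * 65]
checks = {name: is_valid_style_name(name) for name in candidates}
```

Because fullmatch is used, path separators and dots are rejected outright, which keeps an imported name from escaping the styles directory.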

scan

scan(directory: Path) -> Dict[str, Path]

Return {stem: path} for every well-formed JSON in directory.

A file that fails :func:validate_voice_style_format is skipped with a warning rather than crashing startup — the server should still come up.

Source code in supertonic/server/styles_store.py
def scan(directory: Path) -> Dict[str, Path]:
    """Return ``{stem: path}`` for every well-formed JSON in ``directory``.

    A file that fails :func:`validate_voice_style_format` is skipped with a
    warning rather than crashing startup — the server should still come up.
    """
    out: Dict[str, Path] = {}
    if not directory.exists():
        return out
    for p in sorted(directory.glob("*.json")):
        try:
            with p.open("r", encoding="utf-8") as f:
                data = json.load(f)
            if not validate_voice_style_format(data):
                logger.warning("Skipping invalid voice style file: %s", p)
                continue
        except (OSError, json.JSONDecodeError) as e:
            logger.warning("Skipping unreadable voice style file %s: %s", p, e)
            continue
        out[p.stem] = p
    return out
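
To see the skip-instead-of-crash behaviour in isolation, the sketch below scans a temporary directory holding one plausible style file and two bad ones. looks_like_style is a hypothetical validator that only checks for the style_ttl and style_dp keys; the real validate_voice_style_format may check more.

```python
import json
import tempfile
from pathlib import Path


def looks_like_style(data) -> bool:
    # Hypothetical validator: only checks the two top-level keys.
    return isinstance(data, dict) and {"style_ttl", "style_dp"} <= data.keys()


def scan_dir(directory: Path) -> dict:
    """Return {stem: path} for files that parse and pass the validator."""
    found = {}
    for p in sorted(directory.glob("*.json")):
        try:
            data = json.loads(p.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed file: skip it, keep going
        if looks_like_style(data):
            found[p.stem] = p
    return found


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    (root / "good.json").write_text(
        json.dumps({"style_ttl": {}, "style_dp": {}}), encoding="utf-8"
    )
    (root / "broken.json").write_text("{not json", encoding="utf-8")
    (root / "wrong_shape.json").write_text("[1, 2, 3]", encoding="utf-8")
    names = sorted(scan_dir(root))
```

Only the well-formed file survives the scan; the other two are dropped without raising.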

save

save(
    directory: Path,
    name: str,
    payload: dict,
    *,
    builtin_names: Iterable[str] = (),
    overwrite: bool = False
) -> Path

Persist a validated style payload to directory / f"{name}.json".

Parameters:

Name Type Description Default
directory Path

target directory (created if missing).

required
name str

requested style name; sanitized via :func:sanitize_name.

required
payload dict

parsed JSON; must pass :func:validate_voice_style_format.

required
builtin_names Iterable[str]

names reserved by the bundled model; conflict → 400.

()
overwrite bool

if False, conflict with an existing custom name → 409.

False

Returns:

Type Description
Path

The path the style was written to.

Source code in supertonic/server/styles_store.py
def save(
    directory: Path,
    name: str,
    payload: dict,
    *,
    builtin_names: Iterable[str] = (),
    overwrite: bool = False,
) -> Path:
    """Persist a validated style payload to ``directory / f"{name}.json"``.

    Args:
        directory: target directory (created if missing).
        name: requested style name; sanitized via :func:`sanitize_name`.
        payload: parsed JSON; must pass :func:`validate_voice_style_format`.
        builtin_names: names reserved by the bundled model; conflict → 400.
        overwrite: if False, conflict with an existing custom name → 409.

    Returns:
        The path the style was written to.
    """
    name = sanitize_name(name)
    if name in set(builtin_names):
        raise StyleNameConflict(f"Name {name!r} is a built-in voice and cannot be overwritten")
    if not validate_voice_style_format(payload):
        # Re-using the SDK error type so server handlers can map uniformly.
        raise ValueError("voice style JSON is missing required keys/fields")

    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{name}.json"
    if target.exists() and not overwrite:
        raise StyleNameConflict(f"Style {name!r} already exists")
    tmp = target.with_suffix(".json.tmp")
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(payload, f)
    tmp.replace(target)
    return target
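
The last three statements of save write to a temporary sibling file and then rename it over the target, so a reader should not observe a half-written JSON file (the rename is generally atomic when both paths are on the same filesystem). A stripped-down sketch of just that step, with the hypothetical helper name atomic_write_json:

```python
import json
import tempfile
from pathlib import Path


def atomic_write_json(target: Path, payload: dict) -> Path:
    """Write payload next to target, then rename it into place."""
    tmp = target.with_suffix(".json.tmp")  # demo.json -> demo.json.tmp
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(payload, f)
    tmp.replace(target)  # overwrites an existing target in one step
    return target


with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "demo.json"
    atomic_write_json(target, {"version": 1})
    atomic_write_json(target, {"version": 2})
    final = json.loads(target.read_text(encoding="utf-8"))
    leftovers = sorted(p.name for p in Path(d).iterdir())
```

After two writes the directory holds only demo.json with the second payload, and no .tmp file is left behind.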

audio

Audio encoding helpers for the local TTS server.

Only formats reachable through soundfile (libsndfile) at the model's native 44.1 kHz are supported, so the server adds no extra system dependencies beyond what the SDK already requires. MP3 / AAC / Opus are intentionally rejected with a clear error rather than silently emitting WAV — clients should detect the unsupported format and fall back.

(Opus is excluded for now because libsndfile's OGG/OPUS encoder only accepts 8/12/16/24/48 kHz, and we'd rather error clearly than ship a broken format. Re-add it once we have a resampling step.)

Classes:

Name Description
UnsupportedAudioFormat

Raised when the caller asks for a format we cannot encode.

Functions:

Name Description
format_to_mime
encode_audio

Encode a synthesized waveform into fmt bytes.

duration_seconds
coerce_response_format

Validate and normalize a user-supplied response_format.

Attributes:

Name Type Description
SUPPORTED_FORMATS

SUPPORTED_FORMATS module-attribute

SUPPORTED_FORMATS = tuple(keys())

UnsupportedAudioFormat

Bases: ValueError

Raised when the caller asks for a format we cannot encode.

format_to_mime

format_to_mime(fmt: str) -> str
Source code in supertonic/server/audio.py
def format_to_mime(fmt: str) -> str:
    entry = _FORMATS.get(fmt)
    if entry is None:
        raise UnsupportedAudioFormat(fmt)
    return entry[2]

encode_audio

encode_audio(
    wav: ndarray, sample_rate: int, fmt: str
) -> bytes

Encode a synthesized waveform into fmt bytes.

Parameters:

Name Type Description Default
wav ndarray

ndarray of shape (1, num_samples) or (num_samples,) — the shape produced by :meth:supertonic.TTS.synthesize.

required
sample_rate int

model sample rate (e.g. 44100).

required
fmt str

one of :data:SUPPORTED_FORMATS.

required
Source code in supertonic/server/audio.py
def encode_audio(wav: np.ndarray, sample_rate: int, fmt: str) -> bytes:
    """Encode a synthesized waveform into ``fmt`` bytes.

    Args:
        wav: ndarray of shape ``(1, num_samples)`` or ``(num_samples,)`` —
            the shape produced by :meth:`supertonic.TTS.synthesize`.
        sample_rate: model sample rate (e.g. 44100).
        fmt: one of :data:`SUPPORTED_FORMATS`.
    """
    entry = _FORMATS.get(fmt)
    if entry is None:
        raise UnsupportedAudioFormat(fmt)
    sf_format, subtype, _ = entry

    if wav.ndim == 2:
        # soundfile expects (frames,) or (frames, channels). The pipeline
        # returns (1, num_samples), so squeeze the leading singleton.
        wav = wav.squeeze(0)

    buf = io.BytesIO()
    sf.write(buf, wav, sample_rate, format=sf_format, subtype=subtype)
    return buf.getvalue()

duration_seconds

duration_seconds(wav: ndarray, sample_rate: int) -> float
Source code in supertonic/server/audio.py
def duration_seconds(wav: np.ndarray, sample_rate: int) -> float:
    return float(wav.shape[-1]) / float(sample_rate)

coerce_response_format

coerce_response_format(value: Optional[str]) -> str

Validate and normalize a user-supplied response_format.

None → "wav" (sensible default for local-host integrations). An unsupported value raises :class:UnsupportedAudioFormat so handlers can return a 400 with a stable error code.

Source code in supertonic/server/audio.py
def coerce_response_format(value: Optional[str]) -> str:
    """Validate and normalize a user-supplied ``response_format``.

    ``None`` → ``"wav"`` (sensible default for local-host integrations). An
    unsupported value raises :class:`UnsupportedAudioFormat` so handlers can
    return a 400 with a stable error code.
    """
    if value is None:
        return "wav"
    v = value.lower().strip()
    if v not in _FORMATS:
        raise UnsupportedAudioFormat(value)
    return v
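
The contents of _FORMATS are not listed on this page, so the sketch below uses a made-up format tuple to show the normalization rules: None becomes "wav", case and surrounding whitespace are ignored, and anything else raises. ASSUMED_FORMATS, UnsupportedFormat, coerce, and try_coerce are illustrative names only.

```python
from typing import Optional

# Hypothetical set of formats; the real list is SUPPORTED_FORMATS.
ASSUMED_FORMATS = ("wav", "flac")


class UnsupportedFormat(ValueError):
    """Stand-in for UnsupportedAudioFormat."""


def coerce(value: Optional[str]) -> str:
    if value is None:
        return "wav"
    v = value.lower().strip()
    if v not in ASSUMED_FORMATS:
        # The original, un-normalized value is reported back to the caller.
        raise UnsupportedFormat(value)
    return v


def try_coerce(value: Optional[str]) -> str:
    """Return the normalized format, or an error string for display."""
    try:
        return coerce(value)
    except UnsupportedFormat as e:
        return f"error: {e}"


outcomes = [try_coerce(None), try_coerce(" WAV "), try_coerce("mp3")]
```

Under these assumptions a client that sends "mp3" gets an explicit error instead of silently receiving WAV bytes.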

routes

HTTP route handlers for supertonic serve.

The route surface is intentionally narrow and follows two conventions so that existing clients work with minimal changes:

  1. Native namespace under /v1/... for first-class Supertonic features.
  2. OpenAI Audio Speech alias at POST /v1/audio/speech so any client that already speaks the OpenAI API (n8n OpenAI node, openai-python, many browser extensions, Electron tools) can swap the base URL.
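
As a rough illustration of the alias route, the sketch below builds (but does not send) a request with the standard library. The base URL and port, the voice name "example-voice", and the helper build_speech_request are assumptions; the field names model, input, voice, and response_format are the ones the handler reads.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed host and port for a local server


def build_speech_request(text: str, voice: str, model: str = "supertonic-3",
                         response_format: str = "wav") -> urllib.request.Request:
    """Build a POST /v1/audio/speech request in the OpenAI Audio Speech shape."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,  # placeholder; query GET /v1/styles for real names
        "response_format": response_format,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_speech_request("Hello there.", voice="example-voice")
sent_payload = json.loads(req.data)
# To actually send it against a running server:
#     with urllib.request.urlopen(req) as resp:
#         audio_bytes = resp.read()
```

Setting response_format explicitly is worthwhile because OpenAI clients default to mp3, which this server rejects.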

Errors use the OpenAI-shaped envelope:

{ "error": { "message": "...", "type": "...", "code": "..." } }

so that downstream error parsers keep working.
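
A small sketch of producing and consuming this envelope, using only the standard library. error_envelope and parse_error are hypothetical helpers, and the default type value "invalid_request_error" is an assumption borrowed from OpenAI's convention; the server's own _error helper may use a different default.

```python
import json
from typing import Tuple


def error_envelope(message: str, code: str, type_: str = "invalid_request_error") -> dict:
    """Build the nested error object shown above."""
    return {"error": {"message": message, "type": type_, "code": code}}


def parse_error(body: bytes) -> Tuple[str, str]:
    """Client side: pull the stable code and the human-readable message."""
    err = json.loads(body)["error"]
    return err["code"], err["message"]


body = json.dumps(
    error_envelope("server not ready", "not_ready", type_="server_error")
).encode("utf-8")
code, message = parse_error(body)
```

Clients are better off branching on code (for example "not_ready" or "unknown_voice") than on message, since the message text is meant for people.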

Classes:

Name Description
UnknownVoice

Voice name does not match any built-in or imported style.

Functions:

Name Description
register_routes

Attach all /v1/... routes to app.

Attributes:

Name Type Description
logger
MAX_STYLE_IMPORT_BYTES

logger module-attribute

logger = getLogger(__name__)

MAX_STYLE_IMPORT_BYTES module-attribute

MAX_STYLE_IMPORT_BYTES = 1 * 1024 * 1024

UnknownVoice

Bases: LookupError

Voice name does not match any built-in or imported style.

register_routes

register_routes(app: FastAPI) -> None

Attach all /v1/... routes to app.

Called from :func:supertonic.server.app.create_app after the lifespan and app.state.server_state have been set up.

Source code in supertonic/server/routes.py
def register_routes(app: FastAPI) -> None:
    """Attach all `/v1/...` routes to ``app``.

    Called from :func:`supertonic.server.app.create_app` after the lifespan and
    ``app.state.server_state`` have been set up.
    """
    router = APIRouter()

    @router.get("/v1/health", response_model=HealthResponse)
    def health(request: Request):
        state = _state(request)
        if not state.is_ready or state.tts is None:
            return JSONResponse(
                status_code=503,
                content=HealthResponse(
                    status="loading",
                    model=state.model,
                    version=__version__,
                    voices_loaded=0,
                ).model_dump(),
            )
        return HealthResponse(
            status="ok",
            model=state.model,
            sample_rate=state.tts.sample_rate,
            version=__version__,
            voices_loaded=len(state.tts.voice_style_names) + len(state.custom_styles),
        )

    @router.get("/v1/styles", response_model=StylesResponse)
    def list_styles(request: Request):
        state = _state(request)
        if state.tts is None:
            return _error(503, "server not ready", "not_ready", type_="server_error")
        builtin = [StyleInfo(name=n, kind="builtin") for n in state.tts.voice_style_names]
        custom = [
            StyleInfo(name=n, kind="custom", path=str(p))
            for n, p in sorted(state.custom_styles.items())
        ]
        return StylesResponse(styles=builtin + custom)

    @router.post("/v1/styles/import", response_model=StyleImportResponse)
    async def import_style(
        request: Request,
        overwrite: bool = False,
        file: Optional[UploadFile] = File(None),
        name: Optional[str] = Form(None),
    ):
        state = _state(request)
        if state.tts is None:
            return _error(503, "server not ready", "not_ready", type_="server_error")

        ct = request.headers.get("content-type", "")
        chosen_name: Optional[str]
        if ct.startswith("multipart/form-data"):
            if file is None:
                return _error(400, "missing 'file' part", "missing_file")
            # Read with an explicit cap as a fallback for chunked uploads
            # that bypass the middleware's Content-Length pre-flight check.
            raw = await file.read(MAX_STYLE_IMPORT_BYTES + 1)
            if len(raw) > MAX_STYLE_IMPORT_BYTES:
                return _error(
                    413,
                    f"uploaded voice style exceeds {MAX_STYLE_IMPORT_BYTES} bytes",
                    "payload_too_large",
                )
            try:
                data = json.loads(raw)
            except json.JSONDecodeError as e:
                return _error(400, f"invalid JSON in uploaded file: {e}", "invalid_json")
            chosen_name = name or Path(file.filename or "").stem or "imported"
        else:
            try:
                body = await request.json()
            except json.JSONDecodeError:
                return _error(400, "invalid JSON body", "invalid_json")
            if not isinstance(body, dict):
                return _error(400, "JSON body must be an object", "invalid_body")
            chosen_name = body.get("name")
            if not chosen_name:
                return _error(400, "missing 'name' in JSON body", "missing_name")
            data = {k: body[k] for k in ("style_ttl", "style_dp") if k in body}

        try:
            target = styles_store.save(
                state.custom_styles_dir,
                chosen_name,
                data,
                builtin_names=state.tts.voice_style_names,
                overwrite=overwrite,
            )
        except styles_store.InvalidStyleName as e:
            return _error(400, str(e), "invalid_style_name")
        except styles_store.StyleNameConflict as e:
            status = 409 if "already exists" in str(e) else 400
            return _error(status, str(e), "style_name_conflict")
        except ValueError as e:
            return _error(400, str(e), "invalid_style_payload")

        state.custom_styles[target.stem] = target
        return StyleImportResponse(name=target.stem, stored_at=str(target))

    @router.post("/v1/tts")
    def synth_native(req: TTSRequest, request: Request):
        state = _state(request)
        if state.tts is None:
            return _error(503, "server not ready", "not_ready", type_="server_error")
        try:
            fmt = coerce_response_format(req.response_format)
        except UnsupportedAudioFormat as e:
            return _error(
                400,
                f"unsupported response_format {str(e)!r}",
                "unsupported_response_format",
            )
        err = _validate_lang(req.lang)
        if err is not None:
            return err
        try:
            wav, dur = _do_synthesize(
                state,
                text=req.text,
                voice=req.voice,
                lang=req.lang,
                speed=req.speed,
                steps=req.steps,
                max_chunk_length=req.max_chunk_length,
                silence_duration=req.silence_duration,
            )
        except UnknownVoice as e:
            return _error(400, f"unknown voice {str(e)!r}", "unknown_voice")
        except Exception as e:  # noqa: BLE001 — surface as 500 with code
            logger.exception("synthesis failed")
            return _error(500, f"synthesis failed: {e}", "synthesis_failed", type_="server_error")
        return _audio_response(state, wav, fmt, dur)

    @router.post("/v1/audio/speech")
    def openai_compat_speech(req: OpenAISpeechRequest, request: Request):
        # Validate ``model`` against AVAILABLE_MODELS but only *accept* the
        # model currently loaded — switching at request time is out of scope.
        state = _state(request)
        if req.model not in OpenAISpeechRequest.valid_models():
            return _error(
                400,
                f"unknown model {req.model!r}; valid: {', '.join(OpenAISpeechRequest.valid_models())}",
                "unknown_model",
            )
        if req.model != state.model:
            return _error(
                400,
                f"this server serves {state.model!r}; request asked for {req.model!r}. "
                f"Restart with --model {req.model} to switch.",
                "model_not_loaded",
            )
        if state.tts is None:
            return _error(503, "server not ready", "not_ready", type_="server_error")
        # OpenAI clients default to ``response_format='mp3'`` — surface a
        # clear error rather than silently emitting WAV.
        try:
            fmt = coerce_response_format(req.response_format)
        except UnsupportedAudioFormat as e:
            return _error(
                400,
                f"unsupported response_format {str(e)!r}; "
                f"set response_format to one of: {', '.join(SUPPORTED_FORMATS)}",
                "unsupported_response_format",
            )
        err = _validate_lang(req.lang)
        if err is not None:
            return err
        try:
            wav, dur = _do_synthesize(
                state,
                text=req.input,
                voice=req.voice,
                lang=req.lang,
                speed=req.speed,
                steps=None,
                max_chunk_length=None,
                silence_duration=None,
            )
        except UnknownVoice as e:
            return _error(400, f"unknown voice {str(e)!r}", "unknown_voice")
        except Exception as e:  # noqa: BLE001
            logger.exception("synthesis failed")
            return _error(500, f"synthesis failed: {e}", "synthesis_failed", type_="server_error")
        return _audio_response(state, wav, fmt, dur)

    @router.post("/v1/tts/batch", response_model=BatchResponse)
    def synth_batch(req: BatchRequest, request: Request):
        state = _state(request)
        if state.tts is None:
            return _error(503, "server not ready", "not_ready", type_="server_error")
        try:
            fmt = coerce_response_format(req.response_format)
        except UnsupportedAudioFormat as e:
            return _error(
                400,
                f"unsupported response_format {str(e)!r}",
                "unsupported_response_format",
            )
        defaults = req.defaults
        results: list[BatchResultItem] = []
        for idx, item in enumerate(req.items):
            voice = item.voice or (defaults.voice if defaults else None) or "M1"
            lang = item.lang or (defaults.lang if defaults else None)
            speed = item.speed if item.speed is not None else (defaults.speed if defaults else None)
            steps = item.steps if item.steps is not None else (defaults.steps if defaults else None)
            mcl = (
                item.max_chunk_length
                if item.max_chunk_length is not None
                else (defaults.max_chunk_length if defaults else None)
            )
            sil = (
                item.silence_duration
                if item.silence_duration is not None
                else (defaults.silence_duration if defaults else None)
            )
            if lang is not None and lang not in AVAILABLE_LANGUAGES:
                return _error(
                    400,
                    f"items[{idx}].lang: unsupported lang {lang!r}",
                    "unsupported_lang",
                )
            try:
                wav, dur = _do_synthesize(
                    state,
                    text=item.text,
                    voice=voice,
                    lang=lang,
                    speed=speed,
                    steps=steps,
                    max_chunk_length=mcl,
                    silence_duration=sil,
                )
            except UnknownVoice as e:
                return _error(
                    400,
                    f"items[{idx}]: unknown voice {str(e)!r}",
                    "unknown_voice",
                )
            except Exception as e:  # noqa: BLE001
                logger.exception("batch item %d synthesis failed", idx)
                return _error(
                    500,
                    f"items[{idx}]: synthesis failed: {e}",
                    "synthesis_failed",
                    type_="server_error",
                )
            body = encode_audio(wav, state.tts.sample_rate, fmt)
            results.append(
                BatchResultItem(
                    audio_base64=base64.b64encode(body).decode("ascii"),
                    duration_s=dur,
                    format=fmt,
                    sample_rate=state.tts.sample_rate,
                )
            )
        return BatchResponse(items=results)

    app.include_router(router)

schemas

Pydantic request/response schemas for the local TTS server.

The wire format mirrors common TTS-server conventions so existing clients (n8n HTTP nodes, openedai-speech-compatible browser extensions, OpenAI SDKs) can talk to supertonic serve with little or no code change.
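As a rough illustration of the wire format, the sketch below builds a native POST /v1/tts request with only the standard library. The base URL and port are assumptions (use whatever address your server is bound to), and "wav" is assumed to be one of the supported formats; the helper names are hypothetical and not part of the package.

```python
import json
import urllib.request

# Assumption: the server listens on this address; adjust to your setup.
BASE_URL = "http://127.0.0.1:8000"


def build_tts_request(text, voice="M1", response_format="wav", base_url=BASE_URL):
    """Return a urllib Request for POST /v1/tts (nothing is sent yet)."""
    payload = {"text": text, "voice": voice, "response_format": response_format}
    return urllib.request.Request(
        url=f"{base_url}/v1/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, out_path, **kwargs):
    """Send the request and write the response body to out_path.

    Requires a running server; the body is assumed to be encoded audio.
    """
    req = build_tts_request(text, **kwargs)
    with urllib.request.urlopen(req) as resp:
        audio = resp.read()
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return len(audio)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network traffic happens.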

Classes:

Name Description
TTSRequest

Native synthesis request — POST /v1/tts.

OpenAISpeechRequest

OpenAI Audio Speech-compatible request — POST /v1/audio/speech.

BatchItem
BatchDefaults
BatchRequest
BatchResultItem
BatchResponse
StyleInfo
StylesResponse
StyleImportJSON

JSON-body variant of POST /v1/styles/import.

StyleImportResponse
HealthResponse
ErrorDetail
ErrorEnvelope

OpenAI-shaped error envelope so integrators can reuse existing parsers.

TTSRequest

Bases: BaseModel

Native synthesis request — POST /v1/tts.

Attributes:

Name Type Description
text str
voice str
lang Optional[str]
speed Optional[float]
steps Optional[int]
max_chunk_length Optional[int]
silence_duration Optional[float]
response_format Optional[str]
text class-attribute instance-attribute
text: str = Field(
    ..., min_length=1, description="Text to synthesize"
)
voice class-attribute instance-attribute
voice: str = Field(
    "M1",
    description="Voice style name (built-in or imported)",
)
lang class-attribute instance-attribute
lang: Optional[str] = Field(
    None, description="Language code or 'na' for fallback"
)
speed class-attribute instance-attribute
speed: Optional[float] = Field(None, ge=0.7, le=2.0)
steps class-attribute instance-attribute
steps: Optional[int] = Field(None, ge=1, le=100)
max_chunk_length class-attribute instance-attribute
max_chunk_length: Optional[int] = Field(
    None, ge=1, le=10000
)
silence_duration class-attribute instance-attribute
silence_duration: Optional[float] = Field(
    None, ge=0.0, le=10.0
)
response_format class-attribute instance-attribute
response_format: Optional[str] = Field(
    None, description=f"One of: {', '.join(SUPPORTED_FORMATS)}"
)
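
A client can catch out-of-range values before sending anything. The following is a minimal sketch that copies the numeric bounds from the TTSRequest fields above into a plain dictionary; `check_tts_payload` is a hypothetical helper, and the server's own Pydantic validation remains authoritative.

```python
# Bounds copied from the TTSRequest field definitions above.
TTS_BOUNDS = {
    "speed": (0.7, 2.0),
    "steps": (1, 100),
    "max_chunk_length": (1, 10000),
    "silence_duration": (0.0, 10.0),
}


def check_tts_payload(payload):
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    text = payload.get("text")
    if not isinstance(text, str) or len(text) < 1:
        problems.append("text: must be a non-empty string")
    for name, (lo, hi) in TTS_BOUNDS.items():
        value = payload.get(name)
        if value is None:  # optional fields may be omitted
            continue
        if not lo <= value <= hi:
            problems.append(f"{name}: {value} is outside [{lo}, {hi}]")
    return problems
```

For example, `check_tts_payload({"text": "hi", "speed": 3.0})` reports the speed as out of range, while a payload containing only `text` passes.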

OpenAISpeechRequest

Bases: BaseModel

OpenAI Audio Speech-compatible request — POST /v1/audio/speech.

Field names match the OpenAI API so existing clients (n8n's OpenAI node, openai-python, openedai-speech-style browser extensions) only need to swap the base URL.

Methods:

Name Description
valid_models

Attributes:

Name Type Description
model str
input str
voice str
response_format Optional[str]
speed Optional[float]
lang Optional[str]
model class-attribute instance-attribute
model: str = Field(
    ...,
    description="Model name (must match the loaded model)",
)
input class-attribute instance-attribute
input: str = Field(
    ..., min_length=1, description="Text to synthesize"
)
voice class-attribute instance-attribute
voice: str = Field('M1', description='Voice style name')
response_format class-attribute instance-attribute
response_format: Optional[str] = Field(None)
speed class-attribute instance-attribute
speed: Optional[float] = Field(None, ge=0.7, le=2.0)
lang class-attribute instance-attribute
lang: Optional[str] = Field(None)
valid_models classmethod
valid_models() -> tuple[str, ...]
Source code in supertonic/server/schemas.py
@classmethod
def valid_models(cls) -> tuple[str, ...]:
    return tuple(AVAILABLE_MODELS)
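
A sketch of a request body for POST /v1/audio/speech follows. The route source notes that OpenAI clients default to `response_format='mp3'` and that unsupported formats are rejected, so this hypothetical helper always sets the format explicitly. That "wav" is supported is an assumption, and the `model` value must match the model the server was started with.

```python
def openai_speech_payload(model, text, voice="M1", response_format="wav",
                          speed=None, lang=None):
    """Build a JSON-serializable body using the OpenAISpeechRequest field names."""
    payload = {
        "model": model,                      # must equal the loaded model
        "input": text,                       # OpenAI calls the text field "input"
        "voice": voice,
        "response_format": response_format,  # set explicitly; assumed supported
    }
    # Optional fields are left out entirely when not provided.
    if speed is not None:
        payload["speed"] = speed
    if lang is not None:
        payload["lang"] = lang
    return payload
```

Note that, per the handler above, `steps`, `max_chunk_length`, and `silence_duration` are not part of this schema; requests on this route run with the server's defaults for those settings.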

BatchItem

Bases: BaseModel

Attributes:

Name Type Description
text str
voice str
lang Optional[str]
speed Optional[float]
steps Optional[int]
max_chunk_length Optional[int]
silence_duration Optional[float]
text class-attribute instance-attribute
text: str = Field(..., min_length=1)
voice class-attribute instance-attribute
voice: str = Field('M1')
lang class-attribute instance-attribute
lang: Optional[str] = None
speed class-attribute instance-attribute
speed: Optional[float] = Field(None, ge=0.7, le=2.0)
steps class-attribute instance-attribute
steps: Optional[int] = Field(None, ge=1, le=100)
max_chunk_length class-attribute instance-attribute
max_chunk_length: Optional[int] = Field(
    None, ge=1, le=10000
)
silence_duration class-attribute instance-attribute
silence_duration: Optional[float] = Field(
    None, ge=0.0, le=10.0
)

BatchDefaults

Bases: BaseModel

Attributes:

Name Type Description
voice Optional[str]
lang Optional[str]
speed Optional[float]
steps Optional[int]
max_chunk_length Optional[int]
silence_duration Optional[float]
voice class-attribute instance-attribute
voice: Optional[str] = None
lang class-attribute instance-attribute
lang: Optional[str] = None
speed class-attribute instance-attribute
speed: Optional[float] = Field(None, ge=0.7, le=2.0)
steps class-attribute instance-attribute
steps: Optional[int] = Field(None, ge=1, le=100)
max_chunk_length class-attribute instance-attribute
max_chunk_length: Optional[int] = Field(
    None, ge=1, le=10000
)
silence_duration class-attribute instance-attribute
silence_duration: Optional[float] = Field(
    None, ge=0.0, le=10.0
)

BatchRequest

Bases: BaseModel

Attributes:

Name Type Description
items List[BatchItem]
response_format Optional[str]
defaults Optional[BatchDefaults]
items class-attribute instance-attribute
items: List[BatchItem] = Field(
    ..., min_length=1, max_length=64
)
response_format class-attribute instance-attribute
response_format: Optional[str] = None
defaults class-attribute instance-attribute
defaults: Optional[BatchDefaults] = None
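
How per-item values and `defaults` combine can be sketched outside the server. The function below mirrors the resolution logic of the `synth_batch` handler, but on plain dictionaries; `resolve_item` is a hypothetical name, and the voice "F1" and language "en" in the usage note are placeholders rather than confirmed values.

```python
def resolve_item(item, defaults=None):
    """Resolve one batch item against batch-level defaults.

    Mirrors the handler: voice and lang fall back on falsy values,
    numeric fields fall back only when they are None.
    """
    defaults = defaults or {}
    # BatchItem.voice defaults to "M1" in the schema, so an omitted
    # voice already arrives as "M1" before the fallback chain runs.
    item_voice = item.get("voice", "M1")
    resolved = {
        "text": item["text"],
        "voice": item_voice or defaults.get("voice") or "M1",
        "lang": item.get("lang") or defaults.get("lang"),
    }
    for name in ("speed", "steps", "max_chunk_length", "silence_duration"):
        value = item.get(name)
        resolved[name] = value if value is not None else defaults.get(name)
    return resolved
```

With the schema as shown, `resolve_item({"text": "a"}, {"voice": "F1", "speed": 1.2})` yields voice "M1" and speed 1.2: the numeric default applies, but the item's own voice default takes precedence over `defaults.voice`. Setting `voice` on each item is the dependable way to choose a voice per batch.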

BatchResultItem

Bases: BaseModel

Attributes:

Name Type Description
audio_base64 str
duration_s float
format str
sample_rate int
audio_base64 instance-attribute
audio_base64: str
duration_s instance-attribute
duration_s: float
format instance-attribute
format: str
sample_rate instance-attribute
sample_rate: int
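
Because batch results carry audio as base64 text, a client has to decode each item before playing or saving it. This is a minimal sketch operating on an already-parsed JSON response; `decode_batch_items` is a hypothetical helper name.

```python
import base64


def decode_batch_items(response):
    """Return a list of (audio_bytes, metadata) pairs from a batch response dict."""
    decoded = []
    for item in response["items"]:
        audio = base64.b64decode(item["audio_base64"])
        meta = {
            "duration_s": item["duration_s"],
            "format": item["format"],
            "sample_rate": item["sample_rate"],
        }
        decoded.append((audio, meta))
    return decoded
```

Each `audio_bytes` value can be written to disk in binary mode, using `meta["format"]` to pick a file extension.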

BatchResponse

Bases: BaseModel

Attributes:

Name Type Description
items List[BatchResultItem]
items instance-attribute
items: List[BatchResultItem]

StyleInfo

Bases: BaseModel

Attributes:

Name Type Description
name str
kind Literal['builtin', 'custom']
path Optional[str]
name instance-attribute
name: str
kind instance-attribute
kind: Literal['builtin', 'custom']
path class-attribute instance-attribute
path: Optional[str] = None

StylesResponse

Bases: BaseModel

Attributes:

Name Type Description
styles List[StyleInfo]
styles instance-attribute
styles: List[StyleInfo]

StyleImportJSON

Bases: BaseModel

JSON-body variant of POST /v1/styles/import.

The endpoint also accepts multipart/form-data with a file field; that path bypasses this schema and is handled directly in the route.

Attributes:

Name Type Description
name str
style_ttl dict
style_dp dict
name instance-attribute
name: str
style_ttl instance-attribute
style_ttl: dict
style_dp instance-attribute
style_dp: dict
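
The JSON body for POST /v1/styles/import could be assembled as in the sketch below. It assumes, without confirmation from this page, that a style you already hold as a dictionary exposes `style_ttl` and `style_dp` as top-level dict fields; `style_import_body` is a hypothetical helper.

```python
import json


def style_import_body(name, style):
    """Serialize a StyleImportJSON-shaped body from a style dictionary.

    Assumption: `style` has dict-valued "style_ttl" and "style_dp" keys.
    Any other keys in `style` are ignored.
    """
    missing = [k for k in ("style_ttl", "style_dp") if not isinstance(style.get(k), dict)]
    if missing:
        raise ValueError(f"style is missing dict field(s): {', '.join(missing)}")
    return json.dumps(
        {"name": name, "style_ttl": style["style_ttl"], "style_dp": style["style_dp"]}
    )
```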

StyleImportResponse

Bases: BaseModel

Attributes:

Name Type Description
name str
stored_at str
name instance-attribute
name: str
stored_at instance-attribute
stored_at: str

HealthResponse

Bases: BaseModel

Attributes:

Name Type Description
status Literal['ok', 'loading']
model str
sample_rate Optional[int]
version str
voices_loaded int
status instance-attribute
status: Literal['ok', 'loading']
model class-attribute instance-attribute
model: str = DEFAULT_MODEL
sample_rate class-attribute instance-attribute
sample_rate: Optional[int] = None
version instance-attribute
version: str
voices_loaded class-attribute instance-attribute
voices_loaded: int = 0
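
Since `status` is either 'ok' or 'loading', a client can poll until the model has finished loading. The sketch below takes a caller-supplied `fetch_health` callable that returns the parsed health JSON, so it makes no assumption about the endpoint path; the function name and timeout values are illustrative only.

```python
import time


def wait_until_ready(fetch_health, timeout_s=30.0, interval_s=0.5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_health() until status is 'ok'; raise TimeoutError otherwise."""
    deadline = clock() + timeout_s
    while True:
        health = fetch_health()
        if health.get("status") == "ok":
            return health
        if clock() >= deadline:
            raise TimeoutError(f"server still reports status {health.get('status')!r}")
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays.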

ErrorDetail

Bases: BaseModel

Attributes:

Name Type Description
message str
type str
code Optional[str]
message instance-attribute
message: str
type class-attribute instance-attribute
type: str = 'invalid_request_error'
code class-attribute instance-attribute
code: Optional[str] = None

ErrorEnvelope

Bases: BaseModel

OpenAI-shaped error envelope so integrators can reuse existing parsers.

Attributes:

Name Type Description
error ErrorDetail
error instance-attribute
error: ErrorDetail
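
A client-side parser for this envelope might look like the sketch below. It relies only on the ErrorDetail fields documented above and falls back to the schema's default `type` when that field is absent; `parse_error` is a hypothetical helper.

```python
import json


def parse_error(body):
    """Return (type, code, message) from an error envelope, or None if not one."""
    try:
        data = json.loads(body)
    except (ValueError, TypeError):
        return None
    err = data.get("error") if isinstance(data, dict) else None
    if not isinstance(err, dict) or "message" not in err:
        return None
    return (
        err.get("type", "invalid_request_error"),  # default from ErrorDetail
        err.get("code"),
        err["message"],
    )
```

Branching on `code` (for example "not_ready" or "unknown_voice", both of which appear in the route handlers) is likely more stable than matching on the human-readable message.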