supertonic.server ¶

Local HTTP server for Supertonic TTS.

This subpackage is optional. It depends on fastapi, uvicorn, and python-multipart, which are installed via the [serve] extra:

pip install supertonic[serve]

It exposes a thin FastAPI wrapper around :class:`supertonic.pipeline.TTS`, designed for local-only integration with n8n, browser extensions, Electron, Unity, Home Assistant, robotics devices, and any client that already speaks the OpenAI Audio Speech API.
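
As a rough illustration of the OpenAI-compatible path, the sketch below builds (but does not send) a `POST /v1/audio/speech` request using only the standard library. The base URL and the voice name are placeholders rather than values confirmed by this page; the `model`, `input`, `voice`, and `response_format` field names follow the route handler source shown later on this page.

```python
import json
import urllib.request

# Assumed base URL; the actual host and port depend on how the server was started.
BASE_URL = "http://127.0.0.1:8000"


def build_speech_request(text: str, voice: str, model: str = "supertonic-3") -> urllib.request.Request:
    """Build an OpenAI-style speech request without sending it."""
    body = {
        "model": model,            # must match the model the server has loaded
        "input": text,
        "voice": voice,            # placeholder voice name supplied by the caller
        "response_format": "wav",  # MP3 is rejected by the server, so ask for WAV
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_speech_request("Hello from Supertonic.", voice="my-voice")
# To actually send it: audio_bytes = urllib.request.urlopen(req).read()
```

Requesting `wav` explicitly matters for OpenAI clients, since their default of `mp3` is answered with a 400 error instead of audio.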

Public surface:

:func:create_app — build a FastAPI ASGI app (model loads in lifespan).
:class:ServerState — shared runtime state if you need to inject a pre-loaded TTS (e.g. tests).
:data:__all__ listed below.

Modules:

Name	Description
`app`	FastAPI application factory for `supertonic serve`.
`audio`	Audio encoding helpers for the local TTS server.
`routes`	HTTP route handlers for `supertonic serve`.
`schemas`	Pydantic request/response schemas for the local TTS server.
`styles_store`	On-disk store for user-imported voice styles.

Classes:

Name	Description
`ServerState`	Mutable shared state used by every request handler.

Functions:

Name	Description
`create_app`	Build a configured FastAPI app.

ServerState ¶

ServerState(
    model: str = DEFAULT_MODEL,
    *,
    tts: Optional["TTS"] = None,
    custom_styles_dir: Optional[Path] = None,
    custom_styles: Optional[Dict[str, Path]] = None
)

Mutable shared state used by every request handler.

Attributes:

Name	Type	Description
`model`		Model name to load (e.g. `"supertonic-3"`).
`tts`		Loaded :class:`supertonic.TTS` instance, `None` until the lifespan finishes.
`custom_styles`		`{stem: path}` for user-imported style JSONs.
`custom_styles_dir`		Directory on disk that backs `custom_styles`.
`synth_lock`		Serializes ONNX Runtime inference across threads (FastAPI executes sync handlers in a threadpool).
`is_ready`		`True` once the lifespan has finished initialization.

Source code in supertonic/server/app.py

def __init__(
    self,
    model: str = DEFAULT_MODEL,
    *,
    tts: Optional["TTS"] = None,
    custom_styles_dir: Optional[Path] = None,
    custom_styles: Optional[Dict[str, Path]] = None,
) -> None:
    self.model = model
    self.tts = tts
    # Custom styles default to the *model's* cache dir, so the same name
    # cannot collide across model versions.
    self.custom_styles_dir = (
        Path(custom_styles_dir)
        if custom_styles_dir
        else styles_store.default_custom_styles_dir(model)
    )
    self.custom_styles = dict(custom_styles or {})
    self.synth_lock = threading.Lock()
    self.is_ready = False
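
The role of `synth_lock` can be illustrated without the real model. The sketch below is a minimal stand-in, assuming a blocking inference call: `fake_synthesize` and the counters are hypothetical, and only the lock pattern mirrors what the attribute description says.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

synth_lock = threading.Lock()     # plays the role of ServerState.synth_lock
counter_guard = threading.Lock()  # protects the bookkeeping counters below
active = 0                        # threads currently inside "inference"
peak = 0                          # highest concurrency observed


def fake_synthesize(text: str) -> str:
    """Hypothetical stand-in for a blocking inference call."""
    global active, peak
    with synth_lock:  # only one thread may run inference at a time
        with counter_guard:
            active += 1
            peak = max(peak, active)
        time.sleep(0.01)  # simulate work
        with counter_guard:
            active -= 1
    return text.upper()


# Four worker threads, like sync handlers running in a threadpool.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_synthesize, ["a", "b", "c", "d"]))
```

Even with four workers, `peak` never exceeds 1, which is the property the lock exists to guarantee.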

model `instance-attribute` ¶

model = model

tts `instance-attribute` ¶

tts = tts

custom_styles_dir `instance-attribute` ¶

custom_styles_dir = (
    Path(custom_styles_dir)
    if custom_styles_dir
    else default_custom_styles_dir(model)
)

custom_styles `instance-attribute` ¶

custom_styles = dict(custom_styles or {})

synth_lock `instance-attribute` ¶

synth_lock = Lock()

is_ready `instance-attribute` ¶

is_ready = False

create_app ¶

create_app(
    *,
    state: Optional[ServerState] = None,
    model: str = DEFAULT_MODEL,
    custom_styles_dir: Optional[Path] = None,
    cors_origins: Optional[Iterable[str]] = None
) -> FastAPI

Build a configured FastAPI app.

Parameters:

Name	Type	Description	Default
`state`	`Optional[ServerState]`	Pre-built state to reuse. When provided, the lifespan does not instantiate :class:`supertonic.TTS` — useful for tests that inject a fake. Pass `None` for normal use.	`None`
`model`	`str`	Model name to load if `state.tts` is `None`.	`DEFAULT_MODEL`
`custom_styles_dir`	`Optional[Path]`	Override the on-disk location of user-imported voice styles. Defaults to :func:`supertonic.server.styles_store.default_custom_styles_dir`.	`None`
`cors_origins`	`Optional[Iterable[str]]`	If non-empty, install `CORSMiddleware` for these origins. Browser-extension or Electron clients need this; n8n and curl do not.	`None`

Source code in supertonic/server/app.py

def create_app(
    *,
    state: Optional[ServerState] = None,
    model: str = DEFAULT_MODEL,
    custom_styles_dir: Optional[Path] = None,
    cors_origins: Optional[Iterable[str]] = None,
) -> FastAPI:
    """Build a configured FastAPI app.

    Args:
        state: Pre-built state to reuse. When provided, the lifespan does *not*
            instantiate :class:`supertonic.TTS` — useful for tests that inject
            a fake. Pass ``None`` for normal use.
        model: Model name to load if ``state.tts`` is ``None``.
        custom_styles_dir: Override the on-disk location of user-imported
            voice styles. Defaults to
            :func:`supertonic.server.styles_store.default_custom_styles_dir`.
        cors_origins: If non-empty, install ``CORSMiddleware`` for these
            origins. Browser-extension or Electron clients need this; n8n and
            curl do not.
    """
    if state is None:
        state = ServerState(model=model, custom_styles_dir=custom_styles_dir)
    elif custom_styles_dir is not None:
        state.custom_styles_dir = Path(custom_styles_dir)

    @asynccontextmanager
    async def lifespan(app: FastAPI):
        if state.tts is None:
            # Import here so that ``supertonic.server`` import does not pull
            # the model loader into hot paths or test harnesses that mock it.
            from ..pipeline import TTS

            logger.info("Loading TTS model %r ...", state.model)
            state.tts = TTS(model=state.model)
        state.custom_styles = styles_store.scan(state.custom_styles_dir)
        state.is_ready = True
        logger.info(
            "supertonic serve ready: model=%s builtin=%d custom=%d",
            state.model,
            len(state.tts.voice_style_names) if state.tts else 0,
            len(state.custom_styles),
        )
        try:
            yield
        finally:
            state.is_ready = False

    app = FastAPI(
        title="Supertonic TTS",
        description=(
            "Local HTTP server for Supertonic TTS. Exposes a native /v1/* "
            "namespace plus an OpenAI Audio Speech-compatible alias at "
            "POST /v1/audio/speech so existing clients work with just a "
            "base-URL change."
        ),
        version=__version__,
        lifespan=lifespan,
    )
    app.state.server_state = state

    # Note: middlewares execute in reverse order of addition (the *last*
    # added wraps everything below it). Add the size limit last so it
    # short-circuits before FastAPI's routing/dependency layers start
    # buffering the multipart body.
    if cors_origins:
        from fastapi.middleware.cors import CORSMiddleware

        app.add_middleware(
            CORSMiddleware,
            allow_origins=list(cors_origins),
            allow_credentials=False,
            allow_methods=["*"],
            allow_headers=["*"],
        )
    app.add_middleware(StyleImportSizeLimit, max_bytes=MAX_STYLE_IMPORT_BYTES)

    register_routes(app)
    return app
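
The readiness lifecycle in the lifespan above can be sketched in isolation. This is a simplified illustration, not the library's code: `FakeState` is a hypothetical stand-in for `ServerState`, and the context manager takes the state directly instead of a FastAPI app.

```python
import asyncio
from contextlib import asynccontextmanager


class FakeState:
    """Minimal stand-in for ServerState (hypothetical, for illustration)."""

    def __init__(self, tts=None):
        self.tts = tts
        self.is_ready = False


@asynccontextmanager
async def lifespan(state: FakeState):
    if state.tts is None:
        state.tts = object()  # the real server would load the TTS model here
    state.is_ready = True
    try:
        yield
    finally:
        state.is_ready = False  # runs on shutdown, even after an error


async def main() -> list:
    injected = object()
    state = FakeState(tts=injected)  # pre-loaded TTS, as a test would inject
    seen = [state.is_ready]
    async with lifespan(state):
        seen.append(state.is_ready)
        seen.append(state.tts is injected)  # injected instance is not replaced
    seen.append(state.is_ready)
    return seen


observed = asyncio.run(main())
```

The state is ready only while the context is open, and an injected `tts` is left untouched, which is what makes the `state` parameter useful for tests.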

app ¶

FastAPI application factory for supertonic serve.

Designed so that:

cmd_serve builds the app, uvicorn drives it.
Tests can inject a pre-built :class:ServerState (with a fake TTS) so no real ONNX session is created.
Anyone embedding the server inside a larger ASGI app can mount the FastAPI returned by :func:create_app.

Classes:

Name	Description
`StyleImportSizeLimit`	ASGI middleware: reject `POST /v1/styles/import` when the request `Content-Length` exceeds `MAX_STYLE_IMPORT_BYTES`.
`ServerState`	Mutable shared state used by every request handler.

Functions:

Name	Description
`create_app`	Build a configured FastAPI app.

Attributes:

Name	Type	Description
`logger`

logger `module-attribute` ¶

logger = getLogger(__name__)

StyleImportSizeLimit ¶

StyleImportSizeLimit(app, max_bytes: int)

ASGI middleware: reject POST /v1/styles/import when the request Content-Length exceeds :data:MAX_STYLE_IMPORT_BYTES.

The check runs before FastAPI's dependency machinery starts buffering the multipart body, so a malicious or accidental oversized upload is rejected at the headers stage. Requests without Content-Length (chunked transfer encoding) fall through; the handler's file.read(MAX_STYLE_IMPORT_BYTES + 1) enforces the same cap there.
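
Only the constructor of this middleware appears on this page, so the following is a sketch of how such a header-stage check could look as pure ASGI, not the actual implementation. The class name, the 413 status, and the plain-text body are assumptions; the real middleware may respond differently.

```python
import asyncio


class SizeLimitSketch:
    """Pure-ASGI middleware that rejects oversized uploads by header alone."""

    def __init__(self, app, max_bytes: int, path: str = "/v1/styles/import") -> None:
        self.app = app
        self.max_bytes = max_bytes
        self.path = path

    async def __call__(self, scope, receive, send):
        if scope["type"] == "http" and scope["method"] == "POST" and scope["path"] == self.path:
            headers = dict(scope["headers"])  # ASGI headers are (bytes, bytes) pairs
            raw = headers.get(b"content-length")
            if raw is not None and raw.isdigit() and int(raw) > self.max_bytes:
                # Assumed status: 413, matching the handler's own size check.
                await send({"type": "http.response.start", "status": 413,
                            "headers": [(b"content-type", b"text/plain")]})
                await send({"type": "http.response.body", "body": b"payload too large"})
                return
        # No Content-Length (e.g. chunked) or small enough: pass through.
        await self.app(scope, receive, send)


async def inner_app(scope, receive, send):
    """Trivial downstream app that always answers 200."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def call(content_length):
    sent = []
    headers = [] if content_length is None else [(b"content-length", str(content_length).encode())]
    scope = {"type": "http", "method": "POST", "path": "/v1/styles/import", "headers": headers}

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await SizeLimitSketch(inner_app, max_bytes=1024)(scope, receive, send)
    return sent[0]["status"]


statuses = [asyncio.run(call(n)) for n in (10, 4096, None)]
```

The third case shows why a second cap in the handler is still needed: a request with no `Content-Length` passes the middleware untouched.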

Attributes:

Name	Type	Description
`app`
`max_bytes`

Source code in supertonic/server/app.py

def __init__(self, app, max_bytes: int) -> None:
    self.app = app
    self.max_bytes = max_bytes

app `instance-attribute` ¶

app = app

max_bytes `instance-attribute` ¶

max_bytes = max_bytes

styles_store ¶

On-disk store for user-imported voice styles.

Imported voice styles live alongside the bundled built-ins, scoped per model so that a voice imported while serving supertonic-3 is not silently used by supertonic-2:

~/.cache/supertonic3/custom_styles/<name>.json   # supertonic-3
~/.cache/supertonic2/custom_styles/<name>.json   # supertonic-2
~/.cache/supertonic/custom_styles/<name>.json    # supertonic v1

This matches how the bundled voices are organized (each model's voice_styles/ lives under its own cache dir) and keeps custom JSONs out of voice_styles/ so the SDK's :func:list_available_voice_style_names remains unchanged.

This module deliberately stays small: it never loads the styles itself — that work belongs to :func:supertonic.loader.load_voice_style_from_json_file, which already enforces the JSON schema via :func:supertonic.utils.validate_voice_style_format. We just decide where files live and how their names are sanitized.

Classes:

Name	Description
`InvalidStyleName`	Raised when an imported style name fails sanitization.
`StyleNameConflict`	Raised when an imported style would overwrite an existing one.

Functions:

Name	Description
`default_custom_styles_dir`	Resolve the on-disk directory for user-imported voice styles.
`sanitize_name`
`scan`	Return `{stem: path}` for every well-formed JSON in `directory`.
`save`	Persist a validated style payload to `directory / f"{name}.json"`.

Attributes:

Name	Type	Description
`logger`

logger `module-attribute` ¶

logger = getLogger(__name__)

default_custom_styles_dir ¶

default_custom_styles_dir(
    model: str = DEFAULT_MODEL,
) -> Path

Resolve the on-disk directory for user-imported voice styles.

Priority:

1. $SUPERTONIC_CUSTOM_STYLES_DIR: explicit override, applies to every model (the user opted into a single shared location).
2. <model cache dir>/custom_styles/: for example, ~/.cache/supertonic3/custom_styles/ for supertonic-3. Respects $SUPERTONIC_CACHE_DIR through :func:supertonic.loader.get_cache_dir.

Source code in supertonic/server/styles_store.py

def default_custom_styles_dir(model: str = DEFAULT_MODEL) -> Path:
    """Resolve the on-disk directory for user-imported voice styles.

    Priority:

    1. ``$SUPERTONIC_CUSTOM_STYLES_DIR`` — explicit override, applies to every
       model (the user opted into a single shared location).
    2. ``<model cache dir>/custom_styles/`` — e.g. ``~/.cache/supertonic3/
       custom_styles/`` for ``supertonic-3``. Respects ``$SUPERTONIC_CACHE_DIR``
       through :func:`supertonic.loader.get_cache_dir`.
    """
    env = os.getenv("SUPERTONIC_CUSTOM_STYLES_DIR")
    if env:
        return Path(env).expanduser()
    return get_cache_dir(model) / "custom_styles"
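
To make the priority order concrete, here is a small sketch that takes the environment as a parameter so both branches can be exercised. `fake_cache_dir` is a hypothetical stand-in for `supertonic.loader.get_cache_dir`; its naming rule is only inferred from the example paths on this page and ignores `$SUPERTONIC_CACHE_DIR`.

```python
import os
from pathlib import Path


def fake_cache_dir(model: str) -> Path:
    """Hypothetical stand-in for supertonic.loader.get_cache_dir."""
    return Path.home() / ".cache" / model.replace("-", "")


def resolve_styles_dir(model: str, environ=os.environ) -> Path:
    """Apply the two-step priority: explicit override first, model cache second."""
    env = environ.get("SUPERTONIC_CUSTOM_STYLES_DIR")
    if env:
        return Path(env).expanduser()  # one shared location for every model
    return fake_cache_dir(model) / "custom_styles"


per_model = resolve_styles_dir("supertonic-3", environ={})
shared = resolve_styles_dir(
    "supertonic-3", environ={"SUPERTONIC_CUSTOM_STYLES_DIR": "/tmp/voices"}
)
```

With the override set, every model resolves to the same directory, so the per-model isolation described above no longer applies.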

InvalidStyleName ¶

Bases: ValueError

Raised when an imported style name fails sanitization.

StyleNameConflict ¶

Bases: ValueError

Raised when an imported style would overwrite an existing one.

sanitize_name ¶

sanitize_name(name: str) -> str

Source code in supertonic/server/styles_store.py

def sanitize_name(name: str) -> str:
    name = (name or "").strip()
    if not _NAME_RE.fullmatch(name):
        raise InvalidStyleName(f"Invalid style name {name!r}: must match [A-Za-z0-9_-]{{1,64}}")
    return name
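
The pattern behind `_NAME_RE` is not shown on this page. Assuming it matches the character class quoted in the error message, the behaviour can be sketched as follows; `NAME_RE` and `is_valid` are illustrative names.

```python
import re

# Assumed pattern, reconstructed from the error message; the real _NAME_RE
# is not shown on this page.
NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


class InvalidStyleName(ValueError):
    """Raised when a style name fails sanitization."""


def sanitize_name(name: str) -> str:
    name = (name or "").strip()  # surrounding whitespace is tolerated
    if not NAME_RE.fullmatch(name):
        raise InvalidStyleName(f"Invalid style name {name!r}")
    return name


def is_valid(name) -> bool:
    try:
        sanitize_name(name)
        return True
    except InvalidStyleName:
        return False


candidates = ["  narrator_01 ", "../etc/passwd", "", "a" * 65]
checks = {n: is_valid(n) for n in candidates}
```

Because dots and slashes are outside the allowed set, a name such as `../etc/passwd` is rejected before it can be joined onto the styles directory.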

scan ¶

scan(directory: Path) -> Dict[str, Path]

Return {stem: path} for every well-formed JSON in directory.

A file that fails :func:validate_voice_style_format is skipped with a warning rather than crashing startup — the server should still come up.

Source code in supertonic/server/styles_store.py

def scan(directory: Path) -> Dict[str, Path]:
    """Return ``{stem: path}`` for every well-formed JSON in ``directory``.

    A file that fails :func:`validate_voice_style_format` is skipped with a
    warning rather than crashing startup — the server should still come up.
    """
    out: Dict[str, Path] = {}
    if not directory.exists():
        return out
    for p in sorted(directory.glob("*.json")):
        try:
            with p.open("r", encoding="utf-8") as f:
                data = json.load(f)
            if not validate_voice_style_format(data):
                logger.warning("Skipping invalid voice style file: %s", p)
                continue
        except (OSError, json.JSONDecodeError) as e:
            logger.warning("Skipping unreadable voice style file %s: %s", p, e)
            continue
        out[p.stem] = p
    return out

save ¶

save(
    directory: Path,
    name: str,
    payload: dict,
    *,
    builtin_names: Iterable[str] = (),
    overwrite: bool = False
) -> Path

Persist a validated style payload to directory / f"{name}.json".

Parameters:

Name	Type	Description	Default
`directory`	`Path`	target directory (created if missing).	required
`name`	`str`	requested style name; sanitized via :func:`sanitize_name`.	required
`payload`	`dict`	parsed JSON; must pass :func:`validate_voice_style_format`.	required
`builtin_names`	`Iterable[str]`	names reserved by the bundled model; conflict → 400.	`()`
`overwrite`	`bool`	if False, conflict with an existing custom name → 409.	`False`

Returns:

Type	Description
`Path`	The path the style was written to.

Source code in supertonic/server/styles_store.py

def save(
    directory: Path,
    name: str,
    payload: dict,
    *,
    builtin_names: Iterable[str] = (),
    overwrite: bool = False,
) -> Path:
    """Persist a validated style payload to ``directory / f"{name}.json"``.

    Args:
        directory: target directory (created if missing).
        name: requested style name; sanitized via :func:`sanitize_name`.
        payload: parsed JSON; must pass :func:`validate_voice_style_format`.
        builtin_names: names reserved by the bundled model; conflict → 400.
        overwrite: if False, conflict with an existing custom name → 409.

    Returns:
        The path the style was written to.
    """
    name = sanitize_name(name)
    if name in set(builtin_names):
        raise StyleNameConflict(f"Name {name!r} is a built-in voice and cannot be overwritten")
    if not validate_voice_style_format(payload):
        # Re-using the SDK error type so server handlers can map uniformly.
        raise ValueError("voice style JSON is missing required keys/fields")

    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{name}.json"
    if target.exists() and not overwrite:
        raise StyleNameConflict(f"Style {name!r} already exists")
    tmp = target.with_suffix(".json.tmp")
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(payload, f)
    tmp.replace(target)
    return target
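
The last four lines of `save` follow a write-then-rename pattern. The sketch below isolates that step with a hypothetical helper, `atomic_write_json`, and a placeholder payload; validation and conflict checks are left out.

```python
import json
import tempfile
from pathlib import Path


def atomic_write_json(target: Path, payload: dict) -> Path:
    """Write to a sibling temp file, then rename it over the final name."""
    tmp = target.with_suffix(".json.tmp")  # narrator.json -> narrator.json.tmp
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(payload, f)
    tmp.replace(target)  # rename within the same directory
    return target


with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "narrator.json"
    atomic_write_json(target, {"style_ttl": [], "style_dp": []})
    atomic_write_json(target, {"style_ttl": [1], "style_dp": [2]})  # overwrite
    stored = json.loads(target.read_text(encoding="utf-8"))
    leftovers = sorted(p.name for p in Path(d).iterdir())
```

Writing to a temporary file first means a crash mid-write should leave at most a stray `.json.tmp` file, which a `*.json` glob such as the one in `scan` would not pick up.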

audio ¶

Audio encoding helpers for the local TTS server.

Only formats reachable through soundfile (libsndfile) at the model's native 44.1 kHz are supported, so the server adds no extra system dependencies beyond what the SDK already requires. MP3 / AAC / Opus are intentionally rejected with a clear error rather than silently emitting WAV — clients should detect the unsupported format and fall back.

(Opus is excluded for now because libsndfile's OGG/OPUS encoder only accepts 8/12/16/24/48 kHz, and we'd rather error clearly than ship a broken format. Re-add it once we have a resampling step.)
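
The sample-rate mismatch described above reduces to a membership test. The constant and function names in this sketch are illustrative; the rates themselves are the ones quoted in the paragraph.

```python
# Sample rates this page lists as accepted by libsndfile's OGG/Opus encoder.
OPUS_RATES_HZ = (8_000, 12_000, 16_000, 24_000, 48_000)
MODEL_RATE_HZ = 44_100  # the model's native rate, per the text above


def opus_encodable(sample_rate: int) -> bool:
    """Return True when audio at this rate could go to the Opus encoder as-is."""
    return sample_rate in OPUS_RATES_HZ


native_ok = opus_encodable(MODEL_RATE_HZ)  # 44.1 kHz is not in the list
resampled_ok = opus_encodable(48_000)      # would work after resampling
```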

Classes:

Name	Description
`UnsupportedAudioFormat`	Raised when the caller asks for a format we cannot encode.

Functions:

Name	Description
`format_to_mime`
`encode_audio`	Encode a synthesized waveform into `fmt` bytes.
`duration_seconds`
`coerce_response_format`	Validate and normalize a user-supplied `response_format`.

Attributes:

Name	Type	Description
`SUPPORTED_FORMATS`

SUPPORTED_FORMATS `module-attribute` ¶

SUPPORTED_FORMATS = tuple(_FORMATS.keys())

UnsupportedAudioFormat ¶

Bases: ValueError

Raised when the caller asks for a format we cannot encode.

format_to_mime ¶

format_to_mime(fmt: str) -> str

Source code in supertonic/server/audio.py

def format_to_mime(fmt: str) -> str:
    entry = _FORMATS.get(fmt)
    if entry is None:
        raise UnsupportedAudioFormat(fmt)
    return entry[2]

encode_audio ¶

encode_audio(
    wav: ndarray, sample_rate: int, fmt: str
) -> bytes

Encode a synthesized waveform into fmt bytes.

Parameters:

Name	Type	Description	Default
`wav`	`ndarray`	ndarray of shape `(1, num_samples)` or `(num_samples,)` — the shape produced by :meth:`supertonic.TTS.synthesize`.	required
`sample_rate`	`int`	model sample rate (e.g. 44100).	required
`fmt`	`str`	one of :data:`SUPPORTED_FORMATS`.	required

Source code in supertonic/server/audio.py

def encode_audio(wav: np.ndarray, sample_rate: int, fmt: str) -> bytes:
    """Encode a synthesized waveform into ``fmt`` bytes.

    Args:
        wav: ndarray of shape ``(1, num_samples)`` or ``(num_samples,)`` —
            the shape produced by :meth:`supertonic.TTS.synthesize`.
        sample_rate: model sample rate (e.g. 44100).
        fmt: one of :data:`SUPPORTED_FORMATS`.
    """
    entry = _FORMATS.get(fmt)
    if entry is None:
        raise UnsupportedAudioFormat(fmt)
    sf_format, subtype, _ = entry

    if wav.ndim == 2:
        # soundfile expects (frames,) or (frames, channels). The pipeline
        # returns (1, num_samples), so squeeze the leading singleton.
        wav = wav.squeeze(0)

    buf = io.BytesIO()
    sf.write(buf, wav, sample_rate, format=sf_format, subtype=subtype)
    return buf.getvalue()

duration_seconds ¶

duration_seconds(wav: ndarray, sample_rate: int) -> float

Source code in supertonic/server/audio.py

def duration_seconds(wav: np.ndarray, sample_rate: int) -> float:
    return float(wav.shape[-1]) / float(sample_rate)

coerce_response_format ¶

coerce_response_format(value: Optional[str]) -> str

Validate and normalize a user-supplied response_format.

None → "wav" (sensible default for local-host integrations). An unsupported value raises :class:UnsupportedAudioFormat so handlers can return a 400 with a stable error code.

Source code in supertonic/server/audio.py

def coerce_response_format(value: Optional[str]) -> str:
    """Validate and normalize a user-supplied ``response_format``.

    ``None`` → ``"wav"`` (sensible default for local-host integrations). An
    unsupported value raises :class:`UnsupportedAudioFormat` so handlers can
    return a 400 with a stable error code.
    """
    if value is None:
        return "wav"
    v = value.lower().strip()
    if v not in _FORMATS:
        raise UnsupportedAudioFormat(value)
    return v
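
A short usage sketch shows the three outcomes a handler has to deal with. The `FORMATS` table here is hypothetical, since the real `_FORMATS` is not shown on this page; only the `(format, subtype, MIME type)` entry layout is taken from how the helpers above unpack it.

```python
from typing import Optional

# Hypothetical table: each entry is (soundfile format, subtype, MIME type).
FORMATS = {
    "wav": ("WAV", "PCM_16", "audio/wav"),
    "flac": ("FLAC", "PCM_16", "audio/flac"),
}


class UnsupportedAudioFormat(ValueError):
    """Raised when the caller asks for a format that cannot be encoded."""


def coerce_response_format(value: Optional[str]) -> str:
    if value is None:
        return "wav"
    v = value.lower().strip()
    if v not in FORMATS:
        raise UnsupportedAudioFormat(value)
    return v


def describe(value: Optional[str]) -> str:
    """Return the MIME type, or a marker a handler could map to HTTP 400."""
    try:
        return FORMATS[coerce_response_format(value)][2]
    except UnsupportedAudioFormat as e:
        return f"unsupported:{e}"


outcomes = [describe(None), describe(" FLAC "), describe("mp3")]
```

The exception carries the original, un-normalized value, so the error message can echo exactly what the client sent.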

routes ¶

HTTP route handlers for supertonic serve.

The route surface is intentionally narrow and follows two conventions so that existing clients work with minimal changes:

Native namespace under /v1/... for first-class Supertonic features.
OpenAI Audio Speech alias at POST /v1/audio/speech so any client that already speaks the OpenAI API (n8n OpenAI node, openai-python, many browser extensions, Electron tools) can swap the base URL.

Errors use the OpenAI-shaped envelope:

{ "error": { "message": "...", "type": "...", "code": "..." } }

so that downstream error parsers keep working.
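
On the client side, the envelope can be handled with a few lines of standard-library code. Both helpers below are hypothetical, and the default `type` value is an assumption, since the server's `_error` helper is not shown on this page.

```python
import json


def make_error(message: str, code: str, type_: str = "invalid_request_error") -> str:
    """Serialize an error in the OpenAI-shaped envelope (default type is assumed)."""
    return json.dumps({"error": {"message": message, "type": type_, "code": code}})


def parse_error(body: str):
    """Return (code, message) from an envelope, or None if the shape differs."""
    try:
        err = json.loads(body)["error"]
        return err["code"], err["message"]
    except (ValueError, KeyError, TypeError):
        return None


parsed = parse_error(make_error("unknown voice 'x'", "unknown_voice"))
not_an_error = parse_error('{"status": "ok"}')
```

Branching on `code` rather than on `message` keeps client logic stable if the human-readable wording changes.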

Classes:

Name	Description
`UnknownVoice`	Voice name does not match any built-in or imported style.

Functions:

Name	Description
`register_routes`	Attach all `/v1/...` routes to `app`.

Attributes:

Name	Type	Description
`logger`
`MAX_STYLE_IMPORT_BYTES`

logger `module-attribute` ¶

logger = getLogger(__name__)

MAX_STYLE_IMPORT_BYTES `module-attribute` ¶

MAX_STYLE_IMPORT_BYTES = 1 * 1024 * 1024

UnknownVoice ¶

Bases: LookupError

Voice name does not match any built-in or imported style.

register_routes ¶

register_routes(app: FastAPI) -> None

Attach all /v1/... routes to app.

Called from :func:supertonic.server.app.create_app after the lifespan and app.state.server_state have been set up.

Source code in supertonic/server/routes.py

def register_routes(app: FastAPI) -> None:
    """Attach all `/v1/...` routes to ``app``.

    Called from :func:`supertonic.server.app.create_app` after the lifespan and
    ``app.state.server_state`` have been set up.
    """
    router = APIRouter()

    @router.get("/v1/health", response_model=HealthResponse)
    def health(request: Request):
        state = _state(request)
        if not state.is_ready or state.tts is None:
            return JSONResponse(
                status_code=503,
                content=HealthResponse(
                    status="loading",
                    model=state.model,
                    version=__version__,
                    voices_loaded=0,
                ).model_dump(),
            )
        return HealthResponse(
            status="ok",
            model=state.model,
            sample_rate=state.tts.sample_rate,
            version=__version__,
            voices_loaded=len(state.tts.voice_style_names) + len(state.custom_styles),
        )

    @router.get("/v1/styles", response_model=StylesResponse)
    def list_styles(request: Request):
        state = _state(request)
        if state.tts is None:
            return _error(503, "server not ready", "not_ready", type_="server_error")
        builtin = [StyleInfo(name=n, kind="builtin") for n in state.tts.voice_style_names]
        custom = [
            StyleInfo(name=n, kind="custom", path=str(p))
            for n, p in sorted(state.custom_styles.items())
        ]
        return StylesResponse(styles=builtin + custom)

    @router.post("/v1/styles/import", response_model=StyleImportResponse)
    async def import_style(
        request: Request,
        overwrite: bool = False,
        file: Optional[UploadFile] = File(None),
        name: Optional[str] = Form(None),
    ):
        state = _state(request)
        if state.tts is None:
            return _error(503, "server not ready", "not_ready", type_="server_error")

        ct = request.headers.get("content-type", "")
        chosen_name: Optional[str]
        if ct.startswith("multipart/form-data"):
            if file is None:
                return _error(400, "missing 'file' part", "missing_file")
            # Read with an explicit cap as a fallback for chunked uploads
            # that bypass the middleware's Content-Length pre-flight check.
            raw = await file.read(MAX_STYLE_IMPORT_BYTES + 1)
            if len(raw) > MAX_STYLE_IMPORT_BYTES:
                return _error(
                    413,
                    f"uploaded voice style exceeds {MAX_STYLE_IMPORT_BYTES} bytes",
                    "payload_too_large",
                )
            try:
                data = json.loads(raw)
            except json.JSONDecodeError as e:
                return _error(400, f"invalid JSON in uploaded file: {e}", "invalid_json")
            chosen_name = name or Path(file.filename or "").stem or "imported"
        else:
            try:
                body = await request.json()
            except json.JSONDecodeError:
                return _error(400, "invalid JSON body", "invalid_json")
            if not isinstance(body, dict):
                return _error(400, "JSON body must be an object", "invalid_body")
            chosen_name = body.get("name")
            if not chosen_name:
                return _error(400, "missing 'name' in JSON body", "missing_name")
            data = {k: body[k] for k in ("style_ttl", "style_dp") if k in body}

        try:
            target = styles_store.save(
                state.custom_styles_dir,
                chosen_name,
                data,
                builtin_names=state.tts.voice_style_names,
                overwrite=overwrite,
            )
        except styles_store.InvalidStyleName as e:
            return _error(400, str(e), "invalid_style_name")
        except styles_store.StyleNameConflict as e:
            status = 409 if "already exists" in str(e) else 400
            return _error(status, str(e), "style_name_conflict")
        except ValueError as e:
            return _error(400, str(e), "invalid_style_payload")

        state.custom_styles[target.stem] = target
        return StyleImportResponse(name=target.stem, stored_at=str(target))

    @router.post("/v1/tts")
    def synth_native(req: TTSRequest, request: Request):
        state = _state(request)
        if state.tts is None:
            return _error(503, "server not ready", "not_ready", type_="server_error")
        try:
            fmt = coerce_response_format(req.response_format)
        except UnsupportedAudioFormat as e:
            return _error(
                400,
                f"unsupported response_format {str(e)!r}",
                "unsupported_response_format",
            )
        err = _validate_lang(req.lang)
        if err is not None:
            return err
        try:
            wav, dur = _do_synthesize(
                state,
                text=req.text,
                voice=req.voice,
                lang=req.lang,
                speed=req.speed,
                steps=req.steps,
                max_chunk_length=req.max_chunk_length,
                silence_duration=req.silence_duration,
            )
        except UnknownVoice as e:
            return _error(400, f"unknown voice {str(e)!r}", "unknown_voice")
        except Exception as e:  # noqa: BLE001 — surface as 500 with code
            logger.exception("synthesis failed")
            return _error(500, f"synthesis failed: {e}", "synthesis_failed", type_="server_error")
        return _audio_response(state, wav, fmt, dur)

    @router.post("/v1/audio/speech")
    def openai_compat_speech(req: OpenAISpeechRequest, request: Request):
        # Validate ``model`` against AVAILABLE_MODELS but only *accept* the
        # model currently loaded — switching at request time is out of scope.
        state = _state(request)
        if req.model not in OpenAISpeechRequest.valid_models():
            return _error(
                400,
                f"unknown model {req.model!r}; valid: {', '.join(OpenAISpeechRequest.valid_models())}",
                "unknown_model",
            )
        if req.model != state.model:
            return _error(
                400,
                f"this server serves {state.model!r}; request asked for {req.model!r}. "
                f"Restart with --model {req.model} to switch.",
                "model_not_loaded",
            )
        if state.tts is None:
            return _error(503, "server not ready", "not_ready", type_="server_error")
        # OpenAI clients default to ``response_format='mp3'`` — surface a
        # clear error rather than silently emitting WAV.
        try:
            fmt = coerce_response_format(req.response_format)
        except UnsupportedAudioFormat as e:
            return _error(
                400,
                f"unsupported response_format {str(e)!r}; "
                f"set response_format to one of: {', '.join(SUPPORTED_FORMATS)}",
                "unsupported_response_format",
            )
        err = _validate_lang(req.lang)
        if err is not None:
            return err
        try:
            wav, dur = _do_synthesize(
                state,
                text=req.input,
                voice=req.voice,
                lang=req.lang,
                speed=req.speed,
                steps=None,
                max_chunk_length=None,
                silence_duration=None,
            )
        except UnknownVoice as e:
            return _error(400, f"unknown voice {str(e)!r}", "unknown_voice")
        except Exception as e:  # noqa: BLE001
            logger.exception("synthesis failed")
            return _error(500, f"synthesis failed: {e}", "synthesis_failed", type_="server_error")
        return _audio_response(state, wav, fmt, dur)

    @router.post("/v1/tts/batch", response_model=BatchResponse)
    def synth_batch(req: BatchRequest, request: Request):
        state = _state(request)
        if state.tts is None:
            return _error(503, "server not ready", "not_ready", type_="server_error")
        try:
            fmt = coerce_response_format(req.response_format)
        except UnsupportedAudioFormat as e:
            return _error(
                400,
                f"unsupported response_format {str(e)!r}",
                "unsupported_response_format",
            )
        defaults = req.defaults
        results: list[BatchResultItem] = []
        for idx, item in enumerate(req.items):
            voice = item.voice or (defaults.voice if defaults else None) or "M1"
            lang = item.lang or (defaults.lang if defaults else None)
            speed = item.speed if item.speed is not None else (defaults.speed if defaults else None)
            steps = item.steps if item.steps is not None else (defaults.steps if defaults else None)
            mcl = (
                item.max_chunk_length
                if item.max_chunk_length is not None
                else (defaults.max_chunk_length if defaults else None)
            )
            sil = (
                item.silence_duration
                if item.silence_duration is not None
                else (defaults.silence_duration if defaults else None)
            )
            if lang is not None and lang not in AVAILABLE_LANGUAGES:
                return _error(
                    400,
                    f"items[{idx}].lang: unsupported lang {lang!r}",
                    "unsupported_lang",
                )
            try:
                wav, dur = _do_synthesize(
                    state,
                    text=item.text,
                    voice=voice,
                    lang=lang,
                    speed=speed,
                    steps=steps,
                    max_chunk_length=mcl,
                    silence_duration=sil,
                )
            except UnknownVoice as e:
                return _error(
                    400,
                    f"items[{idx}]: unknown voice {str(e)!r}",
                    "unknown_voice",
                )
            except Exception as e:  # noqa: BLE001
                logger.exception("batch item %d synthesis failed", idx)
                return _error(
                    500,
                    f"items[{idx}]: synthesis failed: {e}",
                    "synthesis_failed",
                    type_="server_error",
                )
            body = encode_audio(wav, state.tts.sample_rate, fmt)
            results.append(
                BatchResultItem(
                    audio_base64=base64.b64encode(body).decode("ascii"),
                    duration_s=dur,
                    format=fmt,
                    sample_rate=state.tts.sample_rate,
                )
            )
        return BatchResponse(items=results)

    app.include_router(router)

schemas ¶

Pydantic request/response schemas for the local TTS server.

The wire format mirrors common TTS-server conventions so existing clients (n8n HTTP nodes, openedai-speech-compatible browser extensions, OpenAI SDKs) can talk to supertonic serve with little or no code change.

Classes:

Name	Description
`TTSRequest`	Native synthesis request — `POST /v1/tts`.
`OpenAISpeechRequest`	OpenAI Audio Speech-compatible request — `POST /v1/audio/speech`.
`BatchItem`
`BatchDefaults`
`BatchRequest`
`BatchResultItem`
`BatchResponse`
`StyleInfo`
`StylesResponse`
`StyleImportJSON`	JSON-body variant of `POST /v1/styles/import`.
`StyleImportResponse`
`HealthResponse`
`ErrorDetail`
`ErrorEnvelope`	OpenAI-shaped error envelope so integrators can reuse existing parsers.

TTSRequest ¶

Bases: BaseModel

Native synthesis request — POST /v1/tts.

Attributes:

Name	Type	Description
`text`	`str`
`voice`	`str`
`lang`	`Optional[str]`
`speed`	`Optional[float]`
`steps`	`Optional[int]`
`max_chunk_length`	`Optional[int]`
`silence_duration`	`Optional[float]`
`response_format`	`Optional[str]`

text `class-attribute` `instance-attribute` ¶

text: str = Field(
    ..., min_length=1, description="Text to synthesize"
)

voice `class-attribute` `instance-attribute` ¶

voice: str = Field(
    "M1",
    description="Voice style name (built-in or imported)",
)

lang `class-attribute` `instance-attribute` ¶

lang: Optional[str] = Field(
    None, description="Language code or 'na' for fallback"
)

speed `class-attribute` `instance-attribute` ¶

speed: Optional[float] = Field(None, ge=0.7, le=2.0)

steps `class-attribute` `instance-attribute` ¶

steps: Optional[int] = Field(None, ge=1, le=100)

max_chunk_length `class-attribute` `instance-attribute` ¶

max_chunk_length: Optional[int] = Field(
    None, ge=1, le=10000
)

silence_duration `class-attribute` `instance-attribute` ¶

silence_duration: Optional[float] = Field(
    None, ge=0.0, le=10.0
)

response_format `class-attribute` `instance-attribute` ¶

response_format: Optional[str] = Field(
    None, description=f"One of: {', '.join(SUPPORTED_FORMATS)}"
)
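
As a rough illustration of how a client might assemble a request that satisfies the field constraints above, here is a minimal standard-library sketch. The base URL and port, the `build_tts_request` helper, and the choice of `"wav"` as a response format are assumptions made for the example; none of them is confirmed by the schema itself.

```python
import json
import urllib.request

# Assumption: the server listens on this address; adjust to your setup.
BASE_URL = "http://127.0.0.1:8000"


def build_tts_request(text, voice="M1", base_url=BASE_URL, **options):
    """Build (but do not send) a POST /v1/tts request (hypothetical helper)."""
    # Mirror the documented constraints so obvious mistakes fail early.
    if not text:
        raise ValueError("text must be non-empty (min_length=1)")
    speed = options.get("speed")
    if speed is not None and not 0.7 <= speed <= 2.0:
        raise ValueError("speed must be between 0.7 and 2.0")
    steps = options.get("steps")
    if steps is not None and not 1 <= steps <= 100:
        raise ValueError("steps must be between 1 and 100")
    # Drop unset options so the server applies its own defaults.
    payload = {"text": text, "voice": voice}
    payload.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        f"{base_url}/v1/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "wav" is an assumed member of SUPPORTED_FORMATS.
req = build_tts_request("Hello there.", speed=1.2, response_format="wav")
```

Passing `req` to `urllib.request.urlopen` would send it to a running server. Server-side validation remains authoritative; the client-side checks only mirror the documented ranges.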

OpenAISpeechRequest ¶

Bases: BaseModel

OpenAI Audio Speech-compatible request — POST /v1/audio/speech.

Field names match the OpenAI API so existing clients (n8n's OpenAI node, openai-python, openedai-speech-style browser extensions) only need to swap the base URL.

Methods:

Name	Description
`valid_models`

Attributes:

Name	Type	Description
`model`	`str`
`input`	`str`
`voice`	`str`
`response_format`	`Optional[str]`
`speed`	`Optional[float]`
`lang`	`Optional[str]`

model `class-attribute` `instance-attribute` ¶

model: str = Field(
    ...,
    description="Model name (must match the loaded model)",
)

input `class-attribute` `instance-attribute` ¶

input: str = Field(
    ..., min_length=1, description="Text to synthesize"
)

voice `class-attribute` `instance-attribute` ¶

voice: str = Field('M1', description='Voice style name')

response_format `class-attribute` `instance-attribute` ¶

response_format: Optional[str] = Field(None)

speed `class-attribute` `instance-attribute` ¶

speed: Optional[float] = Field(None, ge=0.7, le=2.0)

lang `class-attribute` `instance-attribute` ¶

lang: Optional[str] = Field(None)

valid_models `classmethod` ¶

valid_models() -> tuple[str, ...]

Source code in supertonic/server/schemas.py

@classmethod
def valid_models(cls) -> tuple[str, ...]:
    return tuple(AVAILABLE_MODELS)
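
The OpenAI-compatible schema carries fewer tuning fields than `TTSRequest`, and the `/v1/audio/speech` handler listed in `routes` passes `steps`, `max_chunk_length`, and `silence_duration` as `None`. The sketch below shows one way a client could translate a native body into the OpenAI-shaped one. The `native_to_openai` helper is hypothetical, `"supertonic-3"` is taken from the `ServerState` example model name, and `"wav"` is an assumed supported format.

```python
from typing import Any, Dict

# Optional fields that OpenAISpeechRequest declares besides model/input/voice.
_PASSTHROUGH = ("response_format", "speed", "lang")


def native_to_openai(native: Dict[str, Any], model: str) -> Dict[str, Any]:
    """Translate a /v1/tts body into a /v1/audio/speech body (hypothetical helper)."""
    body: Dict[str, Any] = {
        "model": model,            # must match the model the server has loaded
        "input": native["text"],   # OpenAI calls the text field "input"
        "voice": native.get("voice", "M1"),
    }
    for key in _PASSTHROUGH:
        if native.get(key) is not None:
            body[key] = native[key]
    # steps, max_chunk_length and silence_duration are dropped on purpose:
    # the OpenAI-compatible schema has no such fields.
    return body


body = native_to_openai(
    {"text": "Good morning.", "speed": 1.1, "steps": 12, "response_format": "wav"},
    model="supertonic-3",
)
```

Setting `response_format` explicitly is worthwhile here, since the handler rejects unsupported formats rather than falling back to WAV.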

BatchItem ¶

Bases: BaseModel

Attributes:

Name	Type	Description
`text`	`str`
`voice`	`str`
`lang`	`Optional[str]`
`speed`	`Optional[float]`
`steps`	`Optional[int]`
`max_chunk_length`	`Optional[int]`
`silence_duration`	`Optional[float]`

text `class-attribute` `instance-attribute` ¶

text: str = Field(..., min_length=1)

voice `class-attribute` `instance-attribute` ¶

voice: str = Field('M1')

lang `class-attribute` `instance-attribute` ¶

lang: Optional[str] = None

speed `class-attribute` `instance-attribute` ¶

speed: Optional[float] = Field(None, ge=0.7, le=2.0)

steps `class-attribute` `instance-attribute` ¶

steps: Optional[int] = Field(None, ge=1, le=100)

max_chunk_length `class-attribute` `instance-attribute` ¶

max_chunk_length: Optional[int] = Field(
    None, ge=1, le=10000
)

silence_duration `class-attribute` `instance-attribute` ¶

silence_duration: Optional[float] = Field(
    None, ge=0.0, le=10.0
)

BatchDefaults ¶

Bases: BaseModel

Attributes:

Name	Type	Description
`voice`	`Optional[str]`
`lang`	`Optional[str]`
`speed`	`Optional[float]`
`steps`	`Optional[int]`
`max_chunk_length`	`Optional[int]`
`silence_duration`	`Optional[float]`

voice `class-attribute` `instance-attribute` ¶

voice: Optional[str] = None

lang `class-attribute` `instance-attribute` ¶

lang: Optional[str] = None

speed `class-attribute` `instance-attribute` ¶

speed: Optional[float] = Field(None, ge=0.7, le=2.0)

steps `class-attribute` `instance-attribute` ¶

steps: Optional[int] = Field(None, ge=1, le=100)

max_chunk_length `class-attribute` `instance-attribute` ¶

max_chunk_length: Optional[int] = Field(
    None, ge=1, le=10000
)

silence_duration `class-attribute` `instance-attribute` ¶

silence_duration: Optional[float] = Field(
    None, ge=0.0, le=10.0
)

BatchRequest ¶

Bases: BaseModel

Attributes:

Name	Type	Description
`items`	`List[BatchItem]`
`response_format`	`Optional[str]`
`defaults`	`Optional[BatchDefaults]`

items `class-attribute` `instance-attribute` ¶

items: List[BatchItem] = Field(
    ..., min_length=1, max_length=64
)

response_format `class-attribute` `instance-attribute` ¶

response_format: Optional[str] = None

defaults `class-attribute` `instance-attribute` ¶

defaults: Optional[BatchDefaults] = None
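
To make the interaction between `items` and `defaults` concrete, the following sketch re-implements the per-item resolution from the `/v1/tts/batch` handler on plain dictionaries. It is an illustration of the listed logic, not the server code; the `resolve_item` helper, the voice name `"F1"`, and the language code `"en"` are placeholders chosen for the example.

```python
from typing import Any, Dict, Optional

# Fields where an explicit per-item value (even 0) overrides the batch default.
_OVERRIDABLE = ("speed", "steps", "max_chunk_length", "silence_duration")


def resolve_item(item: Dict[str, Any],
                 defaults: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Resolve the effective settings for one batch item (hypothetical helper)."""
    defaults = defaults or {}
    # Schema default: BatchItem.voice is "M1" when the client omits it.
    voice = item.get("voice", "M1")
    resolved: Dict[str, Any] = {
        "text": item["text"],
        "voice": voice or defaults.get("voice") or "M1",
        "lang": item.get("lang") or defaults.get("lang"),
    }
    for key in _OVERRIDABLE:
        value = item.get(key)
        resolved[key] = value if value is not None else defaults.get(key)
    return resolved


resolved = resolve_item(
    {"text": "One", "speed": 1.5},
    {"voice": "F1", "lang": "en", "speed": 1.0, "steps": 8},
)
```

One consequence worth noting: because `BatchItem.voice` already defaults to `"M1"`, the handler as listed appears to use `defaults.voice` only when an item sends an empty voice string. Clients that want a shared voice are probably safer setting `voice` on every item.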

BatchResultItem ¶

Bases: BaseModel

Attributes:

Name	Type	Description
`audio_base64`	`str`
`duration_s`	`float`
`format`	`str`
`sample_rate`	`int`

audio_base64 `instance-attribute` ¶

audio_base64: str

duration_s `instance-attribute` ¶

duration_s: float

format `instance-attribute` ¶

format: str

sample_rate `instance-attribute` ¶

sample_rate: int

BatchResponse ¶

Bases: BaseModel

Attributes:

Name	Type	Description
`items`	`List[BatchResultItem]`

items `instance-attribute` ¶

items: List[BatchResultItem]
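
Each `BatchResultItem` carries its audio as base64 text, and the handler appends results in request order, so result `i` corresponds to request item `i`. A minimal decoding sketch follows; the `decode_batch` helper and the sample payload (including its byte content and sample rate) are made up for illustration.

```python
import base64
from typing import Any, Dict, List, Tuple


def decode_batch(response: Dict[str, Any]) -> List[Tuple[bytes, float]]:
    """Return (audio_bytes, duration_s) per result, in request order."""
    clips = []
    for item in response["items"]:
        audio = base64.b64decode(item["audio_base64"])
        clips.append((audio, item["duration_s"]))
    return clips


# Made-up response body; a real one comes from POST /v1/tts/batch.
sample = {
    "items": [
        {
            "audio_base64": base64.b64encode(b"RIFF-fake-audio").decode("ascii"),
            "duration_s": 0.5,
            "format": "wav",
            "sample_rate": 44100,
        }
    ]
}
clips = decode_batch(sample)
```

The decoded bytes are already encoded in the format named by `format`, so they can be written to disk as-is.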

StyleInfo ¶

Bases: BaseModel

Attributes:

Name	Type	Description
`name`	`str`
`kind`	`Literal['builtin', 'custom']`
`path`	`Optional[str]`

name `instance-attribute` ¶

name: str

kind `instance-attribute` ¶

kind: Literal['builtin', 'custom']

path `class-attribute` `instance-attribute` ¶

path: Optional[str] = None

StylesResponse ¶

Bases: BaseModel

Attributes:

Name	Type	Description
`styles`	`List[StyleInfo]`

styles `instance-attribute` ¶

styles: List[StyleInfo]

StyleImportJSON ¶

Bases: BaseModel

JSON-body variant of POST /v1/styles/import.

The endpoint also accepts multipart/form-data with a file field; that path bypasses this schema and is handled directly in the route.

Attributes:

Name	Type	Description
`name`	`str`
`style_ttl`	`dict`
`style_dp`	`dict`

name `instance-attribute` ¶

name: str

style_ttl `instance-attribute` ¶

style_ttl: dict

style_dp `instance-attribute` ¶

style_dp: dict

StyleImportResponse ¶

Bases: BaseModel

Attributes:

Name	Type	Description
`name`	`str`
`stored_at`	`str`

name `instance-attribute` ¶

name: str

stored_at `instance-attribute` ¶

stored_at: str

HealthResponse ¶

Bases: BaseModel

Attributes:

Name	Type	Description
`status`	`Literal['ok', 'loading']`
`model`	`str`
`sample_rate`	`Optional[int]`
`version`	`str`
`voices_loaded`	`int`

status `instance-attribute` ¶

status: Literal['ok', 'loading']

model `class-attribute` `instance-attribute` ¶

model: str = DEFAULT_MODEL

sample_rate `class-attribute` `instance-attribute` ¶

sample_rate: Optional[int] = None

version `instance-attribute` ¶

version: str

voices_loaded `class-attribute` `instance-attribute` ¶

voices_loaded: int = 0
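
Since `status` is either `'ok'` or `'loading'`, a client can poll until the model has finished loading before sending synthesis requests. The sketch below takes the fetch step as a callable because the health endpoint's path is not shown in this section. `wait_until_ready` and its parameters are hypothetical names.

```python
import time
from typing import Any, Callable, Dict


def wait_until_ready(fetch_health: Callable[[], Dict[str, Any]],
                     attempts: int = 10,
                     delay_s: float = 0.5) -> Dict[str, Any]:
    """Poll a HealthResponse-shaped payload until status is 'ok'."""
    for attempt in range(attempts):
        payload = fetch_health()
        if payload.get("status") == "ok":
            return payload
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay_s)
    raise TimeoutError(f"server still loading after {attempts} attempts")


# Simulated sequence of health payloads: loading first, then ready.
_payloads = iter([
    {"status": "loading", "model": "supertonic-3"},
    {"status": "ok", "model": "supertonic-3"},
])
health = wait_until_ready(lambda: next(_payloads), attempts=3, delay_s=0)
```

In real use, `fetch_health` would issue the HTTP GET against the server's health route and return the parsed JSON body.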

ErrorDetail ¶

Bases: BaseModel

Attributes:

Name	Type	Description
`message`	`str`
`type`	`str`
`code`	`Optional[str]`

message `instance-attribute` ¶

message: str

type `class-attribute` `instance-attribute` ¶

type: str = 'invalid_request_error'

code `class-attribute` `instance-attribute` ¶

code: Optional[str] = None

ErrorEnvelope ¶

Bases: BaseModel

OpenAI-shaped error envelope so integrators can reuse existing parsers.

Attributes:

Name	Type	Description
`error`	`ErrorDetail`

error `instance-attribute` ¶

error: ErrorDetail
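
A client-side parser for this envelope can be quite small. The sketch below assumes error responses are serialized exactly as `{"error": {"message", "type", "code"}}`. The `SupertonicAPIError` class, its `retryable` rule, and `parse_error` are hypothetical names, and the sample body reuses the `not_ready` values seen in the route handlers.

```python
import json
from typing import Optional


class SupertonicAPIError(Exception):
    """Client-side view of an ErrorEnvelope (hypothetical class)."""

    def __init__(self, status: int, message: str, type_: str, code: Optional[str]):
        super().__init__(f"HTTP {status} [{code}]: {message}")
        self.status = status
        self.message = message
        self.type = type_
        self.code = code

    @property
    def retryable(self) -> bool:
        # Assumption: only the startup "not_ready" condition is worth retrying.
        return self.code == "not_ready"


def parse_error(status: int, body: bytes) -> SupertonicAPIError:
    """Turn an error response body into an exception object."""
    detail = json.loads(body)["error"]
    return SupertonicAPIError(
        status,
        detail["message"],
        detail.get("type", "invalid_request_error"),  # ErrorDetail default
        detail.get("code"),
    )


sample_body = (
    b'{"error": {"message": "server not ready", '
    b'"type": "server_error", "code": "not_ready"}}'
)
err = parse_error(503, sample_body)
```

Branching on `code` rather than on `message` keeps client logic stable if the human-readable text changes.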
