supertonic.core¶
Core TTS engine and text processing components.
This module contains the main Supertonic TTS engine, text processor, and supporting utilities for audio synthesis.
Classes:
| Name | Description |
|---|---|
| UnicodeProcessor | Processes text into unicode indices for the TTS model. |
| Style | Voice style representation for TTS synthesis. |
| Supertonic | Core TTS engine for Supertonic speech synthesis. |
Functions:
| Name | Description |
|---|---|
| length_to_mask | Convert lengths to binary mask. |
| get_latent_mask | Generate mask for latent representations. |
Attributes:
| Name | Type | Description |
|---|---|---|
| logger | | |
length_to_mask ¶
Convert lengths to binary mask.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| lengths | ndarray | Sequence lengths, shape (B,) | required |
| max_len | Optional[int] | Length of the mask's last dimension (int) | None |
Returns:
| Name | Type | Description |
|---|---|---|
| mask | ndarray | Binary mask, shape (B, 1, max_len) |
Source code in supertonic/core.py
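The shape contract above can be illustrated with a minimal NumPy sketch. This is not the code from supertonic/core.py: the name `length_to_mask_sketch`, the float32 output dtype, and the fallback of `max_len` to the largest length when it is `None` are assumptions made for illustration.

```python
from typing import Optional

import numpy as np


def length_to_mask_sketch(lengths: np.ndarray, max_len: Optional[int] = None) -> np.ndarray:
    """Hypothetical stand-in: turn (B,) lengths into a (B, 1, max_len) binary mask."""
    if max_len is None:
        # Assumption: default to the longest sequence in the batch.
        max_len = int(lengths.max())
    positions = np.arange(max_len)                 # (max_len,)
    mask = positions[None, :] < lengths[:, None]   # (B, max_len), True inside each sequence
    return mask[:, None, :].astype(np.float32)     # (B, 1, max_len)


example_mask = length_to_mask_sketch(np.array([2, 4]))
print(example_mask.shape)  # (2, 1, 4)
```

Comparing a row of positions against a column of lengths relies on broadcasting, so no explicit loop over the batch is needed.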
get_latent_mask ¶
get_latent_mask(
wav_lengths: ndarray,
base_chunk_size: int,
chunk_compress_factor: int,
) -> ndarray
Generate mask for latent representations.
Source code in supertonic/core.py
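One plausible reading of the signature above is sketched below. This is not the code from supertonic/core.py. It assumes that each latent frame covers `base_chunk_size * chunk_compress_factor` waveform samples, that partial frames are rounded up, and that the result has the same (B, 1, max_len) layout as `length_to_mask`. The name `get_latent_mask_sketch` and the sample values are hypothetical.

```python
import numpy as np


def get_latent_mask_sketch(
    wav_lengths: np.ndarray,
    base_chunk_size: int,
    chunk_compress_factor: int,
) -> np.ndarray:
    """Hypothetical stand-in: mask over latent frames derived from waveform lengths."""
    # Assumption: one latent frame spans this many waveform samples.
    samples_per_frame = base_chunk_size * chunk_compress_factor
    # Ceiling division, so a partially filled frame still counts as a frame.
    latent_lengths = (wav_lengths + samples_per_frame - 1) // samples_per_frame
    max_len = int(latent_lengths.max())
    mask = np.arange(max_len)[None, :] < latent_lengths[:, None]  # (B, max_len)
    return mask[:, None, :].astype(np.float32)                    # (B, 1, max_len)


latent_mask = get_latent_mask_sketch(
    np.array([1000, 2049]), base_chunk_size=512, chunk_compress_factor=2
)
print(latent_mask.shape)  # (2, 1, 3)
```

With 1024 samples per frame, the 1000-sample waveform occupies one latent frame and the 2049-sample waveform occupies three, so the first row of the mask is padded with zeros.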
UnicodeProcessor ¶
UnicodeProcessor(unicode_indexer_path: str)
Processes text into unicode indices for the TTS model.
This class handles text preprocessing, normalization, and conversion to numeric indices that the TTS model can understand.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| unicode_indexer_path | str | Path to the unicode indexer JSON file | required |
Methods:
| Name | Description |
|---|---|
| validate_text | Validate if text can be processed by the model. |
| validate_text_list | Validate a list of texts. |
Attributes:
| Name | Type | Description |
|---|---|---|
| indexer | | |
| supported_chars | | |
| supported_character_set | set[str] | |
Source code in supertonic/core.py
validate_text ¶
Validate if text can be processed by the model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | Text to validate | required |
Returns:
| Type | Description |
|---|---|
| tuple[bool, list[str]] | Tuple of (is_valid, unsupported_chars). is_valid: True if the text can be processed. unsupported_chars: list of unsupported characters (empty if valid). |
Example
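As an illustration of the `(is_valid, unsupported_chars)` contract, the following stand-in does not call the Supertonic API. The function `validate_text_sketch`, the toy character set, and the choice to report each unsupported character once in sorted order are all assumptions.

```python
from typing import List, Set, Tuple


def validate_text_sketch(text: str, supported_chars: Set[str]) -> Tuple[bool, List[str]]:
    """Hypothetical stand-in for the (is_valid, unsupported_chars) contract."""
    # Assumption: each unsupported character is reported once, in sorted order.
    unsupported = sorted({ch for ch in text if ch not in supported_chars})
    return len(unsupported) == 0, unsupported


# Toy character set for the example, not the real unicode indexer.
supported = set("abcdefghijklmnopqrstuvwxyz ")

print(validate_text_sketch("hello world", supported))    # (True, [])
print(validate_text_sketch("hello, world!", supported))  # (False, ['!', ','])
```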
Source code in supertonic/core.py
validate_text_list ¶
Style ¶
Voice style representation for TTS synthesis.
This class encapsulates the style vectors used to control the voice characteristics during speech synthesis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| style_ttl_onnx | ndarray | Style vector for the text-to-latent model | required |
| style_dp_onnx | ndarray | Style vector for the duration predictor | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| ttl | ndarray | Text-to-latent style vector |
| dp | ndarray | Duration predictor style vector |
Source code in supertonic/core.py
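The mapping from constructor parameters to attributes can be pictured with a small stand-in class. This is a sketch, not the library's `Style`: the class name `StyleSketch` and the array shapes are invented for the example, and only the parameter and attribute names come from the tables above.

```python
import numpy as np


class StyleSketch:
    """Hypothetical stand-in: stores the two style vectors as `ttl` and `dp`."""

    def __init__(self, style_ttl_onnx: np.ndarray, style_dp_onnx: np.ndarray):
        self.ttl = style_ttl_onnx  # text-to-latent style vector
        self.dp = style_dp_onnx    # duration predictor style vector


# Placeholder arrays; these shapes are assumptions, not the real model's.
style = StyleSketch(
    style_ttl_onnx=np.zeros((1, 4, 8), dtype=np.float32),
    style_dp_onnx=np.zeros((1, 2, 8), dtype=np.float32),
)
print(style.ttl.shape, style.dp.shape)  # (1, 4, 8) (1, 2, 8)
```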
Supertonic ¶
Supertonic(
cfgs: dict,
text_processor: UnicodeProcessor,
dp_ort: InferenceSession,
text_enc_ort: InferenceSession,
vector_est_ort: InferenceSession,
vocoder_ort: InferenceSession,
)
Core TTS engine for Supertonic speech synthesis.
This class orchestrates the entire text-to-speech pipeline, from text encoding through duration prediction and waveform generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| cfgs | dict | Model configuration dictionary | required |
| text_processor | UnicodeProcessor | Unicode text processor instance | required |
| dp_ort | InferenceSession | Duration predictor ONNX session | required |
| text_enc_ort | InferenceSession | Text encoder ONNX session | required |
| vector_est_ort | InferenceSession | Vector estimator ONNX session | required |
| vocoder_ort | InferenceSession | Vocoder ONNX session | required |
Attributes:
| Name | Type | Description |
|---|---|---|
| sample_rate | int | Audio sample rate in Hz |
| base_chunk_size | int | Base chunk size for latent representation |
| chunk_compress_factor | int | Compression factor for chunks |
| ldim | int | Latent dimension size |
Methods:
| Name | Description |
|---|---|
| sample_noisy_latent | |