FastTTSR provides a RESTful API with OpenAI-compatible endpoints for text-to-speech synthesis.
Base URL: http://localhost:5768
To change the port, set the HOST_PORT environment variable in docker-compose.yml, or HTTP_PORT inside the container.
Swagger UI is available at:
http://localhost:5768/swagger
OpenAPI specification:
http://localhost:5768/swagger/v1/swagger.json
Currently, no authentication is required. For production deployments, implement authentication at the reverse proxy level (nginx, API Gateway, etc.).
Check service health status.
Request:
GET /health
Response:
{
"status": "ok"
}
Status Codes:
- 200 OK - Service is healthy
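
A deployment script may want to wait for the service to come up before sending synthesis requests. The sketch below polls `GET /health` until the body reports `"status": "ok"`. The helper names (`fetch_health`, `wait_until_healthy`), the attempt count, and the delay are illustrative assumptions and not part of FastTTSR.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:5768"  # default base URL from this document


def fetch_health(base_url: str = BASE_URL) -> dict:
    """Call GET /health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 10,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the service reports status "ok", False if every attempt fails."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, HTTP error, or non-JSON body: not ready yet.
            pass
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Passing `fetch` and `sleep` as arguments keeps the loop testable without a running server.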
Get detailed information about all available TTS models.
Request:
GET /api/models
Response:
[
{
"name": "kokoro-q4",
"displayName": "Kokoro Q4",
"description": "Kokoro ONNX Q4 model with direct OnnxRuntime inference.",
"supportedLanguages": [
"en-us", "en-gb", "es", "fr-fr", "hi",
"it", "ja-jp", "pt-br", "zh-cn"
],
"speakers": [
"af", "af_bella", "af_nicole", "af_sarah",
"af_sky", "am_adam", "am_michael", ...
],
"speakerMetadata": null
},
{
"name": "supertonic-3",
"displayName": "Supertonic 3",
"description": "Supertonic-3 multilingual ONNX TTS model — 31 languages, 10 preset voice styles, flow-matching inference.",
"supportedLanguages": [
"en", "es", "fr", "de", "it", "pt", "pl",
"tr", "ru", "nl", "cs", "ar", "zh", "ja",
"ko", "hu", "hi", "sv", "da", "no", "fi",
"el", "ro", "uk", "th", "vi", "id", "he", ...
],
"speakers": [
"F1", "F2", "F3", "F4", "F5",
"M1", "M2", "M3", "M4", "M5"
],
"speakerMetadata": [
{
"id": "F1",
"name": "Female 1",
"description": "Warm, friendly female voice"
},
...
]
}
]
Status Codes:
- 200 OK - Success
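
Clients often need to pick a model for a given language before calling the synthesis endpoint. The following sketch works on the already-parsed JSON from `GET /api/models`; the function names and the trimmed `sample` data are assumptions made for illustration, and only the field names come from the response shown above.

```python
from typing import Any


def models_supporting(models: list[dict[str, Any]], language: str) -> list[str]:
    """Return the names of models whose supportedLanguages include `language`."""
    wanted = language.lower()
    return [
        m["name"]
        for m in models
        if wanted in (code.lower() for code in m.get("supportedLanguages", []))
    ]


def speaker_descriptions(model: dict[str, Any]) -> dict[str, str]:
    """Map speaker ID to description; speakerMetadata may be null (None in Python)."""
    metadata = model.get("speakerMetadata") or []
    return {entry["id"]: entry.get("description", "") for entry in metadata}


# Trimmed-down sample shaped like the response above (not the full payload).
sample = [
    {
        "name": "kokoro-q4",
        "supportedLanguages": ["en-us", "en-gb", "hi", "ja-jp"],
        "speakers": ["af", "af_bella"],
        "speakerMetadata": None,
    },
    {
        "name": "supertonic-3",
        "supportedLanguages": ["en", "es", "hi", "ja"],
        "speakers": ["F1", "M1"],
        "speakerMetadata": [
            {"id": "F1", "name": "Female 1", "description": "Warm, friendly female voice"}
        ],
    },
]
```

Note that the two models use different code styles (`ja-jp` versus `ja`), so the lookup is an exact match on the code each model advertises.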
Get list of available models in OpenAI-compatible format.
Request:
GET /v1/models
Response:
{
"object": "list",
"data": [
{
"id": "kokoro-q4",
"object": "model",
"created": 1715548800,
"owned_by": "fastttsr"
},
{
"id": "kokoro-full",
"object": "model",
"created": 1715548800,
"owned_by": "fastttsr"
},
{
"id": "supertonic-3",
"object": "model",
"created": 1715548800,
"owned_by": "fastttsr"
}
]
}
Status Codes:
- 200 OK - Success
Convert text to speech audio.
Request:
POST /v1/audio/speech
Content-Type: application/json
{
"model": "kokoro-q4",
"input": "Hello world, this is a test of the text to speech system.",
"voice": "af_bella",
"speed": 1.0,
"language": "en-us",
"response_format": "wav"
}
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `model` | string | Yes | - | Model ID (kokoro-q4, kokoro-full, supertonic-3) |
| `input` | string | Yes | - | Text to synthesize (max ~500 characters recommended) |
| `voice` | string | No | First available | Speaker voice ID or OpenAI alias (alloy, echo, fable, onyx, nova, shimmer) |
| `speaker` | string | No | - | Alternative to voice parameter (same functionality) |
| `language` | string | No | `en-us` | Language code (see supported languages below) |
| `speed` | number | No | `1.0` | Playback speed (0.5 - 2.0) |
| `response_format` | string | No | `wav` | Audio format (currently only wav supported) |
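
The parameter rules above can be checked on the client before a request is sent. This is a minimal sketch, assuming a hypothetical helper named `build_speech_request`; it clamps `speed` to the documented range and omits optional fields so the server applies its own defaults.

```python
from typing import Optional

MIN_SPEED, MAX_SPEED = 0.5, 2.0  # range from the parameter table


def build_speech_request(
    model: str,
    text: str,
    voice: Optional[str] = None,
    language: Optional[str] = None,
    speed: float = 1.0,
    response_format: str = "wav",
) -> dict:
    """Build a JSON body for POST /v1/audio/speech, validating it client-side."""
    if not model:
        raise ValueError("model is required")
    if not text or not text.strip():
        raise ValueError("input is required")
    if response_format != "wav":
        raise ValueError("only 'wav' is currently supported")
    # Clamp locally so the value sent is always inside the documented range.
    clamped = max(MIN_SPEED, min(MAX_SPEED, float(speed)))
    body = {
        "model": model,
        "input": text,
        "speed": clamped,
        "response_format": response_format,
    }
    # Optional fields are left out so the server can apply its defaults.
    if voice is not None:
        body["voice"] = voice
    if language is not None:
        body["language"] = language
    return body
```

The returned dictionary can be passed directly as the `json=` argument of `requests.post`.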
Supported Languages (Kokoro):
| Code | Language | Aliases |
|---|---|---|
| `en-us` | English (US) | a, en, english |
| `en-gb` | English (GB) | b |
| `es` | Spanish | d, spanish |
| `fr-fr` | French | e, f, fr, french |
| `hi` | Hindi | g, hindi |
| `it` | Italian | h, italian |
| `ja-jp` | Japanese | j, ja, japanese |
| `pt-br` | Portuguese (BR) | p, pt, portuguese |
| `zh-cn` | Chinese (CN) | z, zh, chinese |
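
If a client wants to show or log the canonical code for whatever alias a user typed, the table above can be mirrored locally. The sketch below copies the aliases from that table as written; the function name `normalize_kokoro_language` is hypothetical, and the case-insensitive matching is a client-side choice, not documented server behavior.

```python
# Aliases copied from the Kokoro language table above.
KOKORO_LANGUAGE_ALIASES = {
    "en-us": ["a", "en", "english"],
    "en-gb": ["b"],
    "es": ["d", "spanish"],
    "fr-fr": ["e", "f", "fr", "french"],
    "hi": ["g", "hindi"],
    "it": ["h", "italian"],
    "ja-jp": ["j", "ja", "japanese"],
    "pt-br": ["p", "pt", "portuguese"],
    "zh-cn": ["z", "zh", "chinese"],
}

# Flatten into alias -> canonical code; each canonical code maps to itself.
_LOOKUP = {code: code for code in KOKORO_LANGUAGE_ALIASES}
for _code, _aliases in KOKORO_LANGUAGE_ALIASES.items():
    for _alias in _aliases:
        _LOOKUP[_alias] = _code


def normalize_kokoro_language(value: str) -> str:
    """Return the canonical Kokoro language code for a code or alias."""
    key = value.strip().lower()
    if key not in _LOOKUP:
        raise ValueError(f"Unsupported language '{value}'")
    return _LOOKUP[key]
```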
Supported Languages (Supertonic-3):
31 languages including: English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese, Japanese, Korean, Hungarian, Hindi, Swedish, Danish, Norwegian, Finnish, Greek, Romanian, Ukrainian, Thai, Vietnamese, Indonesian, Hebrew, and more.
Use 2-letter ISO codes: en, es, fr, de, it, pt, ja, ko, zh, etc.
OpenAI Voice Aliases (Kokoro):
| Alias | Maps To | Description |
|---|---|---|
| `alloy` | `af_alloy` | Balanced, neutral voice |
| `echo` | `am_echo` | Male voice |
| `fable` | `bf_fable` | British female voice |
| `onyx` | `bm_onyx` | British male voice |
| `nova` | `af_nova` | Bright female voice |
| `shimmer` | `af_shimmer` | Soft female voice |
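
The server resolves these aliases itself, so clients do not need this mapping to make requests. A local copy can still be handy, for example to display which Kokoro speaker an alias will use. The sketch below mirrors the table; `resolve_voice` is a hypothetical helper name.

```python
# Alias -> Kokoro speaker ID, copied from the table above.
OPENAI_VOICE_ALIASES = {
    "alloy": "af_alloy",
    "echo": "am_echo",
    "fable": "bf_fable",
    "onyx": "bm_onyx",
    "nova": "af_nova",
    "shimmer": "af_shimmer",
}


def resolve_voice(voice: str) -> str:
    """Return the Kokoro speaker ID for an OpenAI alias, or the input unchanged."""
    return OPENAI_VOICE_ALIASES.get(voice, voice)
```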
Response:
Returns audio file as audio/wav stream.
Response Headers:
Content-Type: audio/wav
Content-Length: <size>
WAV Format:
- Sample Rate: 24000 Hz (Kokoro) or 22050 Hz (Supertonic-3)
- Bit Depth: 16-bit
- Channels: Mono
- Format: PCM
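
To confirm that a response matches the format listed above, the header can be read with Python's standard `wave` module. This is a sketch that assumes the response body is a complete WAV file held in memory; `describe_wav` and `make_silence` are illustrative names, and `make_silence` only produces stand-in data for trying the function without a server.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read header fields and duration from WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_seconds": frames / rate if rate else 0.0,
        }


def make_silence(sample_rate: int = 24000, seconds: float = 0.5) -> bytes:
    """Build a mono 16-bit PCM WAV of silence, used here as stand-in data."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buffer.getvalue()
```

With a real response, `describe_wav(response.content)` would be expected to report 24000 Hz for Kokoro or 22050 Hz for Supertonic-3, per the list above.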
Status Codes:
- 200 OK - Success (returns WAV file)
- 400 Bad Request - Invalid parameters
- 404 Not Found - Model not found
- 500 Internal Server Error - Synthesis failure (returns silent WAV)
Error Response Format:
{
"error": {
"code": "invalid_request",
"message": "Unsupported language 'xyz'. Supported languages: en-us, en-gb, es, ..."
}
}

curl -X POST http://localhost:5768/v1/audio/speech \
-H 'Content-Type: application/json' \
-d '{
"model": "kokoro-q4",
"input": "Hello world! This is a test of the FastTTSR text-to-speech system.",
"voice": "af_bella"
}' \
--output output.wav

curl -X POST http://localhost:5768/v1/audio/speech \
-H 'Content-Type: application/json' \
-d '{
"model": "kokoro-q4",
"input": "This is a faster speech sample.",
"voice": "am_michael",
"speed": 1.5
}' \
--output fast.wav

curl -X POST http://localhost:5768/v1/audio/speech \
-H 'Content-Type: application/json' \
-d '{
"model": "kokoro-q4",
"input": "こんにちは、世界!",
"voice": "af_bella",
"language": "ja-jp"
}' \
--output japanese.wav

curl -X POST http://localhost:5768/v1/audio/speech \
-H 'Content-Type: application/json' \
-d '{
"model": "kokoro-q4",
"input": "你好世界!这是一个测试。",
"voice": "af_sky",
"language": "zh-cn"
}' \
--output chinese.wav

curl -X POST http://localhost:5768/v1/audio/speech \
-H 'Content-Type: application/json' \
-d '{
"model": "kokoro-q4",
"input": "Using OpenAI-compatible voice aliases.",
"voice": "alloy"
}' \
--output alloy.wav

curl -X POST http://localhost:5768/v1/audio/speech \
-H 'Content-Type: application/json' \
-d '{
"model": "supertonic-3",
"input": "This is the Supertonic three model with higher quality.",
"voice": "F1",
"language": "en"
}' \
--output supertonic.wav

import requests
url = "http://localhost:5768/v1/audio/speech"
headers = {"Content-Type": "application/json"}
payload = {
"model": "kokoro-q4",
"input": "Hello from Python!",
"voice": "af_bella",
"speed": 1.0
}
response = requests.post(url, json=payload, headers=headers)
if response.status_code == 200:
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio saved to output.wav")
else:
    print(f"Error: {response.status_code}")
    print(response.json())

from openai import OpenAI
# Point to FastTTSR instead of OpenAI
client = OpenAI(
api_key="not-needed", # No API key required
base_url="http://localhost:5768/v1"
)
response = client.audio.speech.create(
model="kokoro-q4",
voice="alloy",
input="Hello from OpenAI SDK!"
)
response.stream_to_file("output.wav")

// Node.js 18+ (built-in fetch). Wrapped in an async function so that
// await can be used together with require() in a CommonJS script.
const fs = require('fs');

async function main() {
  const response = await fetch('http://localhost:5768/v1/audio/speech', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'kokoro-q4',
      input: 'Hello from JavaScript!',
      voice: 'af_bella'
    })
  });
  if (response.ok) {
    const buffer = await response.arrayBuffer();
    fs.writeFileSync('output.wav', Buffer.from(buffer));
    console.log('Audio saved to output.wav');
  } else {
    const error = await response.json();
    console.error('Error:', error);
  }
}

main().catch(console.error);

// Same request using axios (separate script)
const axios = require('axios');
const fs = require('fs');

async function main() {
  const response = await axios.post('http://localhost:5768/v1/audio/speech', {
    model: 'kokoro-q4',
    input: 'Hello from axios!',
    voice: 'af_bella'
  }, {
    responseType: 'arraybuffer' // receive raw bytes instead of parsed JSON
  });
  fs.writeFileSync('output.wav', response.data);
  console.log('Audio saved to output.wav');
}

main().catch(console.error);

Currently, no rate limiting is implemented at the application level. For production:
- Reverse Proxy: Implement rate limiting in nginx/Apache
- API Gateway: Use cloud provider's API Gateway with rate limits
- Application Level: Add rate limiting middleware (future enhancement)
Recommended Limits:
- 100 requests/minute per IP for free tier
- 1000 requests/minute for authenticated users
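
To make a limit such as "100 requests/minute per IP" concrete, here is a minimal in-memory sliding-window limiter. It is only a sketch of the general technique: FastTTSR does not ship this class, the name `SlidingWindowLimiter` is hypothetical, and a production setup would more likely rely on the reverse proxy or API gateway as recommended above.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window_seconds` for each key (e.g. a client IP)."""

    def __init__(
        self,
        limit: int = 100,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits: dict[str, deque] = {}

    def allow(self, key: str) -> bool:
        """Record a call for `key` and return True, or return False if over the limit."""
        now = self.clock()
        hits = self._hits.setdefault(key, deque())
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

State lives in one process only, so this approach does not work unchanged across multiple service replicas.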
| Code | HTTP Status | Description | Solution |
|---|---|---|---|
| `model_not_found` | 404 | Model ID not found | Check available models with GET /api/models |
| `invalid_request` | 400 | Invalid request parameters | Check parameter format and values |
| `unsupported_language` | 400 | Language not supported by model | Use supported language from model info |
| `unsupported_speaker` | 400 | Speaker/voice not found | Use valid speaker from model info |
| `synthesis_failure` | 500 | TTS synthesis failed | Check logs; service returns silent WAV |
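
On the client side, the error codes above can be turned into a typed exception. The sketch below assumes the error body follows the `{"error": {"code", "message"}}` shape shown earlier in this document; `FastTTSRError` and `parse_error` are hypothetical names. Because a 500 response is documented to carry a silent WAV rather than JSON, the parser tolerates non-JSON bodies.

```python
import json


class FastTTSRError(Exception):
    """Hypothetical client-side exception; not part of FastTTSR itself."""

    def __init__(self, status: int, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def retryable(self) -> bool:
        # Only server-side failures are worth retrying; a 4xx means fix the request.
        return self.status >= 500


def parse_error(status: int, body: bytes) -> FastTTSRError:
    """Turn an error response into an exception, tolerating non-JSON bodies."""
    try:
        parsed = json.loads(body.decode("utf-8"))
    except ValueError:
        parsed = None  # not JSON (for example, the silent WAV sent on a 500)
    error = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(error, dict):
        error = {}
    return FastTTSRError(
        status,
        str(error.get("code", "unknown")),
        str(error.get("message", "no error details in response body")),
    )
```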
Typical response times (including audio generation):
| Model | Input Length | Response Time | Notes |
|---|---|---|---|
| kokoro-q4 | 50 chars | 60-100ms | Fastest |
| kokoro-q4 | 200 chars | 150-250ms | Typical sentence |
| kokoro-full | 50 chars | 80-120ms | Higher quality |
| kokoro-full | 200 chars | 200-350ms | Best quality |
| supertonic-3 | 50 chars | 100-200ms | Multi-model |
| supertonic-3 | 200 chars | 300-500ms | Highest quality |
Factors Affecting Performance:
- Input text length
- Model complexity
- Server CPU/GPU
- Concurrent requests
- First request (model loading)
Text Input:
- Length: Keep under 500 characters for optimal performance
- Format: Plain text works best
- Special Characters: Automatically sanitized
- Markdown: Automatically cleaned (bullets, formatting, etc.)
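
Longer documents can be split on the client so that each request stays under the recommended 500 characters. The following is a rough sketch, assuming sentences end with `.`, `!`, or `?` followed by whitespace; `split_text` is a hypothetical helper, and text without such boundaries (for example, Japanese or Chinese) falls back to a plain cut at the limit.

```python
import re

MAX_CHARS = 500  # recommended upper bound from this document


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence boundaries."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in filter(None, sentences):
        # A single over-long sentence is cut at max_chars, ignoring word boundaries.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate  # sentence still fits in the current chunk
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be synthesized separately and the resulting WAV files played or concatenated in order.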
Language Selection:
- Always specify the language for non-English text
- Use the correct language code for accurate pronunciation
- Mixed-language input may produce unexpected results
Speed Control:
- Range: 0.5 (slow) to 2.0 (fast)
- Default: 1.0 (normal)
- Values outside the range are clamped
Error Handling:
- Always check the HTTP status code
- Parse the error response for details
- Implement retry logic with exponential backoff
- On 500 errors, the service returns a valid (silent) WAV
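
Retry with exponential backoff, as recommended above, can be sketched in a few lines. The names `TransientError` and `retry_with_backoff` and the default delays are assumptions for illustration; the caller decides which failures (such as HTTP 5xx responses or timeouts) are raised as `TransientError`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. HTTP 5xx, timeouts)."""


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on TransientError with delays base, 2*base, 4*base, ..."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last failure
            # Double the delay on each retry, capped at max_delay.
            sleep(min(max_delay, base_delay * (2 ** attempt)))
    raise RuntimeError("max_attempts must be at least 1")
```

Client errors (4xx) should not be wrapped in `TransientError`, since repeating an invalid request will not succeed. Many clients also add random jitter to the delay so that parallel callers do not retry in lockstep.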
Performance Optimization:
- Reuse connections (HTTP keep-alive)
- Batch requests when possible
- Cache audio for repeated phrases
- Use the appropriate model for the use case (q4 vs full)
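
Caching audio for repeated phrases can be done with a small wrapper keyed by the full request body, so that the same text with a different voice or speed is stored separately. This is a sketch under stated assumptions: `AudioCache` is a hypothetical class, the cache is in-memory and unbounded, and `synthesize` stands for whatever function actually calls `POST /v1/audio/speech`.

```python
import hashlib
import json
from typing import Callable


class AudioCache:
    """In-memory cache keyed by a hash of the full request body."""

    def __init__(self, synthesize: Callable[[dict], bytes]) -> None:
        self._synthesize = synthesize
        self._store: dict[str, bytes] = {}

    @staticmethod
    def key_for(payload: dict) -> str:
        # sort_keys makes the key independent of dict insertion order.
        canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, payload: dict) -> bytes:
        """Return cached audio for payload, synthesizing it on the first request."""
        key = self.key_for(payload)
        if key not in self._store:
            self._store[key] = self._synthesize(payload)
        return self._store[key]
```

A long-running service would likely want an eviction policy or an on-disk store instead of an unbounded dictionary.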
FastTTSR implements a subset of OpenAI's /v1/audio/speech endpoint:
Compatible:
- ✅ `model` parameter
- ✅ `input` parameter
- ✅ `voice` parameter
- ✅ `speed` parameter
- ✅ Response format (audio/wav)
Extensions:
- ➕ `language` parameter (explicitly set language)
- ➕ `speaker` parameter (alternative to `voice`)
- ➕ Extended speaker catalog (510+ voices)
- ➕ Multiple TTS models
Not Implemented:
- ❌ `response_format` other than `wav` (mp3, opus, aac, flac)
- ❌ Audio streaming (planned)
To migrate from OpenAI TTS:
- Change `base_url` to the FastTTSR endpoint
- Remove or ignore `api_key` (not required)
- Convert voice names if using custom voices
- Add an explicit `language` parameter for non-English text
Example:
# Before (OpenAI)
client = OpenAI(api_key="sk-...")
# After (FastTTSR)
client = OpenAI(
api_key="not-needed",
base_url="http://localhost:5768/v1"
)
Planned Enhancements:
- Audio streaming (chunked transfer)
- Additional audio formats (mp3, opus, flac)
- Batch synthesis endpoint
- SSML support
- Voice cloning
- Real-time WebSocket API
- Audio post-processing (effects, normalization)
- Rate limiting and quota management
- API key authentication