API Reference

FastTTSR provides a RESTful API with OpenAI-compatible endpoints for text-to-speech synthesis.

Base URL

http://localhost:5768

Change the port with the HOST_PORT environment variable in docker-compose.yml, or with HTTP_PORT inside the container.

Interactive Documentation

Swagger UI is available at:

http://localhost:5768/swagger

OpenAPI specification:

http://localhost:5768/swagger/v1/swagger.json

Authentication

Currently, no authentication is required. For production deployments, implement authentication at the reverse proxy level (nginx, API Gateway, etc.).

Endpoints

Health Check

Check service health status.

Request:

GET /health

Response:

{
  "status": "ok"
}

Status Codes:

200 OK - Service is healthy
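
A readiness probe built on this endpoint might look like the sketch below. It assumes only the response body shown above; the function names and the injectable `fetch` and `sleep` parameters are illustrative and not part of FastTTSR.

```python
import json
import time
import urllib.request


def is_healthy(status_code, body):
    """Return True when the reply matches the documented healthy response."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def http_get(url):
    """Fetch a URL with the standard library and return (status, body)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status, resp.read()


def wait_until_healthy(base_url, attempts=10, delay=1.0,
                       fetch=http_get, sleep=time.sleep):
    """Poll GET /health until it reports ok or the attempts run out."""
    for attempt in range(attempts):
        try:
            status, body = fetch(f"{base_url}/health")
            if is_healthy(status, body):
                return True
        except OSError:
            pass  # connection refused or HTTP error: service not ready yet
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

Calling `wait_until_healthy("http://localhost:5768")` before the first synthesis request is one way to avoid failures while the container is still starting.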

List Available Models (Detailed)

Get detailed information about all available TTS models.

Request:

GET /api/models

Response:

[
  {
    "name": "kokoro-q4",
    "displayName": "Kokoro Q4",
    "description": "Kokoro ONNX Q4 model with direct OnnxRuntime inference.",
    "supportedLanguages": [
      "en-us", "en-gb", "es", "fr-fr", "hi", 
      "it", "ja-jp", "pt-br", "zh-cn"
    ],
    "speakers": [
      "af", "af_bella", "af_nicole", "af_sarah", 
      "af_sky", "am_adam", "am_michael", ...
    ],
    "speakerMetadata": null
  },
  {
    "name": "supertonic-3",
    "displayName": "Supertonic 3",
    "description": "Supertonic-3 multilingual ONNX TTS model — 31 languages, 10 preset voice styles, flow-matching inference.",
    "supportedLanguages": [
      "en", "es", "fr", "de", "it", "pt", "pl", 
      "tr", "ru", "nl", "cs", "ar", "zh", "ja", 
      "ko", "hu", "hi", "sv", "da", "no", "fi", 
      "el", "ro", "uk", "th", "vi", "id", "he", ...
    ],
    "speakers": [
      "F1", "F2", "F3", "F4", "F5",
      "M1", "M2", "M3", "M4", "M5"
    ],
    "speakerMetadata": [
      {
        "id": "F1",
        "name": "Female 1",
        "description": "Warm, friendly female voice"
      },
      ...
    ]
  }
]

Status Codes:

200 OK - Success
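
Once the response is parsed as JSON, a client can use it to pick a valid speaker before calling the synthesis endpoint. The helpers below are a minimal sketch that assumes only the field names shown in the sample response; `speakers_for` and `speaker_descriptions` are hypothetical names.

```python
from typing import Any, Dict, List


def speakers_for(models: List[Dict[str, Any]], model_name: str,
                 language: str) -> List[str]:
    """Return the speaker IDs of `model_name` if it lists `language`, else []."""
    for model in models:
        if model.get("name") != model_name:
            continue
        if language not in model.get("supportedLanguages", []):
            return []
        return list(model.get("speakers", []))
    return []


def speaker_descriptions(model: Dict[str, Any]) -> Dict[str, str]:
    """Map speaker ID to description; speakerMetadata may be null."""
    metadata = model.get("speakerMetadata") or []
    return {entry["id"]: entry.get("description", "") for entry in metadata}
```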

List Models (OpenAI Compatible)

Get the list of available models in OpenAI-compatible format.

Request:

GET /v1/models

Response:

{
  "object": "list",
  "data": [
    {
      "id": "kokoro-q4",
      "object": "model",
      "created": 1715548800,
      "owned_by": "fastttsr"
    },
    {
      "id": "kokoro-full",
      "object": "model",
      "created": 1715548800,
      "owned_by": "fastttsr"
    },
    {
      "id": "supertonic-3",
      "object": "model",
      "created": 1715548800,
      "owned_by": "fastttsr"
    }
  ]
}

Status Codes:

200 OK - Success
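
A client might check that a model ID is present in this listing before using it. The sketch below assumes the response shape shown above; `model_ids` and `require_model` are illustrative helper names.

```python
def model_ids(listing: dict) -> list:
    """Extract model IDs from an OpenAI-style list response."""
    if listing.get("object") != "list":
        raise ValueError("expected an OpenAI-style list object")
    return [item["id"] for item in listing.get("data", [])
            if item.get("object") == "model"]


def require_model(listing: dict, model_id: str) -> str:
    """Return model_id if it is listed, otherwise raise LookupError."""
    available = model_ids(listing)
    if model_id not in available:
        raise LookupError(
            f"model {model_id!r} not available; choose from {available}")
    return model_id
```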

Synthesize Speech (OpenAI Compatible)

Convert text to speech audio.

Request:

POST /v1/audio/speech
Content-Type: application/json

{
  "model": "kokoro-q4",
  "input": "Hello world, this is a test of the text to speech system.",
  "voice": "af_bella",
  "speed": 1.0,
  "language": "en-us",
  "response_format": "wav"
}

Parameters:

Parameter	Type	Required	Default	Description
`model`	string	Yes	-	Model ID (`kokoro-q4`, `kokoro-full`, `supertonic-3`)
`input`	string	Yes	-	Text to synthesize (max ~500 characters recommended)
`voice`	string	No	First available	Speaker voice ID or OpenAI alias (`alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`)
`speaker`	string	No	-	Alternative to `voice` parameter (same functionality)
`language`	string	No	`en-us`	Language code (see supported languages below)
`speed`	number	No	`1.0`	Playback speed (0.5 - 2.0)
`response_format`	string	No	`wav`	Audio format (currently only `wav` supported)
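
The parameter table can be mirrored by a small request builder that enforces the required fields and omits unset optional ones, so the server applies its own defaults. This is a client-side sketch; `build_speech_request` is a hypothetical helper and the checks are assumptions drawn from the table, not the server's validation logic.

```python
from typing import Any, Dict, Optional


def build_speech_request(model: str, text: str,
                         voice: Optional[str] = None,
                         language: Optional[str] = None,
                         speed: Optional[float] = None,
                         response_format: Optional[str] = None) -> Dict[str, Any]:
    """Build the JSON body for POST /v1/audio/speech."""
    if not model:
        raise ValueError("model is required")
    if not text or not text.strip():
        raise ValueError("input is required")
    if response_format not in (None, "wav"):
        raise ValueError("only 'wav' is currently supported")

    body: Dict[str, Any] = {"model": model, "input": text}
    optional = {
        "voice": voice,
        "language": language,
        "speed": speed,
        "response_format": response_format,
    }
    # Leave out anything the caller did not set.
    body.update({key: value for key, value in optional.items()
                 if value is not None})
    return body
```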

Supported Languages (Kokoro):

Code	Language	Aliases
`en-us`	English (US)	`a`, `en`, `english`
`en-gb`	English (GB)	`b`
`es`	Spanish	`d`, `spanish`
`fr-fr`	French	`e`, `f`, `fr`, `french`
`hi`	Hindi	`g`, `hindi`
`it`	Italian	`h`, `italian`
`ja-jp`	Japanese	`j`, `ja`, `japanese`
`pt-br`	Portuguese (BR)	`p`, `pt`, `portuguese`
`zh-cn`	Chinese (CN)	`z`, `zh`, `chinese`
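
To show how alias resolution works, the table above can be transcribed into a lookup. This is a client-side sketch with hypothetical names; the server performs its own resolution and remains authoritative.

```python
# Canonical Kokoro language codes and their aliases, copied from the table.
KOKORO_LANGUAGE_ALIASES = {
    "en-us": ("a", "en", "english"),
    "en-gb": ("b",),
    "es": ("d", "spanish"),
    "fr-fr": ("e", "f", "fr", "french"),
    "hi": ("g", "hindi"),
    "it": ("h", "italian"),
    "ja-jp": ("j", "ja", "japanese"),
    "pt-br": ("p", "pt", "portuguese"),
    "zh-cn": ("z", "zh", "chinese"),
}

# Flatten to alias -> code; each code also maps to itself.
_LOOKUP = {
    name: code
    for code, aliases in KOKORO_LANGUAGE_ALIASES.items()
    for name in (code, *aliases)
}


def normalize_language(value: str) -> str:
    """Resolve a code or alias (case-insensitive) to the canonical code."""
    key = value.strip().lower()
    if key not in _LOOKUP:
        raise ValueError(f"Unsupported language {value!r}")
    return _LOOKUP[key]
```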

Supported Languages (Supertonic-3):

31 languages including: English, Spanish, French, German, Italian, Portuguese, Polish, Turkish, Russian, Dutch, Czech, Arabic, Chinese, Japanese, Korean, Hungarian, Hindi, Swedish, Danish, Norwegian, Finnish, Greek, Romanian, Ukrainian, Thai, Vietnamese, Indonesian, Hebrew, and more.

Use 2-letter ISO codes: en, es, fr, de, it, pt, ja, ko, zh, etc.

OpenAI Voice Aliases (Kokoro):

Alias	Maps To	Description
`alloy`	`af_alloy`	Balanced, neutral voice
`echo`	`am_echo`	Male voice
`fable`	`bf_fable`	British female voice
`onyx`	`bm_onyx`	British male voice
`nova`	`af_nova`	Bright female voice
`shimmer`	`af_shimmer`	Soft female voice

Response:

Returns the audio file as an audio/wav stream.

Response Headers:

Content-Type: audio/wav
Content-Length: <size>

WAV Format:

Sample Rate: 24000 Hz (Kokoro) or 22050 Hz (Supertonic-3)
Bit Depth: 16-bit
Channels: Mono
Format: PCM
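
A response can be checked against these properties with Python's standard `wave` module. The sketch below assumes the whole response body is available as bytes; `describe_wav` is an illustrative name.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Read the header of a PCM WAV payload and report its basic properties."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "bit_depth": wav.getsampwidth() * 8,  # sample width is in bytes
            "channels": wav.getnchannels(),
            "duration_seconds": frames / rate,
        }
```

For a Kokoro response, the result should report a sample rate of 24000, a bit depth of 16, and one channel.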

Status Codes:

200 OK - Success (returns WAV file)
400 Bad Request - Invalid parameters
404 Not Found - Model not found
500 Internal Server Error - Synthesis failure (returns silent WAV)

Error Response Format:

{
  "error": {
    "code": "invalid_request",
    "message": "Unsupported language 'xyz'. Supported languages: en-us, en-gb, es, ..."
  }
}
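
Client code might turn this error body into an exception. The sketch below assumes the JSON shape shown above and falls back to a generic error when the body is not JSON, as with the silent WAV returned on a 500. `SpeechApiError` and `parse_error` are hypothetical names.

```python
import json


class SpeechApiError(Exception):
    """Carries the HTTP status plus the API's error code and message."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_error(status: int, body: bytes) -> SpeechApiError:
    """Build an exception from an error response body."""
    try:
        error = json.loads(body)["error"]
        return SpeechApiError(status, error["code"], error["message"])
    except (ValueError, KeyError, TypeError):
        # Body was not the documented JSON error object.
        preview = body[:200].decode("utf-8", "replace")
        return SpeechApiError(status, "unknown", preview)
```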

cURL Examples

Basic Synthesis (Kokoro)

curl -X POST http://localhost:5768/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kokoro-q4",
    "input": "Hello world! This is a test of the FastTTSR text-to-speech system.",
    "voice": "af_bella"
  }' \
  --output output.wav

With Speed Control

curl -X POST http://localhost:5768/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kokoro-q4",
    "input": "This is a faster speech sample.",
    "voice": "am_michael",
    "speed": 1.5
  }' \
  --output fast.wav

Japanese Synthesis

curl -X POST http://localhost:5768/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kokoro-q4",
    "input": "こんにちは、世界！",
    "voice": "af_bella",
    "language": "ja-jp"
  }' \
  --output japanese.wav

Chinese Synthesis

curl -X POST http://localhost:5768/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kokoro-q4",
    "input": "你好世界！这是一个测试。",
    "voice": "af_sky",
    "language": "zh-cn"
  }' \
  --output chinese.wav

Using OpenAI Aliases

curl -X POST http://localhost:5768/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "kokoro-q4",
    "input": "Using OpenAI-compatible voice aliases.",
    "voice": "alloy"
  }' \
  --output alloy.wav

Supertonic-3 Model

curl -X POST http://localhost:5768/v1/audio/speech \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "supertonic-3",
    "input": "This is the Supertonic three model with higher quality.",
    "voice": "F1",
    "language": "en"
  }' \
  --output supertonic.wav

Python Examples

Using requests

import requests

url = "http://localhost:5768/v1/audio/speech"
headers = {"Content-Type": "application/json"}
payload = {
    "model": "kokoro-q4",
    "input": "Hello from Python!",
    "voice": "af_bella",
    "speed": 1.0
}

response = requests.post(url, json=payload, headers=headers)

if response.status_code == 200:
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio saved to output.wav")
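else:
    print(f"Error: {response.status_code}")
    # 4xx responses carry a JSON error body; a 500 returns a silent WAV instead.
    if "json" in response.headers.get("Content-Type", ""):
        print(response.json())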

Using OpenAI SDK

from openai import OpenAI

# Point to FastTTSR instead of OpenAI
client = OpenAI(
    api_key="not-needed",  # No API key required
    base_url="http://localhost:5768/v1"
)

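# Stream the audio to disk. Recent SDK versions deprecate calling
# stream_to_file() on a non-streaming response.
with client.audio.speech.with_streaming_response.create(
    model="kokoro-q4",
    voice="alloy",
    input="Hello from OpenAI SDK!",
    response_format="wav",  # FastTTSR currently supports only wav
) as response:
    response.stream_to_file("output.wav")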

JavaScript/Node.js Examples

Using fetch

// Save as an ES module (for example synthesize.mjs) and run on Node.js 18+,
// which provides a global fetch and top-level await.
import { writeFile } from 'node:fs/promises';

const response = await fetch('http://localhost:5768/v1/audio/speech', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'kokoro-q4',
    input: 'Hello from JavaScript!',
    voice: 'af_bella'
  })
});

if (response.ok) {
  // Write the binary WAV payload to disk.
  const buffer = await response.arrayBuffer();
  await writeFile('output.wav', Buffer.from(buffer));
  console.log('Audio saved to output.wav');
} else {
  const error = await response.json();
  console.error('Error:', error);
}

Using axios

// ES module syntax, so that top-level await works (for example synthesize.mjs).
import axios from 'axios';
import { writeFileSync } from 'node:fs';

const response = await axios.post('http://localhost:5768/v1/audio/speech', {
  model: 'kokoro-q4',
  input: 'Hello from axios!',
  voice: 'af_bella'
}, {
  // Receive the WAV payload as binary data rather than a string.
  responseType: 'arraybuffer'
});

writeFileSync('output.wav', response.data);
console.log('Audio saved to output.wav');

Rate Limiting

Currently, no rate limiting is implemented at the application level. For production:

Reverse Proxy: Implement rate limiting in nginx/Apache
API Gateway: Use cloud provider's API Gateway with rate limits
Application Level: Add rate limiting middleware (future enhancement)

Recommended Limits:

100 requests/minute per IP for free tier
1000 requests/minute for authenticated users
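
To illustrate what a per-client limit involves, here is a minimal token-bucket sketch. FastTTSR does not ship this; the class name and the injectable clock are assumptions made for the example.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests per `per_seconds`, refilled continuously."""

    def __init__(self, capacity: int, per_seconds: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = capacity / per_seconds  # tokens per second
        self.tokens = float(capacity)
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return whether the request may run."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

The recommended 100 requests/minute per IP would correspond to one `TokenBucket(capacity=100, per_seconds=60)` per client address. In production the same policy is usually configured in the reverse proxy or API gateway rather than in application code.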

Error Codes

Code	HTTP Status	Description	Solution
`model_not_found`	404	Model ID not found	Check available models with `GET /api/models`
`invalid_request`	400	Invalid request parameters	Check parameter format and values
`unsupported_language`	400	Language not supported by model	Use supported language from model info
`unsupported_speaker`	400	Speaker/voice not found	Use valid speaker from model info
`synthesis_failure`	500	TTS synthesis failed	Check logs; service returns silent WAV

Response Time

Typical response times (including audio generation):

Model	Input Length	Response Time	Notes
kokoro-q4	50 chars	60-100ms	Fastest
kokoro-q4	200 chars	150-250ms	Typical sentence
kokoro-full	50 chars	80-120ms	Higher quality
kokoro-full	200 chars	200-350ms	Best quality
supertonic-3	50 chars	100-200ms	Multi-model
supertonic-3	200 chars	300-500ms	Highest quality

Factors Affecting Performance:

Input text length
Model complexity
Server CPU/GPU
Concurrent requests
First request (model loading)

Best Practices

1. Input Text

Length: Keep under 500 characters for optimal performance
Format: Plain text works best
Special Characters: Automatically sanitized
Markdown: Automatically cleaned (bullets, formatting, etc.)
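
Longer text can be split into chunks that respect the 500-character guideline and synthesized one request at a time. The sketch below splits on sentence-ending punctuation followed by whitespace and hard-wraps any single sentence that is still too long. `split_text` is an illustrative helper, and the splitting rule is an assumption that suits space-delimited languages best.

```python
import re


def split_text(text: str, max_chars: int = 500) -> list:
    """Split text into chunks of at most max_chars, preferring sentence breaks."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself too long.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```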

2. Language Selection

Always specify language for non-English text
Use correct language code for pronunciation
Mixed-language input may produce unexpected results

3. Speed Parameter

Range: 0.5 (slow) to 2.0 (fast)
Default: 1.0 (normal)
Values outside range will be clamped
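
The clamping rule is equivalent to the following one-liner, shown as a client-side sketch so callers can predict the effective speed. The function name is hypothetical.

```python
def clamp_speed(speed: float, low: float = 0.5, high: float = 2.0) -> float:
    """Limit speed to the documented range of 0.5 to 2.0."""
    return max(low, min(high, speed))
```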

4. Error Handling

Always check HTTP status code
Parse error response for details
Implement retry logic with exponential backoff
On 500 errors, service returns valid (silent) WAV
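
Retry with exponential backoff can be sketched as below. The helper takes any zero-argument callable that returns `(status_code, body)`, so it works with whichever HTTP client is in use. The set of retryable statuses and the jitter amount are assumptions, and all names are illustrative. Because a 500 still carries a valid (silent) WAV, the decision is based on the status code rather than the body.

```python
import random
import time

# 500 is documented above; 502-504 are assumed for reverse-proxy failures.
RETRYABLE_STATUS = {500, 502, 503, 504}


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> list:
    """Delays that double on each attempt, limited to `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_retry(send, retries: int = 3, base: float = 0.5,
                    cap: float = 8.0, sleep=time.sleep):
    """Call send() and retry on retryable statuses with exponential backoff."""
    status, body = send()
    for delay in backoff_delays(retries, base, cap):
        if status not in RETRYABLE_STATUS:
            break
        # A little random jitter avoids synchronized retries across clients.
        sleep(delay + random.uniform(0, delay / 10))
        status, body = send()
    return status, body
```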

5. Performance

Reuse connections (HTTP keep-alive)
Batch requests when possible
Cache audio for repeated phrases
Use appropriate model for use case (q4 vs full)
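
Caching audio for repeated phrases can be as simple as keying on a hash of the request body. The sketch below keeps results in memory and takes the synthesis call as a parameter. `AudioCache` is a hypothetical class, and a real deployment would likely add size limits or disk storage.

```python
import hashlib
import json


class AudioCache:
    """In-memory cache of synthesized audio keyed by the request parameters."""

    def __init__(self, synthesize):
        self._synthesize = synthesize  # callable: request dict -> WAV bytes
        self._store = {}

    @staticmethod
    def key_for(request: dict) -> str:
        """Hash a canonical JSON form so key order does not matter."""
        canonical = json.dumps(request, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, request: dict) -> bytes:
        """Return cached audio, synthesizing only on the first request."""
        key = self.key_for(request)
        if key not in self._store:
            self._store[key] = self._synthesize(request)
        return self._store[key]
```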

Compatibility

OpenAI API Compatibility

FastTTSR implements a subset of OpenAI's /v1/audio/speech endpoint:

Compatible:

✅ model parameter
✅ input parameter
✅ voice parameter
✅ speed parameter
✅ Response format (audio/wav)

Extensions:

➕ language parameter (explicitly set language)
➕ speaker parameter (alternative to voice)
➕ Extended speaker catalog (510+ voices)
➕ Multiple TTS models

Not Implemented:

❌ response_format other than wav (mp3, opus, aac, flac)
❌ Audio streaming (planned)

Migration from OpenAI

To migrate from OpenAI TTS:

Change base_url to FastTTSR endpoint
Remove or ignore api_key (not required)
Convert voice names if using custom voices
Add explicit language parameter for non-English

Example:

# Before (OpenAI)
client = OpenAI(api_key="sk-...")

# After (FastTTSR)
client = OpenAI(
    api_key="not-needed",
    base_url="http://localhost:5768/v1"
)
