Audio (STT & TTS)
Transcribe audio to text and generate speech from text.
Overview
The audio endpoints support two operations:
- Speech-to-Text (STT) — transcribe audio files into text
- Text-to-Speech (TTS) — convert text into spoken audio
Both follow the OpenAI audio API format.
Speech-to-Text (Transcription)
Transcribe audio into text using the transcriptions endpoint.
POST https://api.universal-ai.dev/v1/audio/transcriptions

Request
This endpoint accepts multipart/form-data with the audio file and parameters.
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | The audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg, flac. |
| model | string | Yes | The transcription model to use (e.g., whisper-1, cf/whisper). |
| language | string | No | ISO 639-1 language code (e.g., en, es, fr). Improves accuracy when specified. |
| prompt | string | No | Optional context or spelling guidance for the transcription. |
| response_format | string | No | Output format: json (default), text, srt, verbose_json, or vtt. |
| temperature | number | No | Sampling temperature between 0 and 1. Default: 0. |
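When response_format is verbose_json, each segment in the response carries start and end timestamps, which makes client-side conversion to subtitle formats straightforward. A minimal sketch (segments_to_srt is an illustrative helper, not part of any SDK; the API can also return SRT directly with response_format=srt):

```python
# Sketch: convert verbose_json segments into SRT subtitles client-side.
# segments_to_srt is an illustrative helper, not part of any SDK; the API
# can also return SRT directly with response_format=srt.

def _srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 2.8 -> 00:00:02,800."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def segments_to_srt(segments: list[dict]) -> str:
    """Render transcription segments as a numbered SRT document."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n"
            f"{_srt_timestamp(seg['start'])} --> {_srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)
```

Feeding it the segments from the verbose_json example below produces two numbered cues with SRT's comma-separated millisecond timestamps.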
Example Request
```shell
curl https://api.universal-ai.dev/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file=@recording.mp3 \
  -F model=whisper-1 \
  -F language=en
```

Response
```json
{
  "text": "Hello, this is a test recording for the Universal AI API transcription service."
}
```

With response_format: "verbose_json":
```json
{
  "task": "transcribe",
  "language": "english",
  "duration": 5.42,
  "text": "Hello, this is a test recording for the Universal AI API transcription service.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.8,
      "text": "Hello, this is a test recording"
    },
    {
      "id": 1,
      "start": 2.8,
      "end": 5.42,
      "text": " for the Universal AI API transcription service."
    }
  ]
}
```

SDK Examples
Python:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.universal-ai.dev/v1"
)

with open("recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en"
    )

print(transcript.text)
```

JavaScript / TypeScript:
```typescript
import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.universal-ai.dev/v1",
});

const transcript = await client.audio.transcriptions.create({
  model: "whisper-1",
  file: fs.createReadStream("recording.mp3"),
  language: "en",
});

console.log(transcript.text);
```

Text-to-Speech
Generate spoken audio from text using the speech endpoint.
POST https://api.universal-ai.dev/v1/audio/speech

Request
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The TTS model to use (e.g., tts-1, tts-1-hd). |
| input | string | Yes | The text to convert to speech. Maximum 4,096 characters. |
| voice | string | Yes | The voice to use: alloy, echo, fable, onyx, nova, or shimmer. |
| response_format | string | No | Audio format: mp3 (default), opus, aac, flac, or wav. |
| speed | number | No | Playback speed from 0.25 to 4.0. Default: 1.0. |
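Because input is capped at 4,096 characters, longer documents need to be synthesized in pieces and the resulting audio concatenated. A minimal chunking sketch (chunk_text is an illustrative helper, not an SDK function; a single sentence longer than the limit is passed through unsplit):

```python
import re

# Sketch: split long text into pieces under the 4,096-character input limit,
# so each piece can be sent to the speech endpoint as a separate request.
# chunk_text is an illustrative helper, not part of any SDK.

def chunk_text(text: str, limit: int = 4096) -> list[str]:
    """Greedily pack whole sentences into chunks of at most `limit` chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + 1 + len(sentence) > limit:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}" if current else sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own input and the returned audio files joined in order.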
Example Request
```shell
curl https://api.universal-ai.dev/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1-hd",
    "input": "Welcome to the Universal AI API. This is a text-to-speech demonstration.",
    "voice": "nova"
  }' \
  --output speech.mp3
```

The response body is the raw audio file in the requested format.
SDK Examples
Python:
```python
from openai import OpenAI
from pathlib import Path

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.universal-ai.dev/v1"
)

response = client.audio.speech.create(
    model="tts-1-hd",
    input="Welcome to the Universal AI API.",
    voice="nova"
)

Path("output.mp3").write_bytes(response.content)
```

JavaScript / TypeScript:
```typescript
import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.universal-ai.dev/v1",
});

const response = await client.audio.speech.create({
  model: "tts-1-hd",
  input: "Welcome to the Universal AI API.",
  voice: "nova",
});

const buffer = Buffer.from(await response.arrayBuffer());
fs.writeFileSync("output.mp3", buffer);
```

Supported Models
Transcription Models
| Model ID | Provider | Description |
|---|---|---|
| whisper-1 | OpenAI | Whisper large-v2 — supports 50+ languages |
| cf/whisper | Cloudflare | Whisper on Workers AI (low latency, no egress cost) |
| cf/whisper-tiny-en | Cloudflare | Whisper Tiny English-only (fastest) |
| deepgram/nova-2 | Deepgram | Nova-2 — high accuracy, real-time capable |
Text-to-Speech Models
| Model ID | Provider | Description |
|---|---|---|
| tts-1 | OpenAI | Standard quality, low latency |
| tts-1-hd | OpenAI | High definition, richer audio |
| elevenlabs/eleven_multilingual_v2 | ElevenLabs | Multilingual, expressive voices |
Supported Audio Formats
Input (transcription): mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg, flac
Output (speech): mp3, opus, aac, flac, wav
The maximum input file size for transcription is 25 MB.
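Both limits above can be checked client-side before uploading, which avoids a round trip that is guaranteed to fail. A small preflight sketch (validate_audio_file is an illustrative helper, not part of any SDK; it assumes the 25 MB limit means 25 × 1024 × 1024 bytes):

```python
import os

# Sketch: client-side preflight for the transcription limits documented above.
# validate_audio_file is an illustrative helper, not part of any SDK; it
# assumes the 25 MB limit means 25 * 1024 * 1024 bytes.

SUPPORTED_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm", "ogg", "flac"}
MAX_BYTES = 25 * 1024 * 1024

def validate_audio_file(path: str) -> None:
    """Raise ValueError if the file's extension or size will be rejected."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported audio format: .{ext}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("file exceeds the 25 MB transcription limit")
```

Call it on the file path right before opening the file for the transcription request.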