Audio (STT & TTS)

Transcribe audio to text and generate speech from text.

Overview

The audio endpoints support two operations:

  • Speech-to-Text (STT) — transcribe audio files into text
  • Text-to-Speech (TTS) — convert text into spoken audio

Both endpoints follow the OpenAI audio API format, so existing OpenAI SDKs work unchanged.


Speech-to-Text (Transcription)

Transcribe audio into text using the transcriptions endpoint.

POST https://api.universal-ai.dev/v1/audio/transcriptions

Request

This endpoint accepts multipart/form-data with the audio file and parameters.

| Parameter       | Type   | Required | Description |
|-----------------|--------|----------|-------------|
| file            | file   | Yes      | The audio file to transcribe. Supported formats: mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg, flac. |
| model           | string | Yes      | The transcription model to use (e.g., whisper-1, cf/whisper). |
| language        | string | No       | ISO 639-1 language code (e.g., en, es, fr). Improves accuracy when specified. |
| prompt          | string | No       | Optional context or spelling guidance for the transcription. |
| response_format | string | No       | Output format: json (default), text, srt, verbose_json, or vtt. |
| temperature     | number | No       | Sampling temperature between 0 and 1. Default: 0. |

Example Request

curl https://api.universal-ai.dev/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F file=@recording.mp3 \
  -F model=whisper-1 \
  -F language=en

Response

{
  "text": "Hello, this is a test recording for the Universal AI API transcription service."
}

With response_format: "verbose_json":

{
  "task": "transcribe",
  "language": "english",
  "duration": 5.42,
  "text": "Hello, this is a test recording for the Universal AI API transcription service.",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.8,
      "text": "Hello, this is a test recording"
    },
    {
      "id": 1,
      "start": 2.8,
      "end": 5.42,
      "text": " for the Universal AI API transcription service."
    }
  ]
}
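The start/end timestamps in the verbose_json segments are enough to build subtitle files yourself when you would rather post-process JSON than request srt directly. A minimal sketch (segments_to_srt is our own helper name; the segment fields match the response above):

```python
def segments_to_srt(segments):
    """Render verbose_json-style segments as an SRT subtitle string."""
    def ts(seconds):
        # SRT timestamps look like 00:00:02,800
        ms = int(round(seconds * 1000))
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{ts(seg['start'])} --> {ts(seg['end'])}\n{seg['text'].strip()}"
        )
    return "\n\n".join(blocks) + "\n"
```

Feeding the two segments above through this helper yields two numbered SRT cues covering 0.0–2.8 s and 2.8–5.42 s.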

SDK Examples

Python:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.universal-ai.dev/v1"
)

with open("recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en"
    )

print(transcript.text)

JavaScript / TypeScript:

import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.universal-ai.dev/v1",
});

const transcript = await client.audio.transcriptions.create({
  model: "whisper-1",
  file: fs.createReadStream("recording.mp3"),
  language: "en",
});

console.log(transcript.text);

Text-to-Speech

Generate spoken audio from text using the speech endpoint.

POST https://api.universal-ai.dev/v1/audio/speech

Request

| Parameter       | Type   | Required | Description |
|-----------------|--------|----------|-------------|
| model           | string | Yes      | The TTS model to use (e.g., tts-1, tts-1-hd). |
| input           | string | Yes      | The text to convert to speech. Maximum 4,096 characters. |
| voice           | string | Yes      | The voice to use: alloy, echo, fable, onyx, nova, or shimmer. |
| response_format | string | No       | Audio format: mp3 (default), opus, aac, flac, or wav. |
| speed           | number | No       | Playback speed from 0.25 to 4.0. Default: 1.0. |

Example Request

curl https://api.universal-ai.dev/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1-hd",
    "input": "Welcome to the Universal AI API. This is a text-to-speech demonstration.",
    "voice": "nova"
  }' \
  --output speech.mp3

The response body is the raw audio file in the requested format.
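Because the request body is plain JSON, it can be built and validated client-side before sending, whether or not you use an SDK. A sketch under the limits from the parameter table above (build_speech_request is our own helper name, not part of the API):

```python
def build_speech_request(text, model="tts-1", voice="alloy",
                         response_format="mp3", speed=1.0):
    """Build and sanity-check the JSON body for POST /v1/audio/speech."""
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    if len(text) > 4096:
        raise ValueError("input is limited to 4,096 characters")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

POST the returned dict as JSON with any HTTP client and write the raw response bytes to a file (the body is the audio itself, not a JSON envelope).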

SDK Examples

Python:

from openai import OpenAI
from pathlib import Path

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.universal-ai.dev/v1"
)

response = client.audio.speech.create(
    model="tts-1-hd",
    input="Welcome to the Universal AI API.",
    voice="nova"
)

Path("output.mp3").write_bytes(response.content)

JavaScript / TypeScript:

import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.universal-ai.dev/v1",
});

const response = await client.audio.speech.create({
  model: "tts-1-hd",
  input: "Welcome to the Universal AI API.",
  voice: "nova",
});

const buffer = Buffer.from(await response.arrayBuffer());
fs.writeFileSync("output.mp3", buffer);

Supported Models

Transcription Models

| Model ID           | Provider   | Description |
|--------------------|------------|-------------|
| whisper-1          | OpenAI     | Whisper large-v2; supports 50+ languages |
| cf/whisper         | Cloudflare | Whisper on Workers AI (low latency, no egress cost) |
| cf/whisper-tiny-en | Cloudflare | Whisper Tiny, English-only (fastest) |
| deepgram/nova-2    | Deepgram   | Nova-2; high accuracy, real-time capable |

Text-to-Speech Models

| Model ID                          | Provider   | Description |
|-----------------------------------|------------|-------------|
| tts-1                             | OpenAI     | Standard quality, low latency |
| tts-1-hd                          | OpenAI     | High definition, richer audio |
| elevenlabs/eleven_multilingual_v2 | ElevenLabs | Multilingual, expressive voices |

Supported Audio Formats

Input (transcription): mp3, mp4, mpeg, mpga, m4a, wav, webm, ogg, flac

Output (speech): mp3, opus, aac, flac, wav

The maximum input file size for transcription is 25 MB.
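Both limits can be checked client-side before uploading, which avoids a round trip for files the endpoint would reject. A sketch under two assumptions: validate_audio_upload is our own helper name, and we treat the 25 MB limit as 25 * 1024 * 1024 bytes (the exact boundary is enforced server-side):

```python
from pathlib import Path

SUPPORTED_INPUT_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a",
                           "wav", "webm", "ogg", "flac"}
# Assumption: 25 MB interpreted as a binary megabyte.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024

def validate_audio_upload(path):
    """Raise ValueError if the file would be rejected by the transcription endpoint."""
    p = Path(path)
    ext = p.suffix.lstrip(".").lower()
    if ext not in SUPPORTED_INPUT_FORMATS:
        raise ValueError(f"unsupported audio format: {ext!r}")
    if p.stat().st_size > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 25 MB transcription limit")
    return p
```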