# Chat Completions

Generate text responses using the OpenAI-compatible chat completions endpoint.

## Overview
The chat completions endpoint generates text responses given a conversation history. It follows the OpenAI chat completions format exactly, so any code written for the OpenAI API works without modification.
```
POST https://api.universal-ai.dev/v1/chat/completions
```

## Request

### Headers
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | `Bearer YOUR_API_KEY` |
| `Content-Type` | Yes | `application/json` |
### Body Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | No | Model ID to use (e.g., `gpt-4o`, `anthropic/claude-sonnet-4-20250514`, `cf/llama-3.3-70b`). If omitted, the smart router selects a model automatically. |
| `messages` | array | Yes | Array of message objects representing the conversation. |
| `stream` | boolean | No | If `true`, responses are streamed as server-sent events. Default: `false`. |
| `temperature` | number | No | Sampling temperature between 0 and 2. Higher values produce more random output. Default: 1. |
| `max_tokens` | integer | No | Maximum number of tokens to generate. |
| `top_p` | number | No | Nucleus sampling parameter between 0 and 1. Default: 1. |
| `frequency_penalty` | number | No | Penalizes tokens based on their frequency in the output so far. Range: -2.0 to 2.0. Default: 0. |
| `presence_penalty` | number | No | Penalizes tokens based on whether they have already appeared in the output. Range: -2.0 to 2.0. Default: 0. |
| `stop` | string or array | No | Up to 4 sequences at which the model stops generating. |
| `n` | integer | No | Number of completions to generate. Default: 1. |
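The ranges above can be validated client-side before a request is sent. The helper below is an illustrative sketch, not part of the API or any SDK; it simply enforces the constraints documented in the table and omits optional fields that are unset:

```python
def build_request_body(messages, model=None, temperature=1.0, top_p=1.0,
                       stop=None, max_tokens=None, stream=False, n=1):
    """Assemble a chat completions request body, enforcing the documented ranges."""
    if not messages:
        raise ValueError("'messages' is required and must be non-empty")
    if not 0 <= temperature <= 2:
        raise ValueError("'temperature' must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("'top_p' must be between 0 and 1")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("'stop' accepts at most 4 sequences")

    body = {"messages": messages, "temperature": temperature,
            "top_p": top_p, "stream": stream, "n": n}
    if model is not None:          # omit 'model' to let the smart router choose
        body["model"] = model
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if stop is not None:
        body["stop"] = stop
    return body
```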
### Message Object
Each message in the messages array has the following structure:
| Field | Type | Description |
|---|---|---|
| `role` | string | One of `system`, `user`, or `assistant`. |
| `content` | string | The text content of the message. |
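A multi-turn conversation is simply an ordered list of these objects. The illustrative example below shows the typical shape: one system message, then alternating user and assistant turns, with the newest user turn last:

```python
# A conversation history: one system message, then alternating user/assistant turns.
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to reverse a string."},
    {"role": "assistant", "content": "def reverse_string(s):\n    return s[::-1]"},
    # The latest user turn goes last; the model replies to it.
    {"role": "user", "content": "Now make it handle None input."},
]
```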
### Example Request
```bash
curl https://api.universal-ai.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Write a Python function to reverse a string."}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
```

## Response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a Python function to reverse a string:\n\n```python\ndef reverse_string(s: str) -> str:\n    return s[::-1]\n```"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 38,
    "total_tokens": 63
  }
}
```

### Response Fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique identifier for the completion. |
| `object` | string | Always `chat.completion`. |
| `created` | integer | Unix timestamp of when the completion was created. |
| `model` | string | The model that generated the response. |
| `choices` | array | Array of completion choices. |
| `choices[].index` | integer | Index of the choice. |
| `choices[].message` | object | The generated message (with `role` and `content`). |
| `choices[].finish_reason` | string | Why generation stopped: `stop`, `length`, or `content_filter`. |
| `usage` | object | Token usage statistics for the request. |
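To pull the generated text out of a response, index into `choices` and read the nested `message`. The small helper below is hypothetical (not part of any SDK) and just mirrors the field layout in the table:

```python
def extract_completion(response: dict, index: int = 0):
    """Return (content, finish_reason) for one choice of a chat.completion response."""
    choice = response["choices"][index]
    return choice["message"]["content"], choice["finish_reason"]
```

Checking `finish_reason` matters in practice: `length` means the output was cut off by `max_tokens`, so the text may be incomplete.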
## Streaming
Set stream: true to receive the response as a stream of server-sent events (SSE). This lets you display tokens to the user as they are generated.
```bash
curl https://api.universal-ai.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Tell me a short joke."}],
    "stream": true
  }'
```

Each SSE event contains a JSON chunk:
```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Why"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" did"},"finish_reason":null}]}

data: [DONE]
```

The stream ends with `data: [DONE]`.
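If you are not using an SDK, the raw SSE lines can be parsed by hand. The sketch below assumes the chunk format shown above: each `data:` line carries a JSON chunk whose `choices[0].delta` may hold a `content` fragment, and `[DONE]` marks the end of the stream:

```python
import json

def iter_stream_content(lines):
    """Yield content deltas from raw SSE lines, stopping at the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue                      # skip blank lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break                         # end-of-stream sentinel
        chunk = json.loads(payload)
        content = chunk["choices"][0]["delta"].get("content")
        if content:
            yield content
```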
Streaming with Python
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.universal-ai.dev/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a short joke."}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```

### Streaming with JavaScript
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.universal-ai.dev/v1",
});

const stream = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Tell me a short joke." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```

## Model Selection
You can specify any supported model using the `model` parameter. Model IDs use the format `provider/model-name`; OpenAI models can also be referenced by their bare name:
| Model ID | Provider | Description |
|---|---|---|
| `gpt-4o` | OpenAI | GPT-4o (also accessible as `openai/gpt-4o`) |
| `anthropic/claude-sonnet-4-20250514` | Anthropic | Claude Sonnet 4 |
| `google/gemini-2.0-flash` | Google | Gemini 2.0 Flash |
| `cf/llama-3.3-70b` | Cloudflare | Llama 3.3 70B on Workers AI |
| `mistral/mistral-large-latest` | Mistral | Mistral Large |
| `groq/llama-3.3-70b` | Groq | Llama 3.3 70B on Groq |
If you omit the model parameter, the smart routing engine automatically selects the best model for your request.
See the Models page for the full list of available models.
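The ID format can be split mechanically if your application needs to route or log by provider. The helper below is hypothetical; its assumption that a bare name defaults to OpenAI follows the table above, where `gpt-4o` is also accessible as `openai/gpt-4o`:

```python
def parse_model_id(model_id: str):
    """Split a model ID into (provider, model-name).

    Assumption (matching the table above): a bare name with no
    "provider/" prefix, such as "gpt-4o", refers to an OpenAI model.
    """
    provider, sep, name = model_id.partition("/")
    if not sep:                  # no "/" present: bare OpenAI-style name
        return "openai", model_id
    return provider, name
```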
## Error Handling
The API returns standard HTTP error codes with a JSON error body:
```json
{
  "error": {
    "type": "invalid_request_error",
    "message": "The 'messages' parameter is required.",
    "param": "messages"
  }
}
```

| Status Code | Meaning |
|---|---|
| 400 | Invalid request (missing or malformed parameters) |
| 401 | Invalid or missing API key |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 503 | Upstream provider unavailable |
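Of these, 429, 500, and 503 are typically transient and worth retrying with backoff. The sketch below is illustrative (not part of any SDK) and is parameterized over a `send` callable so the retry logic can be exercised without network access:

```python
import time

RETRYABLE = {429, 500, 503}  # transient statuses from the table above

def send_with_retry(send, max_attempts=3, base_delay=1.0):
    """Call send() until it returns a non-retryable status or attempts run out.

    send() must return an object with a .status_code attribute; the delay
    doubles after each retryable failure (exponential backoff).
    """
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE:
            return response
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)
    return response
```

A production version would usually also honor a `Retry-After` header on 429 responses if the API provides one.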