Chat Completions

Generate text responses using the OpenAI-compatible chat completions endpoint.

Overview

The chat completions endpoint generates text responses given a conversation history. It follows the OpenAI chat completions format exactly, so any code written for the OpenAI API works without modification.

POST https://api.universal-ai.dev/v1/chat/completions

Request

Headers

| Header | Required | Description |
| --- | --- | --- |
| Authorization | Yes | Bearer YOUR_API_KEY |
| Content-Type | Yes | application/json |
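In client code these are just two header entries. A minimal sketch in Python (auth_headers is our own helper, not part of the API):

```python
def auth_headers(api_key: str) -> dict:
    """Build the two headers required by every chat completions request."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

headers = auth_headers("YOUR_API_KEY")
```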

Body Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | No | Model ID to use (e.g., gpt-4o, anthropic/claude-sonnet-4-20250514, cf/llama-3.3-70b). If omitted, the smart router selects a model automatically. |
| messages | array | Yes | Array of message objects representing the conversation. |
| stream | boolean | No | If true, responses are streamed as server-sent events. Default: false. |
| temperature | number | No | Sampling temperature between 0 and 2. Higher values produce more random output. Default: 1. |
| max_tokens | integer | No | Maximum number of tokens to generate. |
| top_p | number | No | Nucleus sampling parameter between 0 and 1. Default: 1. |
| frequency_penalty | number | No | Penalizes tokens based on their frequency in the output so far. Range: -2.0 to 2.0. Default: 0. |
| presence_penalty | number | No | Penalizes tokens based on whether they appear in the output so far. Range: -2.0 to 2.0. Default: 0. |
| stop | string or array | No | Up to 4 sequences where the model stops generating. |
| n | integer | No | Number of completions to generate. Default: 1. |

Message Object

Each message in the messages array has the following structure:

| Field | Type | Description |
| --- | --- | --- |
| role | string | One of system, user, or assistant. |
| content | string | The text content of the message. |
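The conversation history is just a list of these objects in order. A small sketch in Python (build_messages is a hypothetical helper, not part of any SDK):

```python
def build_messages(system_prompt: str, turns: list) -> list:
    """Assemble a chat completions messages array.

    `turns` is a list of (role, content) pairs in conversation order.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for role, content in turns:
        messages.append({"role": role, "content": content})
    return messages

messages = build_messages(
    "You are a helpful coding assistant.",
    [("user", "Write a Python function to reverse a string.")],
)
```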

Example Request

curl https://api.universal-ai.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Write a Python function to reverse a string."}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
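The same request can be issued from Python with only the standard library; a minimal sketch (chat_completion is our own helper, and the call itself is left commented out since it needs a live API key):

```python
import json
import urllib.request

API_URL = "https://api.universal-ai.dev/v1/chat/completions"

def chat_completion(api_key: str, payload: dict) -> dict:
    """POST a chat completions request and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Identical body to the curl example above.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Write a Python function to reverse a string."},
    ],
    "temperature": 0.7,
    "max_tokens": 500,
}
# result = chat_completion("YOUR_API_KEY", payload)
```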

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a Python function to reverse a string:\n\n```python\ndef reverse_string(s: str) -> str:\n    return s[::-1]\n```"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 38,
    "total_tokens": 63
  }
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique identifier for the completion. |
| object | string | Always chat.completion. |
| created | integer | Unix timestamp of when the completion was created. |
| model | string | The model that generated the response. |
| choices | array | Array of completion choices. |
| choices[].index | integer | Index of the choice. |
| choices[].message | object | The generated message (with role and content). |
| choices[].finish_reason | string | Why generation stopped: stop, length, or content_filter. |
| usage | object | Token usage statistics for the request. |
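Reading these fields from a parsed response body is plain dictionary access. A sketch using the shape of the sample response above (content shortened):

```python
import json

raw = """{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "def reverse_string(s): return s[::-1]"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 38, "total_tokens": 63}
}"""

resp = json.loads(raw)
reply = resp["choices"][0]["message"]["content"]  # the assistant's text
finish = resp["choices"][0]["finish_reason"]      # stop, length, or content_filter
total = resp["usage"]["total_tokens"]             # prompt + completion tokens
```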

Streaming

Set stream: true to receive the response as a stream of server-sent events (SSE). This lets you display tokens to the user as they are generated.

curl https://api.universal-ai.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Tell me a short joke."}],
    "stream": true
  }'

Each SSE event contains a JSON chunk:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Why"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" did"},"finish_reason":null}]}

data: [DONE]

The stream ends with data: [DONE].
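If you consume the stream without an SDK, each data: line can be decoded like this (a minimal sketch; parse_sse_line is a hypothetical helper):

```python
import json

def parse_sse_line(line: str):
    """Decode one SSE line from the stream.

    Returns the chunk dict, or None for non-data lines and the
    [DONE] sentinel that ends the stream.
    """
    if not line.startswith("data: "):
        return None
    data = line[len("data: "):]
    if data == "[DONE]":
        return None
    return json.loads(data)

# Sample chunk from the stream shown above.
sample = ('data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk",'
          '"created":1700000000,"model":"gpt-4o","choices":[{"index":0,'
          '"delta":{"content":"Why"},"finish_reason":null}]}')
chunk = parse_sse_line(sample)
text = chunk["choices"][0]["delta"].get("content", "")
```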

Streaming with Python

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.universal-ai.dev/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a short joke."}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

Streaming with JavaScript

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.universal-ai.dev/v1",
});

const stream = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Tell me a short joke." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}

Model Selection

You can specify any supported model with the model parameter. Model IDs use the format provider/model-name; bare OpenAI IDs such as gpt-4o are also accepted:

| Model ID | Provider | Description |
| --- | --- | --- |
| gpt-4o | OpenAI | GPT-4o (also accessible as openai/gpt-4o) |
| anthropic/claude-sonnet-4-20250514 | Anthropic | Claude Sonnet 4 |
| google/gemini-2.0-flash | Google | Gemini 2.0 Flash |
| cf/llama-3.3-70b | Cloudflare | Llama 3.3 70B on Workers AI |
| mistral/mistral-large-latest | Mistral | Mistral Large |
| groq/llama-3.3-70b | Groq | Llama 3.3 70B on Groq |

If you omit the model parameter, the smart routing engine automatically selects the best model for your request.

See the Models page for the full list of available models.
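Because model IDs are uniform strings, falling back across providers is just retrying with the next ID. A sketch (first_successful is our own helper; the call argument stands in for any client call, stubbed here for illustration):

```python
def first_successful(model_ids: list, call):
    """Try each model ID in order; return (model_id, result) from the
    first call that does not raise. Re-raise the last error if all fail."""
    last_error = None
    for model_id in model_ids:
        try:
            return model_id, call(model_id)
        except Exception as exc:  # e.g. a 429 or 503 from one provider
            last_error = exc
    raise last_error

# Usage with a stubbed call: the first provider is "down".
def fake_call(model_id):
    if model_id == "gpt-4o":
        raise RuntimeError("503 upstream provider unavailable")
    return "ok"

model, result = first_successful(["gpt-4o", "groq/llama-3.3-70b"], fake_call)
```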

Error Handling

The API returns standard HTTP error codes with a JSON error body:

{
  "error": {
    "type": "invalid_request_error",
    "message": "The 'messages' parameter is required.",
    "param": "messages"
  }
}

| Status Code | Meaning |
| --- | --- |
| 400 | Invalid request (missing or malformed parameters) |
| 401 | Invalid or missing API key |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 503 | Upstream provider unavailable |
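429, 500, and 503 are transient, so clients typically retry them with exponential backoff. A minimal sketch (with_retries is our own helper; send stands in for any function that performs the HTTP call and returns a status and body, stubbed here for illustration):

```python
import time

def with_retries(send, max_attempts: int = 4, base_delay: float = 1.0):
    """Call send(); retry on 429/500/503 with exponential backoff.

    send() must return (status_code, body). Non-retryable statuses
    are returned immediately.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status not in (429, 500, 503):
            return status, body
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)
    return status, body

# Usage with a stub that is rate-limited twice, then succeeds.
calls = {"n": 0}
def flaky_send():
    calls["n"] += 1
    return (429, "") if calls["n"] < 3 else (200, "ok")

status, body = with_retries(flaky_send, base_delay=0.0)
```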