# Chat Completions

Generate text responses using the OpenAI-compatible chat completions endpoint.

## Overview
The chat completions endpoint generates text responses given a conversation history. It follows the OpenAI chat completions format exactly, so any code written for the OpenAI API works without modification.
```
POST https://api.universal-ai.dev/v1/chat/completions
```

## Request

### Headers
| Header | Required | Description |
|---|---|---|
| `Authorization` | Yes | `Bearer YOUR_API_KEY` |
| `Content-Type` | Yes | `application/json` |
### Body Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | No | Model ID to use (e.g., `gpt-4o`, `anthropic/claude-sonnet-4-20250514`, `cf/llama-3.3-70b`). If omitted, the smart router selects a model automatically. |
| `messages` | array | Yes | Array of message objects representing the conversation. |
| `stream` | boolean | No | If `true`, responses are streamed as server-sent events. Default: `false`. |
| `temperature` | number | No | Sampling temperature between 0 and 2. Higher values produce more random output. Default: 1. |
| `max_tokens` | integer | No | Maximum number of tokens to generate. |
| `top_p` | number | No | Nucleus sampling parameter between 0 and 1. Default: 1. |
| `frequency_penalty` | number | No | Penalizes tokens based on their frequency in the output so far. Range: -2.0 to 2.0. Default: 0. |
| `presence_penalty` | number | No | Penalizes tokens based on whether they have already appeared in the output. Range: -2.0 to 2.0. Default: 0. |
| `stop` | string or array | No | Up to 4 sequences at which the model stops generating. |
| `n` | integer | No | Number of completions to generate. Default: 1. |
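The ranges above can be validated client-side before a request is sent. The helper below is an illustrative sketch, not part of the API or any SDK; it simply enforces the constraints documented in the table and omits optional fields that are unset:

```python
def build_request_body(messages, model=None, temperature=1.0, top_p=1.0,
                       stop=None, max_tokens=None, stream=False, n=1):
    """Assemble a chat completions request body, enforcing the documented ranges."""
    if not messages:
        raise ValueError("'messages' is required and must be non-empty")
    if not 0 <= temperature <= 2:
        raise ValueError("'temperature' must be between 0 and 2")
    if not 0 <= top_p <= 1:
        raise ValueError("'top_p' must be between 0 and 1")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("'stop' accepts at most 4 sequences")

    body = {"messages": messages, "temperature": temperature,
            "top_p": top_p, "stream": stream, "n": n}
    if model is not None:          # omit 'model' to let the smart router choose
        body["model"] = model
    if max_tokens is not None:
        body["max_tokens"] = max_tokens
    if stop is not None:
        body["stop"] = stop
    return body
```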
### Message Object
Each message in the messages array has the following structure:
| Field | Type | Description |
|---|---|---|
| `role` | string | One of `system`, `user`, or `assistant`. |
| `content` | string | The text content of the message. |
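A multi-turn conversation is simply an ordered list of these objects. The illustrative example below shows the typical shape: one system message, then alternating user and assistant turns, with the newest user turn last:

```python
# A conversation history: one system message, then alternating user/assistant turns.
messages = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to reverse a string."},
    {"role": "assistant", "content": "def reverse_string(s):\n    return s[::-1]"},
    # The latest user turn goes last; the model replies to it.
    {"role": "user", "content": "Now make it handle None input."},
]
```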
### Example Request
```bash
curl https://api.universal-ai.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a helpful coding assistant."},
      {"role": "user", "content": "Write a Python function to reverse a string."}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
```

## Response
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a Python function to reverse a string:\n\n```python\ndef reverse_string(s: str) -> str:\n    return s[::-1]\n```"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 38,
    "total_tokens": 63
  }
}
```

### Response Fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique identifier for the completion. |
| `object` | string | Always `chat.completion`. |
| `created` | integer | Unix timestamp of when the completion was created. |
| `model` | string | The model that generated the response. |
| `choices` | array | Array of completion choices. |
| `choices[].index` | integer | Index of the choice. |
| `choices[].message` | object | The generated message (with `role` and `content`). |
| `choices[].finish_reason` | string | Why generation stopped: `stop`, `length`, or `content_filter`. |
| `usage` | object | Token usage statistics for the request. |
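To pull the generated text out of a response, index into `choices` and read the nested `message`. The small helper below is hypothetical (not part of any SDK) and just mirrors the field layout in the table:

```python
def extract_completion(response: dict, index: int = 0):
    """Return (content, finish_reason) for one choice of a chat.completion response."""
    choice = response["choices"][index]
    return choice["message"]["content"], choice["finish_reason"]
```

Checking `finish_reason` matters in practice: `length` means the output was cut off by `max_tokens`, so the text may be incomplete.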
## Streaming
Set stream: true to receive the response as a stream of server-sent events (SSE). This lets you display tokens to the user as they are generated.
```bash
curl https://api.universal-ai.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Tell me a short joke."}],
    "stream": true
  }'
```

Each SSE event contains a JSON chunk:
```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Why"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":" did"},"finish_reason":null}]}

data: [DONE]
```

The stream ends with `data: [DONE]`.
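If you are not using an SDK, the raw SSE lines can be parsed by hand. The sketch below assumes the chunk format shown above: each `data:` line carries a JSON chunk whose `choices[0].delta` may hold a `content` fragment, and `[DONE]` marks the end of the stream:

```python
import json

def iter_stream_content(lines):
    """Yield content deltas from raw SSE lines, stopping at the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue                      # skip blank lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break                         # end-of-stream sentinel
        chunk = json.loads(payload)
        content = chunk["choices"][0]["delta"].get("content")
        if content:
            yield content
```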
Streaming with Python
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.universal-ai.dev/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me a short joke."}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
```

### Streaming with JavaScript
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.universal-ai.dev/v1",
});

const stream = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Tell me a short joke." }],
  stream: true,
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) process.stdout.write(content);
}
```

## Model Selection
You can specify any supported model using the `model` parameter. Model IDs use the format `provider/model-name`; OpenAI models can also be referenced by their bare name:
| Model ID | Provider | Description |
|---|---|---|
| `gpt-4o` | OpenAI | GPT-4o (also accessible as `openai/gpt-4o`) |
| `anthropic/claude-sonnet-4-20250514` | Anthropic | Claude Sonnet 4 |
| `google/gemini-2.0-flash` | Google | Gemini 2.0 Flash |
| `cf/llama-3.3-70b` | Cloudflare | Llama 3.3 70B on Workers AI |
| `mistral/mistral-large-latest` | Mistral | Mistral Large |
| `groq/llama-3.3-70b` | Groq | Llama 3.3 70B on Groq |
If you omit the model parameter, the smart routing engine automatically selects the best model for your request.
See the Models page for the full list of available models.
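The ID format can be split mechanically if your application needs to route or log by provider. The helper below is hypothetical; its assumption that a bare name defaults to OpenAI follows the table above, where `gpt-4o` is also accessible as `openai/gpt-4o`:

```python
def parse_model_id(model_id: str):
    """Split a model ID into (provider, model-name).

    Assumption (matching the table above): a bare name with no
    "provider/" prefix, such as "gpt-4o", refers to an OpenAI model.
    """
    provider, sep, name = model_id.partition("/")
    if not sep:                  # no "/" present: bare OpenAI-style name
        return "openai", model_id
    return provider, name
```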
## Error Handling
The API returns standard HTTP error codes with a JSON error body:
```json
{
  "error": {
    "type": "invalid_request_error",
    "message": "The 'messages' parameter is required.",
    "param": "messages"
  }
}
```

| Status Code | Meaning |
|---|---|
| 400 | Invalid request (missing or malformed parameters) |
| 401 | Invalid or missing API key |
| 429 | Rate limit exceeded |
| 500 | Internal server error |
| 503 | Upstream provider unavailable |
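Of these, 429, 500, and 503 are typically transient and worth retrying with backoff. The sketch below is illustrative (not part of any SDK) and is parameterized over a `send` callable so the retry logic can be exercised without network access:

```python
import time

RETRYABLE = {429, 500, 503}  # transient statuses from the table above

def send_with_retry(send, max_attempts=3, base_delay=1.0):
    """Call send() until it returns a non-retryable status or attempts run out.

    send() must return an object with a .status_code attribute; the delay
    doubles after each retryable failure (exponential backoff).
    """
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE:
            return response
        if attempt < max_attempts - 1:
            time.sleep(base_delay * 2 ** attempt)
    return response
```

A production version would usually also honor a `Retry-After` header on 429 responses if the API provides one.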