Chat Completions API

The Chat Completions API is TokenFlux’s primary interface for multi-turn conversations. TokenFlux accepts canonical model IDs, resolves them to the best upstream provider, and streams responses using the same wire format as the OpenAI Chat Completions API.
All request and response payloads follow openai.ChatCompletion semantics. You can point any OpenAI-compatible SDK at the TokenFlux base URL and continue to use familiar request shapes, including tools, images, and reasoning tokens.

Create chat completion

Endpoint

POST /v1/chat/completions

Authentication

Include your TokenFlux API key in either of the following headers. Keys are validated before the request is proxied to a provider.
Authorization: Bearer <tokenflux_api_key>
or
X-Api-Key: <tokenflux_api_key>
Requests from users whose remaining quota is below 0.01 credits are rejected with 403 Forbidden and an insufficient quota error.
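
For example, here is a minimal authenticated request with fetch that also checks for the quota rejection described above (either header works; this sketch uses X-Api-Key):
const res = await fetch('https://tokenflux.ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-Api-Key': process.env.TOKENFLUX_KEY
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o-mini',
    messages: [{ role: 'user', content: 'Ping' }]
  })
});

if (res.status === 403) {
  // Remaining quota is below 0.01 credits; surface the insufficient quota error.
  const { error } = await res.json();
  throw new Error(error?.message ?? 'insufficient quota');
}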

Request body

TokenFlux accepts the same JSON schema as OpenAI’s chat.completions.create. The most important fields are:
Field | Type | Required | Description
----- | ---- | -------- | -----------
model | string | Yes | Canonical model identifier from the Models API. Aliases like gpt-4.1 are accepted but internally resolved to a canonical ID.
messages | array | Yes | Ordered conversation history. Each message uses OpenAI’s role/content schema and can include multimodal content.
stream | boolean | No | When true, TokenFlux upgrades the response to an SSE stream. Defaults to false.
temperature, top_p, max_tokens, stop, response_format, seed, frequency_penalty, presence_penalty, logprobs, top_logprobs | various | No | Forwarded to the upstream provider when the selected model advertises support (see supported_parameters in the Models API).
tools, tool_choice | array or object | No | Function/tool calling definitions and invocation controls.
modalities, audio, vision, reasoning, metadata | object | No | Advanced options for multimodal and reasoning-capable models (passed through when supported).
user | string | No | Application-defined user identifier for auditing.
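
Putting the fields together, a typical request body (values are illustrative) looks like:
{
  "model": "openai/gpt-4o",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is the capital of France?" }
  ],
  "temperature": 0.7,
  "max_tokens": 200,
  "user": "user-1234"
}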

Message content

Message payloads accept the same typed segments as OpenAI, including structured arrays such as:
{
  "role": "user",
  "content": [
    { "type": "text", "text": "Describe the attached image." },
    { "type": "image_url", "image_url": { "url": "https://example.com/street.png" } }
  ]
}

Response (non-streaming)

Non-streaming responses are identical to OpenAI’s chat.completion object. TokenFlux injects the provider’s server-side model ID into the model field (for example gpt-4o) and appends aggregate usage data before returning the payload.
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1734470400,
  "model": "gpt-4o",
  "system_fingerprint": "fp_4oA1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}

Response (streaming)

When stream: true, TokenFlux returns Server-Sent Events (Content-Type: text/event-stream). Each data: line contains an OpenAI chat.completion.chunk JSON payload. TokenFlux automatically requests usage information from providers, so the final chunk includes a usage object alongside the delta, followed by data: [DONE].
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1734470400,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"}}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1734470401,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"}}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1734470402,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":8,"total_tokens":20}}

data: [DONE]
Errors during streaming are emitted as SSE messages in the form data: {"error":{"message":"..."}} before the stream closes.
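
If you consume the stream without an SDK, you can parse the data: lines yourself, watching for both the error shape above and the [DONE] sentinel. A minimal sketch (Node 18+, with simplified line buffering):
const res = await fetch('https://tokenflux.ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${process.env.TOKENFLUX_KEY}`
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
    stream: true
  })
});

const decoder = new TextDecoder();
let buffer = '';
read: for await (const bytes of res.body) {
  buffer += decoder.decode(bytes, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep a partial line for the next read
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6);
    if (payload === '[DONE]') break read;                   // end of stream
    const chunk = JSON.parse(payload);
    if (chunk.error) throw new Error(chunk.error.message);  // SSE error chunk
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
}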

Error handling

Status | When it occurs | Body
------ | -------------- | ----
400 Bad Request | Model is unknown, the request schema fails to validate, or the upstream provider rejects the payload. | Plain-text error string for non-streaming requests.
403 Forbidden | Account quota is exhausted. | JSON error body with message insufficient quota.
500 Internal Server Error | All providers return retriable errors or the upstream service fails. | Plain-text error string for non-streaming requests; streaming responses emit an SSE error chunk before closing.
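
With the OpenAI SDK these statuses surface as APIError instances, so a sketch of quota-aware error handling (the retry policy is up to you) might look like:
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOKENFLUX_KEY,
  baseURL: 'https://tokenflux.ai/v1'
});

try {
  const completion = await client.chat.completions.create({
    model: 'openai/gpt-4o-mini',
    messages: [{ role: 'user', content: 'Hello' }]
  });
  console.log(completion.choices[0].message.content);
} catch (err) {
  if (err instanceof OpenAI.APIError) {
    if (err.status === 403) {
      // Quota exhausted: prompt for a top-up instead of retrying.
      console.error('Insufficient quota:', err.message);
    } else if (err.status === 400) {
      console.error('Bad request:', err.message);
    } else {
      console.error(`Upstream failure (${err.status}):`, err.message);
    }
  } else {
    throw err;
  }
}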

Examples

cURL (non-streaming)

curl https://tokenflux.ai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_TOKENFLUX_KEY' \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Summarize the key points from this meeting."}
    ],
    "max_tokens": 200
  }'

cURL (streaming)

curl https://tokenflux.ai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-Api-Key: YOUR_TOKENFLUX_KEY' \
  -d '{
    "model": "anthropic/claude-3.5-sonnet",
    "messages": [
      {"role": "user", "content": "Draft a launch announcement for our new feature."}
    ],
    "stream": true
  }' \
  --no-buffer

JavaScript (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOKENFLUX_KEY,
  baseURL: 'https://tokenflux.ai/v1'
});

const completion = await client.chat.completions.create({
  model: 'openai/gpt-4o-mini',
  messages: [
    { role: 'system', content: 'You write concise summaries.' },
    { role: 'user', content: 'Summarize the highlights from the attached PDF.' }
  ],
  max_tokens: 150
});

console.log(completion.choices[0].message.content);

Streaming in Node.js

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOKENFLUX_KEY,
  baseURL: 'https://tokenflux.ai/v1'
});

const stream = await client.chat.completions.create({
  model: 'deepseek/deepseek-r1',
  messages: [{ role: 'user', content: 'Solve the puzzle step by step.' }],
  stream: true
});

for await (const chunk of stream) {
  // Guard the access in case a provider sends a chunk without choices.
  const delta = chunk.choices[0]?.delta;
  if (delta?.reasoning) {
    process.stdout.write(`\n[reasoning] ${delta.reasoning}\n`);
  }
  if (delta?.content) {
    process.stdout.write(delta.content);
  }
  if (chunk.usage) {
    console.log('\nUsage:', chunk.usage);
  }
}

Operational notes

  • TokenFlux retries alternate providers transparently when an upstream returns a retriable error, up to three attempts per request. You receive the first successful response without extra coordination.
  • Usage for both streaming and non-streaming completions is persisted immediately after completion so that billing dashboards and quota enforcement stay in sync.
  • Pair this API with List models to dynamically select models and their supported parameters at runtime, as sketched below.
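
For instance, you could fetch the model catalog once and consult it before forwarding optional sampling fields. A sketch, assuming each model entry exposes a supported_parameters array (confirm the exact schema against the Models API reference):
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOKENFLUX_KEY,
  baseURL: 'https://tokenflux.ai/v1'
});

// The supported_parameters field is an assumption here; check the
// Models API response shape before relying on it.
const models = await client.models.list();
const model = models.data.find((m) => m.id === 'openai/gpt-4o');

const request = {
  model: model.id,
  messages: [{ role: 'user', content: 'Hello' }]
};

// Only forward temperature when the model advertises support for it.
if (model.supported_parameters?.includes('temperature')) {
  request.temperature = 0.3;
}

const completion = await client.chat.completions.create(request);
console.log(completion.choices[0].message.content);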