Chat Completions API

The Chat Completions API is TokenFlux’s primary interface for multi-turn conversations. TokenFlux accepts canonical model IDs, resolves them to the best upstream provider, and streams responses using the same wire format as the OpenAI Chat Completions API.
All request and response payloads follow openai.ChatCompletion semantics. You can point any OpenAI-compatible SDK at the TokenFlux base URL and continue to use familiar request shapes, including tools, images, and reasoning tokens.

Create chat completion

Endpoint

POST /v1/chat/completions

Authentication

Include your TokenFlux API key in either of the following headers. Keys are validated before the request is proxied to a provider.
Authorization: Bearer <tokenflux_api_key>
or
X-Api-Key: <tokenflux_api_key>
Requests from users whose remaining quota is below 0.01 credits are rejected with 403 Forbidden and an insufficient quota error.
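
For example, here is a minimal authenticated request with fetch that also checks for the quota rejection described above (either header works; this sketch uses X-Api-Key):
const res = await fetch('https://tokenflux.ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-Api-Key': process.env.TOKENFLUX_KEY
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o-mini',
    messages: [{ role: 'user', content: 'Ping' }]
  })
});

if (res.status === 403) {
  // Remaining quota is below 0.01 credits; surface the insufficient quota error.
  const { error } = await res.json();
  throw new Error(error?.message ?? 'insufficient quota');
}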

Request body

TokenFlux accepts the same JSON schema as OpenAI’s chat.completions.create. The most important fields are:
Field | Type | Required | Description
----- | ---- | -------- | -----------
model | string | Yes | Canonical model identifier from the Models API. Aliases like gpt-4.1 are accepted but internally resolved to a canonical ID.
messages | array | Yes | Ordered conversation history. Each message uses OpenAI’s role/content schema and can include multimodal content.
stream | boolean | No | When true, TokenFlux upgrades the response to an SSE stream. Defaults to false.
temperature, top_p, max_tokens, stop, response_format, seed, frequency_penalty, presence_penalty, logprobs, top_logprobs | various | No | Forwarded to the upstream provider when the selected model advertises support (see supported_parameters in the Models API).
tools, tool_choice | array or object | No | Function/tool calling definitions and invocation controls.
modalities, audio, vision, reasoning, metadata | object | No | Advanced options for multimodal and reasoning-capable models (passed through when supported).
user | string | No | Application-defined user identifier for auditing.
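
Putting the fields together, a typical request body (values are illustrative) looks like:
{
  "model": "openai/gpt-4o",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is the capital of France?" }
  ],
  "temperature": 0.7,
  "max_tokens": 200,
  "user": "user-1234"
}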

Message content

Message payloads accept the same typed segments as OpenAI, including structured arrays such as:
{
  "role": "user",
  "content": [
    { "type": "text", "text": "Describe the attached image." },
    { "type": "image_url", "image_url": { "url": "https://example.com/street.png" } }
  ]
}

Response (non-streaming)

Non-streaming responses are identical to OpenAI’s chat.completion object. TokenFlux injects the provider’s server-side model ID into the model field (for example gpt-4o) and appends aggregate usage data before returning the payload.
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1734470400,
  "model": "gpt-4o",
  "system_fingerprint": "fp_4oA1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}

Response (streaming)

When stream: true, TokenFlux returns Server-Sent Events (Content-Type: text/event-stream). Each data: line contains an OpenAI chat.completion.chunk JSON payload. TokenFlux automatically requests usage information from providers, so the final chunk includes a usage object alongside the delta, followed by data: [DONE].
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1734470400,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"}}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1734470401,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"}}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1734470402,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":8,"total_tokens":20}}

data: [DONE]
Errors during streaming are emitted as SSE messages in the form data: {"error":{"message":"..."}} before the stream closes.
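
If you consume the stream without an SDK, you can parse the data: lines yourself, watching for both the error shape above and the [DONE] sentinel. A minimal sketch (Node 18+, with simplified line buffering):
const res = await fetch('https://tokenflux.ai/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${process.env.TOKENFLUX_KEY}`
  },
  body: JSON.stringify({
    model: 'openai/gpt-4o',
    messages: [{ role: 'user', content: 'Hello' }],
    stream: true
  })
});

const decoder = new TextDecoder();
let buffer = '';
read: for await (const bytes of res.body) {
  buffer += decoder.decode(bytes, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep a partial line for the next read
  for (const line of lines) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6);
    if (payload === '[DONE]') break read;                   // end of stream
    const chunk = JSON.parse(payload);
    if (chunk.error) throw new Error(chunk.error.message);  // SSE error chunk
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
}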

Error handling

Status | When it occurs | Body
------ | -------------- | ----
400 Bad Request | Model is unknown, the request schema fails to validate, or the upstream provider rejects the payload. | Plain-text error string for non-streaming requests.
403 Forbidden | Account quota is exhausted. | JSON error body with message insufficient quota.
500 Internal Server Error | All providers return retriable errors or the upstream service fails. | Plain-text error string for non-streaming requests; streaming responses emit an SSE error chunk before closing.
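
With the OpenAI SDK these statuses surface as APIError instances, so a sketch of quota-aware error handling (the retry policy is up to you) might look like:
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOKENFLUX_KEY,
  baseURL: 'https://tokenflux.ai/v1'
});

try {
  const completion = await client.chat.completions.create({
    model: 'openai/gpt-4o-mini',
    messages: [{ role: 'user', content: 'Hello' }]
  });
  console.log(completion.choices[0].message.content);
} catch (err) {
  if (err instanceof OpenAI.APIError) {
    if (err.status === 403) {
      // Quota exhausted: prompt for a top-up instead of retrying.
      console.error('Insufficient quota:', err.message);
    } else if (err.status === 400) {
      console.error('Bad request:', err.message);
    } else {
      console.error(`Upstream failure (${err.status}):`, err.message);
    }
  } else {
    throw err;
  }
}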

Examples

cURL (non-streaming)

curl https://tokenflux.ai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_TOKENFLUX_KEY' \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Summarize the key points from this meeting."}
    ],
    "max_tokens": 200
  }'

cURL (streaming)

curl https://tokenflux.ai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-Api-Key: YOUR_TOKENFLUX_KEY' \
  -d '{
    "model": "anthropic/claude-3.5-sonnet",
    "messages": [
      {"role": "user", "content": "Draft a launch announcement for our new feature."}
    ],
    "stream": true
  }' \
  --no-buffer

JavaScript (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOKENFLUX_KEY,
  baseURL: 'https://tokenflux.ai/v1'
});

const completion = await client.chat.completions.create({
  model: 'openai/gpt-4o-mini',
  messages: [
    { role: 'system', content: 'You write concise summaries.' },
    { role: 'user', content: 'Summarize the highlights from the attached PDF.' }
  ],
  max_tokens: 150
});

console.log(completion.choices[0].message.content);

Streaming in Node.js

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOKENFLUX_KEY,
  baseURL: 'https://tokenflux.ai/v1'
});

const stream = await client.chat.completions.create({
  model: 'deepseek/deepseek-r1',
  messages: [{ role: 'user', content: 'Solve the puzzle step by step.' }],
  stream: true
});

for await (const chunk of stream) {
  // Guard the access in case a provider sends a chunk without choices.
  const delta = chunk.choices[0]?.delta;
  if (delta?.reasoning) {
    process.stdout.write(`\n[reasoning] ${delta.reasoning}\n`);
  }
  if (delta?.content) {
    process.stdout.write(delta.content);
  }
  if (chunk.usage) {
    console.log('\nUsage:', chunk.usage);
  }
}

Operational notes

  • TokenFlux retries alternate providers transparently when an upstream returns a retriable error, up to three attempts per request. You receive the first successful response without extra coordination.
  • Usage for both streaming and non-streaming completions is persisted immediately after completion so that billing dashboards and quota enforcement stay in sync.
  • Pair this API with List models to dynamically select models and their supported parameters at runtime, as sketched below.
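
For instance, you could fetch the model catalog once and consult it before forwarding optional sampling fields. A sketch, assuming each model entry exposes a supported_parameters array (confirm the exact schema against the Models API reference):
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOKENFLUX_KEY,
  baseURL: 'https://tokenflux.ai/v1'
});

// The supported_parameters field is an assumption here; check the
// Models API response shape before relying on it.
const models = await client.models.list();
const model = models.data.find((m) => m.id === 'openai/gpt-4o');

const request = {
  model: model.id,
  messages: [{ role: 'user', content: 'Hello' }]
};

// Only forward temperature when the model advertises support for it.
if (model.supported_parameters?.includes('temperature')) {
  request.temperature = 0.3;
}

const completion = await client.chat.completions.create(request);
console.log(completion.choices[0].message.content);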