# Chat Completions API

> OpenAI-compatible conversational completions with streaming and multi-provider routing

The Chat Completions API is TokenFlux’s primary interface for multi-turn conversations. TokenFlux accepts canonical model IDs, resolves them to the best upstream provider, and streams responses using the same wire format as the OpenAI Chat Completions API.

<Info>
  All request and response payloads follow `openai.ChatCompletion` semantics. You can point any OpenAI-compatible SDK at the TokenFlux base URL and continue to use familiar request shapes, including tools, images, and reasoning tokens.
</Info>

## Create chat completion

### Endpoint

```http theme={null}
POST /v1/chat/completions
```

### Authentication

Pass your TokenFlux API key in either of the two headers shown below. TokenFlux validates the key before proxying the request to a provider.

```http theme={null}
Authorization: Bearer <tokenflux_api_key>
```

or

```http theme={null}
X-Api-Key: <tokenflux_api_key>
```
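Either header can also be set programmatically. The sketch below is illustrative only: `buildAuthHeaders` and `createChatCompletion` are hypothetical helper names, and the base URL is the one used in the examples later on this page. It assumes a runtime with a global `fetch` (for example Node.js 18 or newer).

```javascript
// Hypothetical helper: returns request headers for one of the two
// documented authentication styles ('bearer' or 'x-api-key').
function buildAuthHeaders(apiKey, style = 'bearer') {
  if (!apiKey) throw new Error('A TokenFlux API key is required');
  const headers = { 'Content-Type': 'application/json' };
  if (style === 'bearer') {
    headers['Authorization'] = `Bearer ${apiKey}`;
  } else if (style === 'x-api-key') {
    headers['X-Api-Key'] = apiKey;
  } else {
    throw new Error(`Unknown auth style: ${style}`);
  }
  return headers;
}

// Hypothetical wrapper: POSTs a request body to the chat completions endpoint
// and returns the raw fetch Response so the caller can inspect the status.
async function createChatCompletion(apiKey, body, style = 'bearer') {
  return fetch('https://tokenflux.ai/v1/chat/completions', {
    method: 'POST',
    headers: buildAuthHeaders(apiKey, style),
    body: JSON.stringify(body)
  });
}
```

Only one of the two headers is set per request in this sketch, which keeps the credential in a single place.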

Requests from users whose remaining quota is below `0.01` credits are rejected with `403 Forbidden` and an `insufficient quota` error.

### Request body

TokenFlux accepts the same JSON schema as OpenAI’s `chat.completions.create`. The most important fields are:

| Field                                                                                                                                        | Type          | Required | Description                                                                                                                                |
| -------------------------------------------------------------------------------------------------------------------------------------------- | ------------- | -------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
| `model`                                                                                                                                      | string        | **Yes**  | Canonical model identifier from the [Models API](./models). Aliases like `gpt-4.1` are accepted but internally resolved to a canonical ID. |
| `messages`                                                                                                                                   | array         | **Yes**  | Ordered conversation history. Each message uses OpenAI’s `role`/`content` schema and can include multimodal content.                       |
| `stream`                                                                                                                                     | boolean       | No       | When `true`, TokenFlux upgrades the response to an SSE stream. Defaults to `false`.                                                        |
| `temperature`, `top_p`, `max_tokens`, `stop`, `response_format`, `seed`, `frequency_penalty`, `presence_penalty`, `logprobs`, `top_logprobs` | various       | No       | Forwarded to the upstream provider when the selected model advertises support (see `supported_parameters` in the Models API).              |
| `tools`, `tool_choice`                                                                                                                       | array\|object | No       | Function/tool calling definitions and invocation controls.                                                                                 |
| `modalities`, `audio`, `vision`, `reasoning`, `metadata`                                                                                     | various       | No       | Advanced options for multimodal and reasoning-capable models (passed through when supported).                                              |
| `user`                                                                                                                                       | string        | No       | Application-defined user identifier for auditing.                                                                                          |
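
Because optional parameters are forwarded only when the model advertises support, a client may want to drop unsupported fields before sending. The following is a minimal sketch, not TokenFlux's own logic. It assumes `supported_parameters` is an array of parameter names, and both `filterRequest` and the `ALWAYS_ALLOWED` set are hypothetical choices made for illustration.

```javascript
// Assumption: these fields are part of every request regardless of model support.
const ALWAYS_ALLOWED = new Set(['model', 'messages', 'stream', 'user']);

// Hypothetical helper: keep only the fields that are always allowed or that
// appear in the model's advertised `supported_parameters` list.
function filterRequest(request, supportedParameters) {
  const supported = new Set(supportedParameters);
  const filtered = {};
  for (const [key, value] of Object.entries(request)) {
    if (ALWAYS_ALLOWED.has(key) || supported.has(key)) {
      filtered[key] = value;
    }
  }
  return filtered;
}
```

For example, calling `filterRequest` with a request that sets `temperature` and `seed` against a list containing only `temperature` returns a copy of the request without `seed`.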

#### Message content

Message payloads accept the same typed segments as OpenAI, including structured arrays such as:

```json theme={null}
{
  "role": "user",
  "content": [
    { "type": "text", "text": "Describe the attached image." },
    { "type": "image_url", "image_url": { "url": "https://example.com/street.png" } }
  ]
}
```
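
A structured message like the one above can be assembled with a small helper. This is a sketch only, and `buildImageMessage` is a hypothetical name rather than part of any SDK.

```javascript
// Hypothetical helper: builds a user message with one text segment
// and one image segment, matching the structure shown above.
function buildImageMessage(text, imageUrl) {
  return {
    role: 'user',
    content: [
      { type: 'text', text },
      { type: 'image_url', image_url: { url: imageUrl } }
    ]
  };
}
```

The returned object can be placed directly into the `messages` array of a request.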

### Response (non-streaming)

Non-streaming responses use the same shape as OpenAI’s `chat.completion` object. Before returning the payload, TokenFlux sets the `model` field to the provider’s server-side model ID (for example `gpt-4o`) and appends aggregate usage data.

```json theme={null}
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1734470400,
  "model": "gpt-4o",
  "system_fingerprint": "fp_4oA1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop",
      "logprobs": null
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8,
    "total_tokens": 20
  }
}
```
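
To illustrate how a client might read this payload, the sketch below pulls out the assistant text, the finish reason, and the token total. `summarizeCompletion` is a hypothetical helper, and it assumes the response has already been parsed from JSON.

```javascript
// Hypothetical helper: extracts the most commonly used fields from a
// parsed `chat.completion` object, tolerating missing fields.
function summarizeCompletion(completion) {
  const choice = completion.choices?.[0];
  return {
    text: choice?.message?.content ?? '',
    finishReason: choice?.finish_reason ?? null,
    totalTokens: completion.usage?.total_tokens ?? null
  };
}
```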

### Response (streaming)

When `stream: true`, TokenFlux returns Server-Sent Events (`Content-Type: text/event-stream`). Each `data:` line contains an OpenAI `chat.completion.chunk` JSON payload. TokenFlux automatically requests usage information from providers, so the final chunk includes a `usage` object alongside the delta, followed by `data: [DONE]`.

```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1734470400,"model":"gpt-4o","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"}}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1734470401,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"}}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1734470402,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":8,"total_tokens":20}}

data: [DONE]
```

Errors during streaming are emitted as SSE messages in the form `data: {"error":{"message":"..."}}` before the stream closes.
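
The stream format described above (content chunks, an optional error message, and the `[DONE]` sentinel) can be handled with a small line parser. This is a minimal sketch under the assumption that the input has already been split into complete lines. `parseSseLine` and `collectStream` are hypothetical names.

```javascript
// Parse one SSE line. Returns null for blank lines and anything
// that is not a `data:` line.
function parseSseLine(line) {
  if (!line.startsWith('data:')) return null;
  const payload = line.slice('data:'.length).trim();
  if (payload === '[DONE]') return { type: 'done' };
  const json = JSON.parse(payload);
  if (json.error) return { type: 'error', message: json.error.message };
  return { type: 'chunk', chunk: json };
}

// Fold a list of SSE lines into the concatenated text and the usage
// object from the last chunk that carries one. Throws on an error message.
function collectStream(lines) {
  let text = '';
  let usage = null;
  for (const line of lines) {
    const event = parseSseLine(line);
    if (event === null) continue;
    if (event.type === 'done') break;
    if (event.type === 'error') throw new Error(event.message);
    text += event.chunk.choices?.[0]?.delta?.content ?? '';
    if (event.chunk.usage) usage = event.chunk.usage;
  }
  return { text, usage };
}
```

A real client reading from the network would also need to buffer partial lines between reads, which this sketch leaves out.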

### Error handling

| Status                      | When it occurs                                                                                    | Body                                                                                                            |
| --------------------------- | ------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- |
| `400 Bad Request`           | Model is unknown, request schema fails to validate, or the upstream provider rejects the payload. | Plain-text error string for non-streaming requests.                                                             |
| `403 Forbidden`             | Account quota is exhausted.                                                                       | JSON error body with message `insufficient quota`.                                                              |
| `500 Internal Server Error` | All providers return retriable errors or the upstream service fails.                              | Plain-text error string for non-streaming requests; streaming responses emit an SSE error chunk before closing. |
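
Since error bodies may be plain text or JSON depending on the status, a client can normalize them before logging or retrying. The sketch below is one possible approach: `describeError` is a hypothetical helper, and the JSON shapes it probes (`error.message`, a string `error`, or a top-level `message`) are assumptions, because the exact layout of the JSON error body is not specified here.

```javascript
// Hypothetical helper: turns a status code and raw response body into a
// uniform object. Falls back to the raw text when the body is not JSON.
function describeError(status, bodyText) {
  let message = bodyText.trim();
  try {
    const parsed = JSON.parse(bodyText);
    if (parsed && typeof parsed === 'object') {
      if (typeof parsed.error === 'string') {
        message = parsed.error;
      } else {
        message = parsed.error?.message ?? parsed.message ?? message;
      }
    }
  } catch {
    // Not JSON: keep the plain-text body as the message.
  }
  return {
    status,
    message,
    quotaExhausted: status === 403, // documented quota rejection
    serverError: status >= 500
  };
}
```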

### Examples

#### cURL (non-streaming)

```bash theme={null}
curl https://tokenflux.ai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_TOKENFLUX_KEY' \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Summarize the key points from this meeting."}
    ],
    "max_tokens": 200
  }'
```

#### cURL (streaming)

```bash theme={null}
curl https://tokenflux.ai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'X-Api-Key: YOUR_TOKENFLUX_KEY' \
  -d '{
    "model": "anthropic/claude-3.5-sonnet",
    "messages": [
      {"role": "user", "content": "Draft a launch announcement for our new feature."}
    ],
    "stream": true
  }' \
  --no-buffer
```

#### JavaScript (OpenAI SDK)

```javascript theme={null}
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOKENFLUX_KEY,
  baseURL: 'https://tokenflux.ai/v1'
});

const completion = await client.chat.completions.create({
  model: 'openai/gpt-4o-mini',
  messages: [
    { role: 'system', content: 'You write concise summaries.' },
    { role: 'user', content: 'Summarize the benefits of streaming responses in two sentences.' }
  ],
  max_tokens: 150
});

console.log(completion.choices[0].message.content);
```

#### Streaming in Node.js

```javascript theme={null}
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOKENFLUX_KEY,
  baseURL: 'https://tokenflux.ai/v1'
});

const stream = await client.chat.completions.create({
  model: 'deepseek/deepseek-r1',
  messages: [{ role: 'user', content: 'Solve the puzzle step by step.' }],
  stream: true
});

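for await (const chunk of stream) {
  // Some OpenAI-compatible streams send a usage-only chunk with an empty
  // `choices` array, so guard the lookup instead of assuming choices[0] exists.
  const delta = chunk.choices?.[0]?.delta;
  if (delta?.reasoning) {
    // Reasoning tokens arrive separately from the answer text.
    process.stdout.write(`\n[reasoning] ${delta.reasoning}\n`);
  }
  if (delta?.content) {
    process.stdout.write(delta.content);
  }
  if (chunk.usage) {
    // Usage arrives on the final chunk, before `[DONE]`.
    console.log('\nUsage:', chunk.usage);
  }
}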
```

### Operational notes

* TokenFlux retries alternate providers transparently when an upstream returns a retriable error, up to three attempts per request. You receive the first successful response without extra coordination.
* Usage for both streaming and non-streaming completions is persisted immediately after completion so that billing dashboards and quota enforcement stay in sync.
* Pair this API with [List models](./models) to dynamically select models and their supported parameters at runtime.
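
As a sketch of the last point, the function below picks the first model that advertises every parameter a request needs. It assumes the model list is an array of objects with `id` and `supported_parameters` fields. That shape and the `pickModel` name are assumptions made for illustration, so check the Models API page for the actual response format.

```javascript
// Hypothetical helper: returns the id of the first model whose
// `supported_parameters` includes every required parameter, or null.
function pickModel(models, requiredParameters) {
  const match = models.find((model) =>
    requiredParameters.every((name) =>
      (model.supported_parameters ?? []).includes(name)
    )
  );
  return match?.id ?? null;
}
```

Returning `null` rather than throwing lets the caller decide whether to fall back to a default model or surface an error.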