# Embeddings API

> High-quality text embeddings with automatic token accounting

Use the Embeddings API to transform text into dense vector representations for semantic search, clustering, reranking, and Retrieval-Augmented Generation (RAG) workflows. TokenFlux forwards each request to the provider configured for the requested model and normalizes the response so that every embedding arrives as an array of floats.

<Info>
  This endpoint is wire-compatible with OpenAI’s `/v1/embeddings`, so you can reuse existing SDKs by swapping the base URL.
</Info>

## Create embeddings

### Endpoint

```http theme={null}
POST /v1/embeddings
```

### Authentication

Send an API key using either header format. Quota is checked before the upstream provider is contacted; requests are rejected with `403 Forbidden` when the remaining balance is below `0.01` credits.

```http theme={null}
Authorization: Bearer <tokenflux_api_key>
```

or

```http theme={null}
X-Api-Key: <tokenflux_api_key>
```
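If you call the API without an SDK, a small helper keeps the header choice in one place. This is a minimal sketch: the function name `build_auth_headers` and its `style` values are hypothetical, not part of any TokenFlux client library.

```python theme={null}
def build_auth_headers(api_key: str, style: str = "bearer") -> dict[str, str]:
    """Return request headers for either supported authentication style.

    `style` is a hypothetical switch: "bearer" or "x-api-key".
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if style == "bearer":
        auth = {"Authorization": f"Bearer {api_key}"}
    elif style == "x-api-key":
        auth = {"X-Api-Key": api_key}
    else:
        raise ValueError(f"unknown auth style: {style!r}")
    # Both styles send a JSON body, so the content type is shared.
    return {**auth, "Content-Type": "application/json"}
```

Pass the result as the `headers` argument of your HTTP client; only one of the two authentication headers is needed per request.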

### Request body

| Field             | Type          | Required | Description                                                                                                                                           |
| ----------------- | ------------- | -------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| `model`           | string        | **Yes**  | Canonical embedding model identifier from the [Models API](./models).                                                                                 |
| `input`           | string\|array | **Yes**  | Single string, array of strings, or array of token arrays to embed. TokenFlux automatically counts tokens to populate usage statistics.               |
| `encoding_format` | string        | No       | `"float"` (default) or `"base64"`. Base64 payloads are decoded server-side, so the response always contains float arrays for consistency.            |
| `dimensions`      | integer       | No       | Requested dimensionality for providers that support adjustable vector sizes. Values outside the supported list are rejected by the upstream provider. |
| `user`            | string        | No       | End-user identifier for auditing and rate-limiting.                                                                                                   |
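The table can be mirrored in a small payload builder that catches obvious mistakes before a request spends a round trip. This sketch uses a hypothetical `build_embeddings_payload` helper; it only checks the shape of each field, because whether a particular `dimensions` value is supported is decided by the upstream provider.

```python theme={null}
from typing import Any, Optional, Union

# Accepted shapes for `input`, per the table: a string, a list of strings,
# or a list of token-ID lists.
EmbeddingInput = Union[str, list[str], list[list[int]]]


def build_embeddings_payload(
    model: str,
    inputs: EmbeddingInput,
    encoding_format: str = "float",
    dimensions: Optional[int] = None,
    user: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a /v1/embeddings request body, omitting unset optional fields."""
    if not model:
        raise ValueError("model is required")
    if encoding_format not in ("float", "base64"):
        raise ValueError("encoding_format must be 'float' or 'base64'")
    # Only positivity is checked locally; supported sizes are validated upstream.
    if dimensions is not None and (not isinstance(dimensions, int) or dimensions <= 0):
        raise ValueError("dimensions must be a positive integer")

    payload: dict[str, Any] = {"model": model, "input": inputs}
    if encoding_format != "float":  # "float" is the server default
        payload["encoding_format"] = encoding_format
    if dimensions is not None:
        payload["dimensions"] = dimensions
    if user is not None:
        payload["user"] = user
    return payload
```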

#### Input formats

* **Single string**: Generates one embedding.
* **Array of strings**: Generates one embedding per entry.
* **Array of token arrays**: Forwards raw token IDs to the provider. When counting usage, TokenFlux falls back to a character-count heuristic if the model's tokenizer is unavailable.
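Because each input shape maps to a fixed number of vectors, you can compute how many embeddings to expect and compare that against `len(data)` in the response. This sketch assumes that, as in OpenAI's API, each inner token array yields one embedding; `expected_embedding_count` is a hypothetical helper name.

```python theme={null}
from typing import Union


def expected_embedding_count(inputs: Union[str, list]) -> int:
    """Number of vectors the response should contain for a given `input`."""
    if isinstance(inputs, str):
        return 1  # single string -> one embedding
    if not isinstance(inputs, list) or not inputs:
        raise ValueError("input must be a string or a non-empty list")
    if all(isinstance(item, str) for item in inputs):
        return len(inputs)  # one embedding per string
    if all(isinstance(item, list) and all(isinstance(t, int) for t in item) for item in inputs):
        return len(inputs)  # assumed: one embedding per token array
    raise ValueError("lists must contain only strings or only token-ID lists")
```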

### Response

TokenFlux returns an OpenAI-style embedding response in which every embedding is a float array, together with usage accounting. Errors use a different shape: the global error handler serializes them as `{ "success": false, "code": <status>, "message": "..." }`.

```json theme={null}
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0023064255,
        -0.009327292,
        0.018224157,
        0.00456132
      ]
    }
  ],
  "model": "openai/text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```
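A client can pull the vectors out of a successful response and surface the error envelope as an exception. The sketch below sorts by `index` so vectors line up with the order of `input`; `extract_vectors` and `TokenFluxError` are hypothetical names, and the checks cover only the fields shown above.

```python theme={null}
from typing import Any


class TokenFluxError(Exception):
    """Raised when the body is the TokenFlux error envelope."""

    def __init__(self, code: int, message: str):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def extract_vectors(body: dict[str, Any]) -> list[list[float]]:
    """Return embeddings ordered by `index`, or raise on an error envelope."""
    if body.get("success") is False:
        raise TokenFluxError(body.get("code", 0), body.get("message", ""))
    if body.get("object") != "list":
        raise ValueError("unexpected response shape")
    # Sort by index so vectors line up with the order of `input`.
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```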

#### Usage object

When the upstream provider omits token counts, TokenFlux estimates them locally, so embedding usage is always recorded. `prompt_tokens` and `total_tokens` always reflect the tokenized length of `input`, which keeps your dashboards and quotas accurate.
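If you split work across several requests, you can roll the `usage` objects up on the client to reconcile them against your dashboard. This is a minimal sketch with a hypothetical `total_usage` helper; for embeddings, `total_tokens` is expected to equal `prompt_tokens` because no output tokens are generated.

```python theme={null}
from typing import Any, Iterable


def total_usage(responses: Iterable[dict[str, Any]]) -> dict[str, int]:
    """Sum the `usage` objects from several embeddings responses."""
    totals = {"prompt_tokens": 0, "total_tokens": 0}
    for body in responses:
        usage = body.get("usage", {})
        for key in totals:
            # Missing fields count as zero rather than failing the roll-up.
            totals[key] += int(usage.get(key, 0))
    return totals
```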

### Error handling

| Status                      | When it occurs                                                                     | Body                                                                                     |
| --------------------------- | ---------------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------- |
| `400 Bad Request`           | Invalid JSON payload, unsupported `encoding_format`, or provider validation error. | `{ "success": false, "code": 400, "message": "failed to bind request: ..." }`            |
| `403 Forbidden`             | User quota is exhausted.                                                           | `{ "success": false, "code": 403, "message": "user quota exceeded" }`                    |
| `500 Internal Server Error` | Provider returned an error or TokenFlux failed to save usage.                      | `{ "success": false, "code": 500, "message": "internal error, please try again later" }` |
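Given the table, only `500` responses are plausibly transient: a `400` will fail again until the request is fixed, and a `403` until the balance is topped up. The sketch below retries on that assumption with exponential backoff. `post_with_retry` is a hypothetical helper, and `send` is any zero-argument callable you supply (for example, a wrapper around `requests.post`) that returns the HTTP status and the decoded JSON body.

```python theme={null}
import time
from typing import Any, Callable

# Assumption: only server-side failures are worth retrying unchanged.
RETRYABLE_STATUSES = {500}


def post_with_retry(
    send: Callable[[], tuple[int, dict[str, Any]]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> dict[str, Any]:
    """Call `send` until it returns 200, retrying 500s with exponential backoff."""
    attempt = 1
    while True:
        status, body = send()
        if status == 200:
            return body
        # Give up immediately on non-retryable codes or when attempts run out.
        if status not in RETRYABLE_STATUSES or attempt >= max_attempts:
            raise RuntimeError(f"{status}: {body.get('message', '')}")
        time.sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
        attempt += 1
```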

### Examples

#### cURL

```bash theme={null}
curl https://tokenflux.ai/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_TOKENFLUX_KEY' \
  -d '{
    "model": "openai/text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```

#### Python

```python theme={null}
import requests

BASE_URL = "https://tokenflux.ai/v1"
API_KEY = "YOUR_TOKENFLUX_KEY"

payload = {
    "model": "openai/text-embedding-3-small",
    "input": [
        "The quick brown fox jumps over the lazy dog",
        "A similar sentence for comparison"
    ]
}

resp = requests.post(
    f"{BASE_URL}/embeddings",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json=payload,
    timeout=30
)
resp.raise_for_status()
embeddings = resp.json()["data"]
print(f"Generated {len(embeddings)} embeddings")
```
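The two sentences above are embedded so they can be compared. Cosine similarity is a common choice for that comparison; this is a minimal pure-Python sketch (`cosine_similarity` is our own helper, not part of the API), adequate for a handful of vectors but slow for large corpora, where a vector library is the better tool.

```python theme={null}
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)
```

With `embeddings` from the previous example, `cosine_similarity(embeddings[0]["embedding"], embeddings[1]["embedding"])` scores how close the two sentences are.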

#### JavaScript (OpenAI SDK)

```javascript theme={null}
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOKENFLUX_KEY,
  baseURL: 'https://tokenflux.ai/v1'
});

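const result = await client.embeddings.create({
  model: 'qwen/text-embedding-v4',
  input: '向量检索有什么用途？', // "What is vector retrieval used for?"
  dimensions: 768,
  // Ask for floats explicitly: when encoding_format is omitted, recent versions of
  // the openai package request base64 and decode it client-side, which can misread
  // the float arrays TokenFlux always returns.
  encoding_format: 'float'
});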

console.log(result.data[0].embedding.length); // -> 768
```

### Operational notes

* Base64 responses are decoded server-side so that every client receives consistent float arrays regardless of provider quirks.
* TokenFlux saves usage immediately after the upstream response is received, ensuring accurate quotas and billing dashboards.
* Use the [Models API](./models) to determine which embedding models support adjustable dimensions and what token pricing applies.
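TokenFlux performs base64 decoding for you, but knowing what that step involves helps when debugging a raw provider response. The sketch assumes OpenAI's convention of packing each vector component as a little-endian float32; `decode_base64_embedding` and its inverse are hypothetical helpers.

```python theme={null}
import base64
import struct


def decode_base64_embedding(data: str) -> list[float]:
    """Decode a base64 string of packed little-endian float32 values."""
    raw = base64.b64decode(data)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


def encode_base64_embedding(vector: list[float]) -> str:
    """Inverse of decode_base64_embedding, handy for round-trip tests."""
    return base64.b64encode(struct.pack(f"<{len(vector)}f", *vector)).decode("ascii")
```

Values pass through float32, so decoded components carry roughly seven significant digits; this is why float and base64 responses for the same input can differ in the last few digits.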
