# Embeddings API
Use the Embeddings API to transform text into dense vector representations for semantic search, clustering, reranking, and Retrieval-Augmented Generation (RAG) workflows. TokenFlux forwards requests to the configured provider for the requested model and normalizes the response so that every embedding arrives as an array of floats.
This endpoint is wire-compatible with OpenAI's `/v1/embeddings`, so you can reuse existing SDKs by swapping the base URL.
## Create embeddings
### Endpoint

`POST /v1/embeddings` (base URL: `https://tokenflux.ai/v1`)
### Authentication

Send an API key using either header format. Quota is checked before the upstream provider is contacted; requests are rejected with `403 Forbidden` when the remaining balance is below 0.01 credits.

```
Authorization: Bearer <tokenflux_api_key>
```

or

```
X-Api-Key: <tokenflux_api_key>
```
Request body
Field | Type | Required | Description |
---|
model | string | Yes | Canonical embedding model identifier from the Models API. |
input | string|array | Yes | Single string, array of strings, or array of token arrays to embed. TokenFlux automatically counts tokens to populate usage statistics. |
encoding_format | string | No | "float" (default) or "base64" . Base64 responses are decoded to float arrays before returning to the client for consistency. |
dimensions | integer | No | Requested dimensionality for providers that support adjustable vector sizes. Values outside the supported list are rejected by the upstream provider. |
user | string | No | End-user identifier for auditing and rate-limiting. |
The `input` field accepts three shapes:

- **Single string**: generates one embedding.
- **Array of strings**: generates one embedding per entry.
- **Array of token arrays**: forwards raw token IDs. TokenFlux falls back to a character-count heuristic if the tokenizer is unavailable.
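The three shapes map to request payloads like the following. The model name is reused from the examples below; the token IDs are illustrative only, not real tokenizer output.

```python
# Three ways to populate "input"; all fields other than "model"
# and "input" are optional.
single = {
    "model": "openai/text-embedding-3-small",
    "input": "The quick brown fox",
}
batch = {
    "model": "openai/text-embedding-3-small",
    "input": ["first document", "second document"],
}
pretokenized = {
    "model": "openai/text-embedding-3-small",
    "input": [[791, 4062, 14198, 39935]],  # raw token IDs (illustrative)
}
```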
### Response

TokenFlux returns an OpenAI-style embedding response with normalized float vectors and usage accounting. Errors are serialized by the global error handler into `{ "success": false, "code": <status>, "message": "..." }`.
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0023064255,
        -0.009327292,
        0.018224157,
        0.00456132
      ]
    }
  ],
  "model": "openai/text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```
### Usage object

TokenFlux records embedding usage even when the upstream provider omits token counts by estimating tokens locally. `prompt_tokens` and `total_tokens` always reflect the tokenized length of `input`, so your dashboards and quotas remain accurate.
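The exact local estimator is not documented here; the following is a plausible sketch, assuming the common rough ratio of about four characters per token for English text. This is an illustration of a character-count heuristic, not TokenFlux's actual implementation.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough character-count heuristic; real tokenizers will differ."""
    return max(1, round(len(text) / chars_per_token))

# A batch's prompt_tokens would then be the sum over all inputs.
texts = ["The quick brown fox jumps over the lazy dog"]
prompt_tokens = sum(estimate_tokens(t) for t in texts)
```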
### Error handling

| Status | When it occurs | Body |
| --- | --- | --- |
| `400 Bad Request` | Invalid JSON payload, unsupported `encoding_format`, or provider validation error. | `{ "success": false, "code": 400, "message": "failed to bind request: ..." }` |
| `403 Forbidden` | User quota is exhausted. | `{ "success": false, "code": 403, "message": "user quota exceeded" }` |
| `500 Internal Server Error` | Provider returned an error or TokenFlux failed to save usage. | `{ "success": false, "code": 500, "message": "internal error, please try again later" }` |
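Because every error shares the `{ success, code, message }` envelope, a client can branch on it uniformly. A minimal sketch; the mapping from status to action is an illustration, not part of the API:

```python
import json

def handle_error(status: int, body: str) -> str:
    """Map a TokenFlux error envelope to an actionable message."""
    payload = json.loads(body)
    if payload.get("success") is not False:
        return "ok"
    if status == 403:
        return "quota exhausted: top up credits before retrying"
    if status == 400:
        return f"bad request: {payload['message']}"
    return "transient upstream error: retry with backoff"

# Example: the 403 body from the table above.
msg = handle_error(403, '{"success": false, "code": 403, "message": "user quota exceeded"}')
```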
## Examples
### cURL

```bash
curl https://tokenflux.ai/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_TOKENFLUX_KEY' \
  -d '{
    "model": "openai/text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```
### Python

```python
import requests

BASE_URL = "https://tokenflux.ai/v1"
API_KEY = "YOUR_TOKENFLUX_KEY"

payload = {
    "model": "openai/text-embedding-3-small",
    "input": [
        "The quick brown fox jumps over the lazy dog",
        "A similar sentence for comparison"
    ]
}

resp = requests.post(
    f"{BASE_URL}/embeddings",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json=payload,
    timeout=30
)
resp.raise_for_status()

embeddings = resp.json()["data"]
print(f"Generated {len(embeddings)} embeddings")
```
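A common next step with the two vectors returned by the batch request above is comparing them. A self-contained cosine-similarity sketch using only the standard library:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With the response above, compare:
#   embeddings[0]["embedding"] vs embeddings[1]["embedding"]
# Identical vectors score 1.0; unrelated vectors drift toward 0.
```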
### JavaScript (OpenAI SDK)

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOKENFLUX_KEY,
  baseURL: 'https://tokenflux.ai/v1'
});

const result = await client.embeddings.create({
  model: 'qwen/text-embedding-v4',
  input: 'What is vector retrieval used for?',
  dimensions: 768
});

console.log(result.data[0].embedding.length); // -> 768
```
## Operational notes
- Base64 responses are decoded server-side so that every client receives consistent float arrays regardless of provider quirks.
- TokenFlux saves usage immediately after the upstream response is received, ensuring accurate quotas and billing dashboards.
- Use the Models API to determine which embedding models support adjustable dimensions and what token pricing applies.
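TokenFlux decodes base64 payloads server-side, so clients never need to. For reference, OpenAI-compatible providers encode base64 embeddings as little-endian float32 arrays; if you ever have to decode one yourself (e.g. when talking to a provider directly), a sketch:

```python
import base64
import struct

def decode_base64_embedding(data: str) -> list[float]:
    """Decode a base64 payload of packed little-endian float32 values."""
    raw = base64.b64decode(data)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip demo with four exactly-representable float32 values.
vec = [0.25, -0.5, 1.0, 2.0]
encoded = base64.b64encode(struct.pack("<4f", *vec)).decode()
decoded = decode_base64_embedding(encoded)
```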