Embeddings API

Use the Embeddings API to transform text into dense vector representations for semantic search, clustering, reranking, and Retrieval-Augmented Generation (RAG) workflows. TokenFlux forwards requests to the configured provider for the requested model and normalizes the response so that every embedding arrives as an array of floats.
This endpoint is wire-compatible with OpenAI’s /v1/embeddings, so you can reuse existing SDKs by swapping the base URL.

Create embeddings

Endpoint

POST /v1/embeddings

Authentication

Send an API key using either header format. Quota is checked before the upstream provider is contacted; requests are rejected with 403 Forbidden when the remaining balance is below 0.01 credits.
Authorization: Bearer <tokenflux_api_key>
or
X-Api-Key: <tokenflux_api_key>

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Canonical embedding model identifier from the Models API. |
| input | string \| array | Yes | Single string, array of strings, or array of token arrays to embed. TokenFlux automatically counts tokens to populate usage statistics. |
| encoding_format | string | No | "float" (default) or "base64". Base64 responses are decoded to float arrays before being returned to the client, for consistency. |
| dimensions | integer | No | Requested dimensionality for providers that support adjustable vector sizes. Values outside the supported list are rejected by the upstream provider. |
| user | string | No | End-user identifier for auditing and rate limiting. |
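Putting the fields together, a full request body might look like the sketch below. The model name, dimension value, and user identifier are illustrative; consult the Models API for what your chosen model actually supports.

```python
# Example request body exercising every field described above.
payload = {
    "model": "openai/text-embedding-3-small",        # required
    "input": ["first document", "second document"],  # string or array
    "encoding_format": "float",                      # "float" (default) or "base64"
    "dimensions": 512,                               # only for adjustable-size models
    "user": "user-1234",                             # optional audit identifier
}
```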

Input formats

  • Single string: Generates one embedding.
  • Array of strings: Generates one embedding per entry.
  • Array of token arrays: Forwards raw token IDs to the provider. TokenFlux falls back to a character-count heuristic if the tokenizer is unavailable.
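The three accepted shapes can be sketched as follows (token ID values are illustrative); a string yields one embedding, while either array form yields one embedding per entry:

```python
# The three accepted shapes for "input".
single = "The quick brown fox"                   # one embedding
batch = ["first text", "second text"]            # one embedding per entry
tokens = [[464, 2068, 7586], [32, 2092, 6827]]   # raw token IDs (illustrative)

def count_embeddings(value):
    """Number of embeddings a given input value will produce."""
    return 1 if isinstance(value, str) else len(value)
```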

Response

TokenFlux returns an OpenAI-style embedding response with normalized float vectors and usage accounting. Errors are serialized by the global error handler into { "success": false, "code": <status>, "message": "..." }.
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0023064255,
        -0.009327292,
        0.018224157,
        0.00456132
      ]
    }
  ],
  "model": "openai/text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
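A minimal helper for consuming this response shape: sort entries by their index field (order within data is not guaranteed by all providers) and pull out the vectors alongside the usage object.

```python
def extract_vectors(response):
    """Return embedding vectors ordered by 'index', plus the usage object."""
    data = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in data], response["usage"]

# The response shape from the example above:
sample = {
    "object": "list",
    "data": [{"object": "embedding", "index": 0,
              "embedding": [0.0023064255, -0.009327292, 0.018224157, 0.00456132]}],
    "model": "openai/text-embedding-3-small",
    "usage": {"prompt_tokens": 8, "total_tokens": 8},
}
vectors, usage = extract_vectors(sample)
```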

Usage object

TokenFlux records embedding usage even when the upstream provider omits token counts: in that case it estimates tokens locally. prompt_tokens and total_tokens always reflect the tokenized length of the input, so your dashboards and quotas remain accurate.
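If you want to mirror that accounting client-side (for example, to warn before the 0.01-credit cutoff), a simple accumulator over the usage objects is enough. This is our own convention, not part of the API:

```python
class UsageTracker:
    """Sum prompt_tokens across responses to track consumption locally."""

    def __init__(self):
        self.prompt_tokens = 0

    def record(self, usage):
        # `usage` is the "usage" object from an embeddings response.
        self.prompt_tokens += usage.get("prompt_tokens", 0)

tracker = UsageTracker()
tracker.record({"prompt_tokens": 8, "total_tokens": 8})
tracker.record({"prompt_tokens": 5, "total_tokens": 5})
```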

Error handling

| Status | When it occurs | Body |
| --- | --- | --- |
| 400 Bad Request | Invalid JSON payload, unsupported encoding_format, or provider validation error. | { "success": false, "code": 400, "message": "failed to bind request: ..." } |
| 403 Forbidden | User quota is exhausted. | { "success": false, "code": 403, "message": "user quota exceeded" } |
| 500 Internal Server Error | Provider returned an error or TokenFlux failed to save usage. | { "success": false, "code": 500, "message": "internal error, please try again later" } |
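A sketch of how a client might act on these statuses; the retry guidance in the comments is our own convention, not API-mandated behavior:

```python
def classify_error(status, body):
    """Map an error response body onto a client-side action category."""
    if not (isinstance(body, dict) and body.get("success") is False):
        return "ok"
    if status == 400:
        return "bad_request"      # fix the payload; retrying unchanged will fail again
    if status == 403:
        return "quota_exhausted"  # top up credits before retrying
    if status == 500:
        return "retry_later"      # transient provider or internal failure
    return "unknown"
```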

Examples

cURL

curl https://tokenflux.ai/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_TOKENFLUX_KEY' \
  -d '{
    "model": "openai/text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'

Python

import requests

BASE_URL = "https://tokenflux.ai/v1"
API_KEY = "YOUR_TOKENFLUX_KEY"

payload = {
    "model": "openai/text-embedding-3-small",
    "input": [
        "The quick brown fox jumps over the lazy dog",
        "A similar sentence for comparison"
    ]
}

resp = requests.post(
    f"{BASE_URL}/embeddings",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json=payload,
    timeout=30
)
resp.raise_for_status()
embeddings = resp.json()["data"]
print(f"Generated {len(embeddings)} embeddings")
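The batch above embeds two sentences "for comparison" but never compares them. A small follow-up sketch, using only the standard library, computes their cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# With the response from the batch request above:
# score = cosine_similarity(embeddings[0]["embedding"], embeddings[1]["embedding"])
```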

JavaScript (OpenAI SDK)

import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOKENFLUX_KEY,
  baseURL: 'https://tokenflux.ai/v1'
});

const result = await client.embeddings.create({
  model: 'qwen/text-embedding-v4',
  input: '向量检索有什么用途?',
  dimensions: 768
});

console.log(result.data[0].embedding.length); // -> 768

Operational notes

  • Base64 responses are decoded server-side so that every client receives consistent float arrays regardless of provider quirks.
  • TokenFlux saves usage immediately after the upstream response is received, ensuring accurate quotas and billing dashboards.
  • Use the Models API to determine which embedding models support adjustable dimensions and what token pricing applies.