# Embeddings API
Use the Embeddings API to transform text into dense vector representations for semantic search, clustering, reranking, and Retrieval-Augmented Generation (RAG) workflows. TokenFlux forwards requests to the configured provider for the requested model and normalizes the response so that every embedding arrives as an array of floats.
This endpoint is wire-compatible with OpenAI's `/v1/embeddings`, so you can reuse existing SDKs by swapping the base URL.
## Create embeddings
### Endpoint

`POST /v1/embeddings` (base URL: `https://tokenflux.ai/v1`)
### Authentication

Send an API key using either header format. Quota is checked before the upstream provider is contacted; requests are rejected with `403 Forbidden` when the remaining balance is below 0.01 credits.

```
Authorization: Bearer <tokenflux_api_key>
```

or

```
X-Api-Key: <tokenflux_api_key>
```
Request body
Field | Type | Required | Description |
---|
model | string | Yes | Canonical embedding model identifier from the Models API. |
input | string|array | Yes | Single string, array of strings, or array of token arrays to embed. TokenFlux automatically counts tokens to populate usage statistics. |
encoding_format | string | No | "float" (default) or "base64" . Base64 responses are decoded to float arrays before returning to the client for consistency. |
dimensions | integer | No | Requested dimensionality for providers that support adjustable vector sizes. Values outside the supported list are rejected by the upstream provider. |
user | string | No | End-user identifier for auditing and rate-limiting. |
The `input` field accepts three shapes:

- **Single string**: generates one embedding.
- **Array of strings**: generates one embedding per entry.
- **Array of token arrays**: forwards raw token IDs. TokenFlux falls back to a character-count heuristic if the tokenizer is unavailable.
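The three shapes map to request payloads like the following. The model name is reused from the examples below; the token IDs are illustrative only, not real tokenizer output.

```python
# Three ways to populate "input"; all fields other than "model"
# and "input" are optional.
single = {
    "model": "openai/text-embedding-3-small",
    "input": "The quick brown fox",
}
batch = {
    "model": "openai/text-embedding-3-small",
    "input": ["first document", "second document"],
}
pretokenized = {
    "model": "openai/text-embedding-3-small",
    "input": [[791, 4062, 14198, 39935]],  # raw token IDs (illustrative)
}
```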
### Response

TokenFlux returns an OpenAI-style embedding response with normalized float vectors and usage accounting. Errors are serialized by the global error handler into `{ "success": false, "code": <status>, "message": "..." }`.
```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0023064255,
        -0.009327292,
        0.018224157,
        0.00456132
      ]
    }
  ],
  "model": "openai/text-embedding-3-small",
  "usage": {
    "prompt_tokens": 8,
    "total_tokens": 8
  }
}
```
### Usage object

TokenFlux records embedding usage even when the upstream provider omits token counts by estimating tokens locally. `prompt_tokens` and `total_tokens` always reflect the tokenized length of `input`, so your dashboards and quotas remain accurate.
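The exact local estimator is not documented here; the following is a plausible sketch, assuming the common rough ratio of about four characters per token for English text. This is an illustration of a character-count heuristic, not TokenFlux's actual implementation.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough character-count heuristic; real tokenizers will differ."""
    return max(1, round(len(text) / chars_per_token))

# A batch's prompt_tokens would then be the sum over all inputs.
texts = ["The quick brown fox jumps over the lazy dog"]
prompt_tokens = sum(estimate_tokens(t) for t in texts)
```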
### Error handling

| Status | When it occurs | Body |
| --- | --- | --- |
| `400 Bad Request` | Invalid JSON payload, unsupported `encoding_format`, or provider validation error. | `{ "success": false, "code": 400, "message": "failed to bind request: ..." }` |
| `403 Forbidden` | User quota is exhausted. | `{ "success": false, "code": 403, "message": "user quota exceeded" }` |
| `500 Internal Server Error` | Provider returned an error or TokenFlux failed to save usage. | `{ "success": false, "code": 500, "message": "internal error, please try again later" }` |
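Because every error shares the `{ success, code, message }` envelope, a client can branch on it uniformly. A minimal sketch; the mapping from status to action is an illustration, not part of the API:

```python
import json

def handle_error(status: int, body: str) -> str:
    """Map a TokenFlux error envelope to an actionable message."""
    payload = json.loads(body)
    if payload.get("success") is not False:
        return "ok"
    if status == 403:
        return "quota exhausted: top up credits before retrying"
    if status == 400:
        return f"bad request: {payload['message']}"
    return "transient upstream error: retry with backoff"

# Example: the 403 body from the table above.
msg = handle_error(403, '{"success": false, "code": 403, "message": "user quota exceeded"}')
```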
## Examples
### cURL

```bash
curl https://tokenflux.ai/v1/embeddings \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer YOUR_TOKENFLUX_KEY' \
  -d '{
    "model": "openai/text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog"
  }'
```
### Python

```python
import requests

BASE_URL = "https://tokenflux.ai/v1"
API_KEY = "YOUR_TOKENFLUX_KEY"

payload = {
    "model": "openai/text-embedding-3-small",
    "input": [
        "The quick brown fox jumps over the lazy dog",
        "A similar sentence for comparison"
    ]
}

resp = requests.post(
    f"{BASE_URL}/embeddings",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    },
    json=payload,
    timeout=30
)
resp.raise_for_status()

embeddings = resp.json()["data"]
print(f"Generated {len(embeddings)} embeddings")
```
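A common next step with the two vectors returned by the batch request above is comparing them. A self-contained cosine-similarity sketch using only the standard library:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# With the response above, compare:
#   embeddings[0]["embedding"] vs embeddings[1]["embedding"]
# Identical vectors score 1.0; unrelated vectors drift toward 0.
```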
### JavaScript (OpenAI SDK)

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: process.env.TOKENFLUX_KEY,
  baseURL: 'https://tokenflux.ai/v1'
});

const result = await client.embeddings.create({
  model: 'qwen/text-embedding-v4',
  input: 'What is vector retrieval used for?',
  dimensions: 768
});

console.log(result.data[0].embedding.length); // -> 768
```
## Operational notes
- Base64 responses are decoded server-side so that every client receives consistent float arrays regardless of provider quirks.
- TokenFlux saves usage immediately after the upstream response is received, ensuring accurate quotas and billing dashboards.
- Use the Models API to determine which embedding models support adjustable dimensions and what token pricing applies.
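TokenFlux decodes base64 payloads server-side, so clients never need to. For reference, OpenAI-compatible providers encode base64 embeddings as little-endian float32 arrays; if you ever have to decode one yourself (e.g. when talking to a provider directly), a sketch:

```python
import base64
import struct

def decode_base64_embedding(data: str) -> list[float]:
    """Decode a base64 payload of packed little-endian float32 values."""
    raw = base64.b64decode(data)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip demo with four exactly-representable float32 values.
vec = [0.25, -0.5, 1.0, 2.0]
encoded = base64.b64encode(struct.pack("<4f", *vec)).decode()
decoded = decode_base64_embedding(encoded)
```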