Models API
The Models API returns the canonical catalog of language and embedding models that TokenFlux can route to across all configured providers. Use this endpoint to discover model capabilities, pricing, supported parameters, and provider provenance before making chat or embedding requests.

`GET /v1/models` mirrors the OpenAI Models API surface, but it returns TokenFlux-specific metadata (pricing, architecture, canonical IDs). The same response is also available at `GET /models` for compatibility with older clients.

List models
Endpoint

GET /v1/models
Authentication
No authentication is required. The catalog is publicly accessible so that you can inspect pricing and capabilities before generating traffic.

Query parameters

This endpoint does not accept query parameters.

Response structure
The response body is a JSON object with two properties:

Field | Type | Description |
---|---|---|
`object` | string | Always `"list"`. |
`data` | array | Array of model objects. |
Model object
Each item in `data` is a canonical `Model` description that TokenFlux uses for routing and billing. The properties are:
Field | Type | Description |
---|---|---|
`id` | string | Canonical identifier to use in TokenFlux API calls (for example `openai/gpt-4o`). |
`canonical_slug` | string | Stable slug that uniquely names the model. Usually matches `id`. |
`hugging_face_id` | string\|null | Hugging Face model reference when available. |
`name` | string | Human-friendly display name. |
`type` | string | Model family, such as `chat` or `embedding`. |
`created` | integer | Unix timestamp (seconds) when the metadata was published. Models without a published timestamp return `0`. |
`description` | string | Rich Markdown description of the model’s capabilities. |
`context_length` | integer | Maximum prompt length in tokens supported by the model. |
`architecture` | object | Input/output modality information (see below). |
`pricing` | object | Token and request pricing metadata (see below). |
`supported_parameters` | array | Names of request parameters that the upstream provider supports. Empty if the provider does not publish the list. |
`model_provider` | string | Name of the upstream provider (for example `openai`, `anthropic`, `qwen`). |
`dimensions` | array | (Embedding models only) Allowed output dimensionalities. |
`max_dimension` | integer | (Embedding models only) Maximum output dimension. Omitted for chat models. |
Architecture object
Field | Type | Description |
---|---|---|
`modality` | string | Combined view of inputs and outputs (for example `text+image->text`). |
`input_modalities` | array | Accepted input channels such as `text`, `image`, or `file`. |
`output_modalities` | array | Output channels that the model can emit. |
`tokenizer` | string | Tokenizer family used by the provider. |
`instruct_type` | string\|null | Optional provider-specific instruction tuning type. |
Pricing object
Pricing values are strings so that high-precision rates can be represented exactly. Interpret them as “price per `unit` tokens” in the provider’s `currency`. TokenFlux converts currencies (for example CNY) to USD internally when tracking usage. The fields are:
Field | Type | Description |
---|---|---|
`prompt` | string | Cost for prompt tokens. |
`completion` | string | Cost for generated tokens (chat models). |
`input_cache_read` | string | Price for cache hits when a provider exposes prompt caching. |
`input_cache_write` | string | Price to store prompts in the provider’s cache. |
`request` | string | Flat per-request charge when applicable. |
`image` | string | Additional charge for multimodal inputs (vision-enabled chat models). |
`web_search` | string | Price for provider-hosted search augmentation. |
`internal_reasoning` | string | Provider-specific charge for reasoning tokens. |
`unit` | integer | Number of tokens the prices apply to (for example 1 token or 1,000,000 tokens). Defaults to `1`. |
`currency` | string | Currency code the provider bills in, such as `USD` or `CNY`. |
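Because prices arrive as strings, decimal arithmetic preserves the provider's precision when computing usage costs. A minimal sketch in Python (the sample rates below are illustrative, not live TokenFlux prices):

```python
from decimal import Decimal

def usage_cost(pricing: dict, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Compute the cost of one request from a pricing object.

    Prices are strings quoted per `unit` tokens, so everything stays in
    Decimal to avoid floating-point drift.
    """
    unit = Decimal(pricing.get("unit", 1))
    prompt_rate = Decimal(pricing["prompt"])
    completion_rate = Decimal(pricing["completion"])
    return (prompt_tokens * prompt_rate + completion_tokens * completion_rate) / unit

# Illustrative pricing: 2.50 per 1M prompt tokens, 10.00 per 1M completion tokens.
pricing = {"prompt": "2.50", "completion": "10.00", "unit": 1_000_000, "currency": "USD"}
cost = usage_cost(pricing, prompt_tokens=1200, completion_tokens=300)  # Decimal('0.006')
```

Remember that the result is denominated in the pricing object's `currency`, which is not always USD.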
Aliases and routing
TokenFlux resolves convenient aliases likegpt-4.1
or claude-sonnet-4
to their canonical identifiers before contacting upstream providers. Always send the canonical id
returned by this endpoint in new integrations. Responses from chat completions echo the provider’s server-side model ID (for example gpt-4o
), which may omit the vendor prefix for some providers.
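Client code can apply the same idea when normalizing user input. A sketch with a hypothetical alias table (the real mapping lives server-side in TokenFlux):

```python
# Hypothetical alias table for illustration only; TokenFlux maintains
# the authoritative mapping on the server.
ALIASES = {
    "gpt-4.1": "openai/gpt-4.1",
    "claude-sonnet-4": "anthropic/claude-sonnet-4",
}

def resolve_model_id(model: str) -> str:
    """Return the canonical id; canonical ids pass through unchanged."""
    return ALIASES.get(model, model)
```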
Example
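A request to `GET /v1/models` returns a payload shaped like the following, abbreviated here to a single model. The field values are illustrative, not live catalog data:

```json
{
  "object": "list",
  "data": [
    {
      "id": "openai/gpt-4o",
      "canonical_slug": "openai/gpt-4o",
      "hugging_face_id": null,
      "name": "GPT-4o",
      "type": "chat",
      "created": 1715300000,
      "description": "Multimodal chat model…",
      "context_length": 128000,
      "architecture": {
        "modality": "text+image->text",
        "input_modalities": ["text", "image"],
        "output_modalities": ["text"],
        "tokenizer": "GPT",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "2.50",
        "completion": "10.00",
        "unit": 1000000,
        "currency": "USD"
      },
      "supported_parameters": ["temperature", "top_p", "max_tokens", "tools"],
      "model_provider": "openai"
    }
  ]
}
```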
Usage tips

- The array is sorted lexicographically by `id` for deterministic paging in client SDKs.
- Cache responses for up to 24 hours; TokenFlux keeps the model list fresh on an hourly cadence using an in-memory cache.
- Use `supported_parameters` to tailor request bodies to each provider. Parameters not listed there are silently ignored by many vendors, so skipping unsupported options avoids confusing results.
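The last tip can be automated by filtering a request body against the model's advertised parameters before sending it. A minimal sketch (the `trim_request` helper and its always-kept field set are assumptions, not part of any TokenFlux SDK):

```python
def trim_request(body: dict, supported: list[str]) -> dict:
    """Drop tuning parameters the target provider does not advertise.

    Core fields like `model` and `messages` are always kept; everything
    else is checked against the model's `supported_parameters` list.
    """
    always_keep = {"model", "messages"}
    return {k: v for k, v in body.items() if k in always_keep or k in supported}

body = {
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "hi"}],
    "temperature": 0.2,
    "top_k": 40,  # dropped: not in this model's supported_parameters
}
trimmed = trim_request(body, ["temperature", "top_p", "max_tokens"])
```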