Models API

The Models API returns the canonical catalog of language and embedding models that TokenFlux can route to across all configured providers. Use this endpoint to discover model capabilities, pricing, supported parameters, and provider provenance before making chat or embedding requests.
GET /v1/models mirrors the OpenAI Models API surface, but it returns TokenFlux-specific metadata (pricing, architecture, canonical IDs). The same response is also available at GET /models for compatibility with older clients.

List models

Endpoint

GET /v1/models

Authentication

No authentication is required. The catalog is publicly accessible so that you can inspect pricing and capabilities before generating traffic.

Query parameters

This endpoint does not accept query parameters.
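Since no authentication or query parameters are involved, a client call reduces to a plain GET. A minimal sketch using only the standard library (the base URL is a placeholder for your TokenFlux deployment; `models_of_type` is an illustrative helper, not part of the API):

```python
import json
import urllib.request

# Placeholder: point this at your TokenFlux deployment's host.
BASE_URL = "https://YOUR_TOKENFLUX_HOST/v1"

def list_models(base_url: str = BASE_URL) -> list[dict]:
    """Fetch the public model catalog. No Authorization header is required."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return json.load(resp)["data"]

def models_of_type(catalog: list[dict], model_type: str) -> list[dict]:
    """Filter the catalog by the `type` field (e.g. "chat" or "embedding")."""
    return [m for m in catalog if m.get("type") == model_type]
```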

Response structure

The response body is a JSON object with two properties:
| Field  | Type   | Description             |
| ------ | ------ | ----------------------- |
| object | string | Always "list".          |
| data   | array  | Array of model objects. |

Model object

Each item in data is a canonical Model description that TokenFlux uses for routing and billing. The properties are:
| Field | Type | Description |
| ----- | ---- | ----------- |
| id | string | Canonical identifier to use in TokenFlux API calls (for example openai/gpt-4o). |
| canonical_slug | string | Stable slug that uniquely names the model. Usually matches id. |
| hugging_face_id | string or null | Hugging Face model reference when available. |
| name | string | Human-friendly display name. |
| type | string | Model family, such as chat or embedding. |
| created | integer | Unix timestamp (seconds) when the metadata was published. Models without a published timestamp return 0. |
| description | string | Rich Markdown description of the model's capabilities. |
| context_length | integer | Maximum prompt length in tokens supported by the model. |
| architecture | object | Input/output modality information (see below). |
| pricing | object | Token and request pricing metadata (see below). |
| supported_parameters | array | Names of request parameters that the upstream provider supports. Empty if the provider does not publish the list. |
| model_provider | string | Name of the upstream provider (for example openai, anthropic, qwen). |
| dimensions | array | (Embedding models only) Allowed output dimensionalities. |
| max_dimension | integer | (Embedding models only) Maximum output dimension. Omitted for chat models. |
Architecture object
| Field | Type | Description |
| ----- | ---- | ----------- |
| modality | string | Combined view of inputs and outputs (for example text+image->text). |
| input_modalities | array | Accepted input channels such as text, image, or file. |
| output_modalities | array | Output channels that the model can emit. |
| tokenizer | string | Tokenizer family used by the provider. |
| instruct_type | string or null | Optional provider-specific instruction-tuning type. |
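Because capability data lives in `architecture`, a client can gate multimodal features on `input_modalities` instead of hard-coding model names. A minimal sketch (the helper name `accepts_images` is illustrative, not part of the API):

```python
def accepts_images(model: dict) -> bool:
    """True when the model's architecture lists "image" among its input modalities."""
    architecture = model.get("architecture", {})
    return "image" in architecture.get("input_modalities", [])
```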
Pricing object
Pricing values are strings so that high-precision rates can be represented exactly. Each value is the price for unit tokens in the provider's billing currency; an empty string means the charge does not apply to that model. TokenFlux converts non-USD currencies (for example CNY) to USD internally when tracking usage. The fields are:
| Field | Type | Description |
| ----- | ---- | ----------- |
| prompt | string | Cost for prompt tokens. |
| completion | string | Cost for generated tokens (chat models). |
| input_cache_read | string | Price for cache hits when a provider exposes prompt caching. |
| input_cache_write | string | Price to store prompts in the provider's cache. |
| request | string | Flat per-request charge when applicable. |
| image | string | Additional charge for image inputs (vision-enabled chat models). |
| web_search | string | Price for provider-hosted search augmentation. |
| internal_reasoning | string | Provider-specific charge for reasoning tokens. |
| unit | integer | Number of tokens each price applies to (for example 1 or 1,000,000). Defaults to 1. |
| currency | string | Currency code the provider bills in, such as USD or CNY. |
| volumes | array | Volume-tier pricing entries; an empty array when no tiers apply. |
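Because prices are strings quoted per `unit` tokens, comparing models requires normalizing to a common scale. A sketch that converts a pricing field to cost per 1,000,000 tokens (the result stays in the provider's `currency`; the helper name is illustrative):

```python
from decimal import Decimal

def price_per_million(pricing: dict, field: str = "prompt") -> Decimal:
    """Normalize one pricing field to cost per 1,000,000 tokens.

    Prices are strings quoted per `unit` tokens, so divide by the unit
    and scale up. The result is in the provider's currency, not USD.
    """
    raw = pricing.get(field, "")
    if raw == "":
        return Decimal("0")  # empty string: the charge does not apply
    unit = pricing.get("unit", 1) or 1
    return Decimal(raw) / unit * 1_000_000
```

Using the example response below, openai/gpt-4o's prompt rate of "0.0000025" per token normalizes to 2.5 USD per million tokens, while qwen/text-embedding-v4's "0.5" per 1,000,000 tokens stays 0.5 (in CNY).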

Aliases and routing

TokenFlux resolves convenient aliases like gpt-4.1 or claude-sonnet-4 to their canonical identifiers before contacting upstream providers. Always send the canonical id returned by this endpoint in new integrations. Responses from chat completions echo the provider’s server-side model ID (for example gpt-4o), which may omit the vendor prefix for some providers.
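The alias table itself lives server-side. For clients that want to pin canonical IDs up front, a client-side lookup against the catalog might look like the following sketch (this matches a bare name against the part after the vendor prefix; it is an illustrative helper, not TokenFlux's actual alias resolution):

```python
def resolve_canonical(catalog: list[dict], name: str) -> str:
    """Return the canonical id for `name` from a fetched catalog.

    Exact canonical ids pass through unchanged; a bare name like
    "gpt-4o" is matched against the suffix after the vendor prefix.
    Raises KeyError when the name is unknown or ambiguous.
    """
    ids = {m["id"] for m in catalog}
    if name in ids:
        return name
    matches = [i for i in ids if i.split("/", 1)[-1] == name]
    if len(matches) == 1:
        return matches[0]
    raise KeyError(f"ambiguous or unknown model name: {name}")
```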

Example

{
  "object": "list",
  "data": [
    {
      "id": "openai/gpt-4o",
      "canonical_slug": "openai/gpt-4o",
      "hugging_face_id": "",
      "name": "OpenAI: GPT-4o",
      "type": "chat",
      "created": 1715558400,
      "description": "GPT-4o (\"o\" for \"omni\") is OpenAI's latest AI model, supporting both text and image inputs with text outputs...",
      "context_length": 128000,
      "architecture": {
        "modality": "text+image->text",
        "input_modalities": ["text", "image", "file"],
        "output_modalities": ["text"],
        "tokenizer": "GPT",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "0.0000025",
        "completion": "0.00001",
        "input_cache_read": "0.00000125",
        "input_cache_write": "",
        "request": "0",
        "image": "0.003613",
        "web_search": "0",
        "internal_reasoning": "0",
        "unit": 1,
        "currency": "USD",
        "volumes": []
      },
      "supported_parameters": [
        "frequency_penalty",
        "logit_bias",
        "logprobs",
        "max_tokens",
        "presence_penalty",
        "response_format",
        "seed",
        "stop",
        "structured_outputs",
        "temperature",
        "tool_choice",
        "tools",
        "top_logprobs",
        "top_p",
        "web_search_options"
      ],
      "model_provider": "openai"
    },
    {
      "id": "qwen/text-embedding-v4",
      "canonical_slug": "qwen/text-embedding-v4",
      "hugging_face_id": null,
      "name": "Qwen: Text Embedding v4",
      "type": "embedding",
      "created": 0,
      "description": "The Qwen3 Embedding model series is the latest proprietary model ...",
      "context_length": 8192,
      "architecture": {
        "modality": "text->text",
        "input_modalities": ["text"],
        "output_modalities": ["text"],
        "tokenizer": "Qwen",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "0.5",
        "completion": "",
        "input_cache_read": "",
        "input_cache_write": "",
        "request": "0",
        "image": "0",
        "web_search": "0",
        "internal_reasoning": "0",
        "unit": 1000000,
        "currency": "CNY",
        "volumes": []
      },
      "supported_parameters": [],
      "model_provider": "qwen",
      "dimensions": [64, 128, 256, 512, 768, 1024, 1536, 2048]
    }
  ]
}

Usage tips

  • The array is sorted lexicographically by id for deterministic paging in client SDKs.
  • Cache responses for up to 24 hours; TokenFlux refreshes its model list on an hourly cadence using an in-memory cache.
  • Use supported_parameters to tailor request bodies to each provider. Parameters not listed there are silently ignored by many vendors, so skipping unsupported options avoids confusing results.
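The last tip can be sketched as a request-body filter (the helper name and the set of always-kept core fields are assumptions for illustration):

```python
def prune_unsupported(body: dict, supported: list[str]) -> dict:
    """Drop optional request parameters the provider does not list.

    Core fields such as `model`, `messages`, and `input` are always
    kept. An empty `supported` list means the provider did not publish
    its parameter support, so the body is returned unchanged.
    """
    if not supported:
        return dict(body)
    keep = set(supported) | {"model", "messages", "input"}
    return {k: v for k, v in body.items() if k in keep}
```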