Models API

The Models API returns the canonical catalog of language and embedding models that TokenFlux can route to across all configured providers. Use this endpoint to discover model capabilities, pricing, supported parameters, and provider provenance before making chat or embedding requests.
GET /v1/models mirrors the OpenAI Models API surface, but it returns TokenFlux-specific metadata (pricing, architecture, canonical IDs). The same response is also available at GET /models for compatibility with older clients.

List models

Endpoint

GET /v1/models

Authentication

No authentication is required. The catalog is publicly accessible so that you can inspect pricing and capabilities before generating traffic.

Query parameters

This endpoint does not accept query parameters.
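Since no authentication or query parameters are involved, a client call reduces to a plain GET. A minimal sketch using only the standard library (the base URL is a placeholder for your TokenFlux deployment; `models_of_type` is an illustrative helper, not part of the API):

```python
import json
import urllib.request

# Placeholder: point this at your TokenFlux deployment's host.
BASE_URL = "https://YOUR_TOKENFLUX_HOST/v1"

def list_models(base_url: str = BASE_URL) -> list[dict]:
    """Fetch the public model catalog. No Authorization header is required."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return json.load(resp)["data"]

def models_of_type(catalog: list[dict], model_type: str) -> list[dict]:
    """Filter the catalog by the `type` field (e.g. "chat" or "embedding")."""
    return [m for m in catalog if m.get("type") == model_type]
```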

Response structure

The response body is a JSON object with two properties:
| Field  | Type   | Description             |
| ------ | ------ | ----------------------- |
| object | string | Always "list".          |
| data   | array  | Array of model objects. |

Model object

Each item in data is a canonical Model description that TokenFlux uses for routing and billing. The properties are:
| Field | Type | Description |
| ----- | ---- | ----------- |
| id | string | Canonical identifier to use in TokenFlux API calls (for example openai/gpt-4o). |
| canonical_slug | string | Stable slug that uniquely names the model. Usually matches id. |
| hugging_face_id | string or null | Hugging Face model reference when available. |
| name | string | Human-friendly display name. |
| type | string | Model family, such as chat or embedding. |
| created | integer | Unix timestamp (seconds) when the metadata was published. Models without a published timestamp return 0. |
| description | string | Rich Markdown description of the model's capabilities. |
| context_length | integer | Maximum prompt length in tokens supported by the model. |
| architecture | object | Input/output modality information (see below). |
| pricing | object | Token and request pricing metadata (see below). |
| supported_parameters | array | Names of request parameters that the upstream provider supports. Empty if the provider does not publish the list. |
| model_provider | string | Name of the upstream provider (for example openai, anthropic, qwen). |
| dimensions | array | (Embedding models only) Allowed output dimensionalities. |
| max_dimension | integer | (Embedding models only) Maximum output dimension. Omitted for chat models. |
Architecture object
| Field | Type | Description |
| ----- | ---- | ----------- |
| modality | string | Combined view of inputs and outputs (for example text+image->text). |
| input_modalities | array | Accepted input channels such as text, image, or file. |
| output_modalities | array | Output channels that the model can emit. |
| tokenizer | string | Tokenizer family used by the provider. |
| instruct_type | string or null | Optional provider-specific instruction-tuning type. |
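Because capability data lives in `architecture`, a client can gate multimodal features on `input_modalities` instead of hard-coding model names. A minimal sketch (the helper name `accepts_images` is illustrative, not part of the API):

```python
def accepts_images(model: dict) -> bool:
    """True when the model's architecture lists "image" among its input modalities."""
    architecture = model.get("architecture", {})
    return "image" in architecture.get("input_modalities", [])
```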
Pricing object
Pricing values are strings so that high-precision rates can be represented exactly. Each value is the price for unit tokens in the provider's billing currency; an empty string means the charge does not apply to that model. TokenFlux converts non-USD currencies (for example CNY) to USD internally when tracking usage. The fields are:
| Field | Type | Description |
| ----- | ---- | ----------- |
| prompt | string | Cost for prompt tokens. |
| completion | string | Cost for generated tokens (chat models). |
| input_cache_read | string | Price for cache hits when a provider exposes prompt caching. |
| input_cache_write | string | Price to store prompts in the provider's cache. |
| request | string | Flat per-request charge when applicable. |
| image | string | Additional charge for image inputs (vision-enabled chat models). |
| web_search | string | Price for provider-hosted search augmentation. |
| internal_reasoning | string | Provider-specific charge for reasoning tokens. |
| unit | integer | Number of tokens each price applies to (for example 1 or 1,000,000). Defaults to 1. |
| currency | string | Currency code the provider bills in, such as USD or CNY. |
| volumes | array | Volume-tier pricing entries; an empty array when no tiers apply. |
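Because prices are strings quoted per `unit` tokens, comparing models requires normalizing to a common scale. A sketch that converts a pricing field to cost per 1,000,000 tokens (the result stays in the provider's `currency`; the helper name is illustrative):

```python
from decimal import Decimal

def price_per_million(pricing: dict, field: str = "prompt") -> Decimal:
    """Normalize one pricing field to cost per 1,000,000 tokens.

    Prices are strings quoted per `unit` tokens, so divide by the unit
    and scale up. The result is in the provider's currency, not USD.
    """
    raw = pricing.get(field, "")
    if raw == "":
        return Decimal("0")  # empty string: the charge does not apply
    unit = pricing.get("unit", 1) or 1
    return Decimal(raw) / unit * 1_000_000
```

Using the example response below, openai/gpt-4o's prompt rate of "0.0000025" per token normalizes to 2.5 USD per million tokens, while qwen/text-embedding-v4's "0.5" per 1,000,000 tokens stays 0.5 (in CNY).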

Aliases and routing

TokenFlux resolves convenient aliases like gpt-4.1 or claude-sonnet-4 to their canonical identifiers before contacting upstream providers. Always send the canonical id returned by this endpoint in new integrations. Responses from chat completions echo the provider’s server-side model ID (for example gpt-4o), which may omit the vendor prefix for some providers.
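The alias table itself lives server-side. For clients that want to pin canonical IDs up front, a client-side lookup against the catalog might look like the following sketch (this matches a bare name against the part after the vendor prefix; it is an illustrative helper, not TokenFlux's actual alias resolution):

```python
def resolve_canonical(catalog: list[dict], name: str) -> str:
    """Return the canonical id for `name` from a fetched catalog.

    Exact canonical ids pass through unchanged; a bare name like
    "gpt-4o" is matched against the suffix after the vendor prefix.
    Raises KeyError when the name is unknown or ambiguous.
    """
    ids = {m["id"] for m in catalog}
    if name in ids:
        return name
    matches = [i for i in ids if i.split("/", 1)[-1] == name]
    if len(matches) == 1:
        return matches[0]
    raise KeyError(f"ambiguous or unknown model name: {name}")
```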

Example

{
  "object": "list",
  "data": [
    {
      "id": "openai/gpt-4o",
      "canonical_slug": "openai/gpt-4o",
      "hugging_face_id": "",
      "name": "OpenAI: GPT-4o",
      "type": "chat",
      "created": 1715558400,
      "description": "GPT-4o (\"o\" for \"omni\") is OpenAI's latest AI model, supporting both text and image inputs with text outputs...",
      "context_length": 128000,
      "architecture": {
        "modality": "text+image->text",
        "input_modalities": ["text", "image", "file"],
        "output_modalities": ["text"],
        "tokenizer": "GPT",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "0.0000025",
        "completion": "0.00001",
        "input_cache_read": "0.00000125",
        "input_cache_write": "",
        "request": "0",
        "image": "0.003613",
        "web_search": "0",
        "internal_reasoning": "0",
        "unit": 1,
        "currency": "USD",
        "volumes": []
      },
      "supported_parameters": [
        "frequency_penalty",
        "logit_bias",
        "logprobs",
        "max_tokens",
        "presence_penalty",
        "response_format",
        "seed",
        "stop",
        "structured_outputs",
        "temperature",
        "tool_choice",
        "tools",
        "top_logprobs",
        "top_p",
        "web_search_options"
      ],
      "model_provider": "openai"
    },
    {
      "id": "qwen/text-embedding-v4",
      "canonical_slug": "qwen/text-embedding-v4",
      "hugging_face_id": null,
      "name": "Qwen: Text Embedding v4",
      "type": "embedding",
      "created": 0,
      "description": "The Qwen3 Embedding model series is the latest proprietary model ...",
      "context_length": 8192,
      "architecture": {
        "modality": "text->text",
        "input_modalities": ["text"],
        "output_modalities": ["text"],
        "tokenizer": "Qwen",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "0.5",
        "completion": "",
        "input_cache_read": "",
        "input_cache_write": "",
        "request": "0",
        "image": "0",
        "web_search": "0",
        "internal_reasoning": "0",
        "unit": 1000000,
        "currency": "CNY",
        "volumes": []
      },
      "supported_parameters": [],
      "model_provider": "qwen",
      "dimensions": [64, 128, 256, 512, 768, 1024, 1536, 2048]
    }
  ]
}

Usage tips

  • The array is sorted lexicographically by id for deterministic paging in client SDKs.
  • Cache responses for up to 24 hours; TokenFlux refreshes its model list on an hourly cadence using an in-memory cache.
  • Use supported_parameters to tailor request bodies to each provider. Parameters not listed there are silently ignored by many vendors, so skipping unsupported options avoids confusing results.
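The last tip can be sketched as a request-body filter (the helper name and the set of always-kept core fields are assumptions for illustration):

```python
def prune_unsupported(body: dict, supported: list[str]) -> dict:
    """Drop optional request parameters the provider does not list.

    Core fields such as `model`, `messages`, and `input` are always
    kept. An empty `supported` list means the provider did not publish
    its parameter support, so the body is returned unchanged.
    """
    if not supported:
        return dict(body)
    keep = set(supported) | {"model", "messages", "input"}
    return {k: v for k, v in body.items() if k in keep}
```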