# Models API

> Discover canonical model metadata served by TokenFlux

The Models API returns the canonical catalog of language and embedding models that TokenFlux can route to across all configured providers. Use this endpoint to discover model capabilities, pricing, supported parameters, and provider provenance before making chat or embedding requests.

<Info>
  `GET /v1/models` mirrors the OpenAI Models API surface, but it returns TokenFlux-specific metadata (pricing, architecture, canonical IDs). The same response is also available at `GET /models` for compatibility with older clients.
</Info>

## List models

### Endpoint

```http theme={null}
GET /v1/models
```

### Authentication

No authentication is required. The catalog is publicly accessible so that you can inspect pricing and capabilities before generating traffic.

### Query parameters

This endpoint does not accept query parameters.

### Response structure

The response body is a JSON object with two properties:

| Field    | Type   | Description             |
| -------- | ------ | ----------------------- |
| `object` | string | Always `"list"`.        |
| `data`   | array  | Array of model objects. |

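As a minimal sketch of consuming this envelope, the snippet below fetches the catalog without credentials and checks the two top-level fields before returning the model objects. The host in `BASE_URL` is a placeholder (this page does not state the API host), and the helper names `parse_model_list` and `fetch_models` are illustrative, not part of any TokenFlux SDK.

```python
import json
import urllib.request

# Assumption: placeholder host. Replace it with the real TokenFlux API base URL.
BASE_URL = "https://api.example.com"


def parse_model_list(body: str) -> list:
    """Parse a /v1/models response body and return the list of model objects."""
    payload = json.loads(body)
    if not isinstance(payload, dict) or payload.get("object") != "list":
        raise ValueError("expected a JSON object with object == 'list'")
    data = payload.get("data")
    if not isinstance(data, list):
        raise ValueError("'data' must be an array of model objects")
    return data


def fetch_models(base_url: str = BASE_URL, timeout: float = 10.0) -> list:
    """GET /v1/models (no authentication header) and return the model objects."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return parse_model_list(resp.read().decode("utf-8"))
```

Keeping the parsing separate from the network call makes it possible to validate cached or recorded responses with the same code path.
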
#### Model object

Each item in `data` is a canonical `Model` description that TokenFlux uses for routing and billing. The properties are:

| Field                  | Type         | Description                                                                                                       |
| ---------------------- | ------------ | ----------------------------------------------------------------------------------------------------------------- |
| `id`                   | string       | Canonical identifier to use in TokenFlux API calls (for example `openai/gpt-4o`).                                 |
| `canonical_slug`       | string       | Stable slug that uniquely names the model. Usually matches `id`.                                                  |
| `hugging_face_id`      | string\|null | Hugging Face model reference when available; otherwise `null` or an empty string.                                 |
| `name`                 | string       | Human-friendly display name.                                                                                      |
| `type`                 | string       | Model family, such as `chat` or `embedding`.                                                                      |
| `created`              | integer      | Unix timestamp (seconds) when the metadata was published. Models without a published timestamp return `0`.        |
| `description`          | string       | Rich Markdown description of the model’s capabilities.                                                            |
| `context_length`       | integer      | Maximum prompt length in tokens supported by the model.                                                           |
| `architecture`         | object       | Input/output modality information (see below).                                                                    |
| `pricing`              | object       | Token and request pricing metadata (see below).                                                                   |
| `supported_parameters` | array        | Names of request parameters that the upstream provider supports. Empty if the provider does not publish the list. |
| `model_provider`       | string       | Name of the upstream provider (for example `openai`, `anthropic`, `qwen`).                                        |
| `dimensions`           | array        | (Embedding models only) Allowed output dimensionalities. Omitted for chat models.                                 |
| `max_dimension`        | integer      | (Embedding models only) Maximum output dimension. Omitted for chat models.                                        |

##### Architecture object

| Field               | Type         | Description                                                           |
| ------------------- | ------------ | --------------------------------------------------------------------- |
| `modality`          | string       | Combined view of inputs and outputs (for example `text+image->text`). |
| `input_modalities`  | array        | Accepted input channels such as `text`, `image`, or `file`.           |
| `output_modalities` | array        | Output channels that the model can emit.                              |
| `tokenizer`         | string       | Tokenizer family used by the provider.                                |
| `instruct_type`     | string\|null | Optional provider-specific instruction tuning type.                   |

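One way to use these fields is to narrow the catalog by `type` and by accepted input modality before choosing a model. The sketch below works on already-parsed model objects; `find_models` is a hypothetical helper name, and it assumes missing `architecture` data should be treated as "no declared inputs".

```python
def find_models(models, model_type=None, input_modality=None):
    """Return the ids of models that match the given type and input modality."""
    matches = []
    for model in models:
        if model_type is not None and model.get("type") != model_type:
            continue
        # Assumption: a missing architecture block means no declared inputs.
        inputs = (model.get("architecture") or {}).get("input_modalities") or []
        if input_modality is not None and input_modality not in inputs:
            continue
        matches.append(model["id"])
    return matches


catalog = [
    {"id": "openai/gpt-4o", "type": "chat",
     "architecture": {"input_modalities": ["text", "image", "file"]}},
    {"id": "qwen/text-embedding-v4", "type": "embedding",
     "architecture": {"input_modalities": ["text"]}},
]
vision_chat = find_models(catalog, model_type="chat", input_modality="image")
```
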
##### Pricing object

Pricing values are strings so that high-precision rates can be represented exactly. Interpret them as “price per `unit` tokens” in the provider’s `currency`: for example, a `prompt` value of `"0.5"` with `unit` `1000000` and `currency` `CNY` means 0.5 CNY per million prompt tokens. TokenFlux converts currencies (for example CNY) to USD internally when tracking usage. The fields are:

| Field                | Type    | Description                                                                                   |
| -------------------- | ------- | --------------------------------------------------------------------------------------------- |
| `prompt`             | string  | Cost for prompt tokens.                                                                       |
| `completion`         | string  | Cost for generated tokens (chat models).                                                      |
| `input_cache_read`   | string  | Price for cache hits when a provider exposes prompt caching.                                  |
| `input_cache_write`  | string  | Price to store prompts in the provider’s cache.                                               |
| `request`            | string  | Flat per-request charge when applicable.                                                      |
| `image`              | string  | Additional charge for multimodal inputs (vision-enabled chat models).                         |
| `web_search`         | string  | Price for provider-hosted search augmentation.                                                |
| `internal_reasoning` | string  | Provider-specific charge for reasoning tokens.                                                |
| `unit`               | integer | Number of tokens per quoted price (for example 1 token or 1,000,000 tokens). Defaults to `1`. |
| `currency`           | string  | Currency code the provider bills in, such as `USD` or `CNY`.                                  |

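The "price per `unit` tokens" rule can be sketched with `decimal.Decimal`, which keeps the string prices exact. This is an illustrative estimate only: it covers `prompt` and `completion`, ignores the other pricing fields, performs no currency conversion, and assumes an empty string means the price does not apply. The helper names are hypothetical.

```python
from decimal import Decimal
from typing import Optional


def per_token_price(pricing: dict, field: str) -> Optional[Decimal]:
    """Return the price of a single token for `field`, or None if no price is given."""
    raw = pricing.get(field)
    if raw is None or raw == "":
        # Assumption: an empty string means the price does not apply.
        return None
    unit = pricing.get("unit") or 1  # `unit` defaults to 1 when absent
    return Decimal(raw) / Decimal(unit)


def estimate_cost(pricing: dict, prompt_tokens: int, completion_tokens: int = 0):
    """Return (amount, currency) for a request, in the provider's own currency."""
    prompt = per_token_price(pricing, "prompt") or Decimal(0)
    completion = per_token_price(pricing, "completion") or Decimal(0)
    amount = prompt * prompt_tokens + completion * completion_tokens
    return amount, pricing.get("currency")
```

For instance, with `prompt` `"0.0000025"`, `completion` `"0.00001"`, and `unit` `1`, a request with 1,000 prompt tokens and 500 completion tokens comes to 0.0025 + 0.005 = 0.0075 in the listed currency.
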
### Aliases and routing

TokenFlux resolves convenient aliases like `gpt-4.1` or `claude-sonnet-4` to their canonical identifiers before contacting upstream providers. Always send the canonical `id` returned by this endpoint in new integrations. Responses from chat completions echo the provider’s server-side model ID (for example `gpt-4o`), which may omit the vendor prefix for some providers.

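Because the echoed model ID may lack the vendor prefix, a client that verifies which model answered can compare against both forms. The sketch below assumes the vendor prefix is everything before the first `/` in the canonical `id`; `same_model` is a hypothetical helper, and it does not attempt to handle any other rewriting a provider might apply.

```python
def same_model(canonical_id: str, echoed_id: str) -> bool:
    """Return True if echoed_id equals canonical_id with or without its vendor prefix."""
    if echoed_id == canonical_id:
        return True
    # Assumption: the vendor prefix is the text before the first "/".
    _, _, bare = canonical_id.partition("/")
    return bare != "" and echoed_id == bare


prefixed_match = same_model("openai/gpt-4o", "openai/gpt-4o")
bare_match = same_model("openai/gpt-4o", "gpt-4o")
```
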
### Example

```json theme={null}
{
  "object": "list",
  "data": [
    {
      "id": "openai/gpt-4o",
      "canonical_slug": "openai/gpt-4o",
      "hugging_face_id": "",
      "name": "OpenAI: GPT-4o",
      "type": "chat",
      "created": 1715558400,
      "description": "GPT-4o (\"o\" for \"omni\") is OpenAI's latest AI model, supporting both text and image inputs with text outputs...",
      "context_length": 128000,
      "architecture": {
        "modality": "text+image->text",
        "input_modalities": ["text", "image", "file"],
        "output_modalities": ["text"],
        "tokenizer": "GPT",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "0.0000025",
        "completion": "0.00001",
        "input_cache_read": "0.00000125",
        "input_cache_write": "",
        "request": "0",
        "image": "0.003613",
        "web_search": "0",
        "internal_reasoning": "0",
        "unit": 1,
        "currency": "USD",
        "volumes": []
      },
      "supported_parameters": [
        "frequency_penalty",
        "logit_bias",
        "logprobs",
        "max_tokens",
        "presence_penalty",
        "response_format",
        "seed",
        "stop",
        "structured_outputs",
        "temperature",
        "tool_choice",
        "tools",
        "top_logprobs",
        "top_p",
        "web_search_options"
      ],
      "model_provider": "openai"
    },
    {
      "id": "qwen/text-embedding-v4",
      "canonical_slug": "qwen/text-embedding-v4",
      "hugging_face_id": null,
      "name": "Qwen: Text Embedding v4",
      "type": "embedding",
      "created": 0,
      "description": "The Qwen3 Embedding model series is the latest proprietary model ...",
      "context_length": 8192,
      "architecture": {
        "modality": "text->text",
        "input_modalities": ["text"],
        "output_modalities": ["text"],
        "tokenizer": "Qwen",
        "instruct_type": null
      },
      "pricing": {
        "prompt": "0.5",
        "completion": "",
        "input_cache_read": "",
        "input_cache_write": "",
        "request": "0",
        "image": "0",
        "web_search": "0",
        "internal_reasoning": "0",
        "unit": 1000000,
        "currency": "CNY",
        "volumes": []
      },
      "supported_parameters": [],
      "model_provider": "qwen",
      "dimensions": [64, 128, 256, 512, 768, 1024, 1536, 2048]
    }
  ]
}
```

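For embedding models, the `dimensions` array can guide the choice of output size. The following sketch picks the largest allowed dimension that does not exceed a target and returns `None` when the model lists no usable value. `pick_dimension` is a hypothetical helper, and how the chosen value is passed in an embedding request is outside the scope of this page.

```python
def pick_dimension(model: dict, wanted: int):
    """Return the largest allowed dimension <= wanted, or None if there is none."""
    allowed = sorted(d for d in (model.get("dimensions") or []) if d <= wanted)
    return allowed[-1] if allowed else None


embedding_model = {
    "id": "qwen/text-embedding-v4",
    "dimensions": [64, 128, 256, 512, 768, 1024, 1536, 2048],
}
chosen = pick_dimension(embedding_model, 1000)
```
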
### Usage tips

* The array is sorted lexicographically by `id`, which gives client SDKs a deterministic order for client-side paging.
* Cache responses for up to 24 hours. TokenFlux refreshes the model list hourly using an in-memory cache.
* Use `supported_parameters` to tailor request bodies to each provider. Parameters not listed there are silently ignored by many vendors, so skipping unsupported options avoids confusing results.

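The last tip can be sketched as a small filter that drops request fields a model does not list in `supported_parameters`. Two assumptions are baked in: an empty list means "unknown", so nothing is removed, and core fields such as `model`, `messages`, and `input` are always kept because they do not appear in `supported_parameters`. `trim_request` is a hypothetical helper name.

```python
def trim_request(body: dict, supported_parameters: list,
                 always_keep=("model", "messages", "input")):
    """Return (trimmed_body, dropped_keys) based on a model's supported parameters."""
    if not supported_parameters:
        # Empty list: the provider does not publish its parameters, so keep everything.
        return dict(body), []
    allowed = set(supported_parameters) | set(always_keep)
    trimmed = {key: value for key, value in body.items() if key in allowed}
    dropped = sorted(key for key in body if key not in allowed)
    return trimmed, dropped


request = {"model": "openai/gpt-4o", "messages": [], "temperature": 0.2, "top_k": 40}
trimmed, dropped = trim_request(request, ["temperature", "top_p"])
```

Returning the dropped keys lets the caller log or surface them instead of discarding options silently.
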