Models

The GLM 5.2 model variants available on Drise, their context windows, reasoning modes, and quantisation.

Drise exposes five GLM 5.2 model variants. All of them are FP8-quantised for cheaper, faster inference with no meaningful loss in output quality.

Model variants

Model ID	Context window	Reasoning	Notes
`drise-glm-5.2`	1,000,000 tokens	yes	Full GLM 5.2 with reasoning. Best for complex coding and deep work.
`drise-glm-5.2-fast`	1,000,000 tokens	no	Same model, reasoning disabled. Lowest latency, lower token consumption.
`drise-glm-5.2-short`	200,000 tokens	yes	Reasoning retained, smaller context window. Optimised for short, focused tasks.
`drise-glm-5.2-short-fast`	200,000 tokens	no	Reasoning disabled, smaller context. Fastest variant.
`drise-vision`	-	-	GLM 5.2 with vision capabilities. Send images plus text.

Quantisation

All five variants are FP8-quantised, which keeps token costs and latency low without meaningful quality loss. FP8 is always on: there is nothing to enable, and every plan includes it.

Choosing a model

Default to drise-glm-5.2 for general coding work that benefits from reasoning.
Switch to drise-glm-5.2-fast for high-throughput bulk tasks where you do not need the chain-of-thought.
Use drise-glm-5.2-short (with reasoning) or drise-glm-5.2-short-fast (without) for short, focused passes that fit in 200,000 tokens and do not need a million-token context.
Use drise-vision when you need to send images alongside text.
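The guidance above can be condensed into a small selection helper. This is a hypothetical shell function for illustration, not part of any Drise tooling. It assumes you know up front whether the task includes images, whether it needs reasoning, and roughly how many tokens of context it will use. The 200,000 and 1,000,000 token limits come from the model variants table.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Drise tool): pick a model ID from three task traits.
#   $1  needs_images     yes|no
#   $2  needs_reasoning  yes|no
#   $3  context_tokens   estimated prompt + history size, as an integer
pick_drise_model() {
  local needs_images=$1 needs_reasoning=$2 context_tokens=$3

  # Image input goes to the vision variant.
  if [ "$needs_images" = "yes" ]; then
    echo "drise-vision"
    return 0
  fi

  # Nothing in the GLM 5.2 text lineup goes past 1,000,000 tokens.
  if [ "$context_tokens" -gt 1000000 ]; then
    echo "context of $context_tokens tokens exceeds every variant" >&2
    return 1
  fi

  # The -short variants cap out at 200,000 tokens.
  local base="drise-glm-5.2"
  if [ "$context_tokens" -le 200000 ]; then
    base="drise-glm-5.2-short"
  fi

  # The -fast suffix selects the variant with reasoning disabled.
  if [ "$needs_reasoning" = "yes" ]; then
    echo "$base"
  else
    echo "${base}-fast"
  fi
}
```

For example, `MODEL=$(pick_drise_model no yes 50000)` yields `drise-glm-5.2-short`. This sketch prefers the short variants whenever the context fits; if you would rather follow the "default to drise-glm-5.2" advice for all general coding work, drop the 200,000-token branch.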

Using a model

Pass the model ID in the `model` field of any OpenAI-compatible request. For example, with curl:

curl https://platform.drise.ai/v1/chat/completions \
  -H "Authorization: Bearer $DRISE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "drise-glm-5.2",
    "messages": [{"role": "user", "content": "write a Python script that lists files recursively"}]
  }'
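This page does not show the request shape for drise-vision. The sketch below assumes it accepts the standard OpenAI content-part format, where a user message's `content` is an array of `text` and `image_url` parts; confirm this against the API Reference before relying on it. The function names `build_vision_payload` and `send_vision_request` are hypothetical.

```shell
#!/usr/bin/env bash
# ASSUMPTION: drise-vision accepts OpenAI-style content parts
# ({"type":"text"} and {"type":"image_url"}). Verify in the API Reference.
# No JSON escaping is done: the prompt and URL must not contain
# double quotes or backslashes.
build_vision_payload() {
  local prompt=$1 image_url=$2
  printf '{"model":"drise-vision","messages":[{"role":"user","content":[{"type":"text","text":"%s"},{"type":"image_url","image_url":{"url":"%s"}}]}]}\n' \
    "$prompt" "$image_url"
}

# Send the request to the same endpoint as the text example; needs DRISE_KEY set.
send_vision_request() {
  curl -sS https://platform.drise.ai/v1/chat/completions \
    -H "Authorization: Bearer $DRISE_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_vision_payload "$1" "$2")"
}
```

Usage, with a placeholder image URL: `send_vision_request "What is in this screenshot?" "https://example.com/screenshot.png"`. For prompts that may contain quotes, build the body with a JSON-aware tool instead of `printf`.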

See the API Reference for the full request shape, and Pricing for the plan that fits your throughput.
