Models
The GLM 5.2 model variants available on Drise, their context windows, reasoning modes, and quantisation.
Drise exposes five GLM 5.2 model variants, all FP8-quantised for cheaper, faster inference without meaningful quality loss.
Model variants
| ID | Context | Reasoning | Notes |
|---|---|---|---|
| drise-glm-5.2 | 1,000,000 tokens | yes | Full GLM 5.2 with reasoning. Best for complex coding and deep work. |
| drise-glm-5.2-fast | 1,000,000 tokens | no | Same model, reasoning disabled. Lowest latency, lower token consumption. |
| drise-glm-5.2-short | 200,000 tokens | yes | Reasoning retained, smaller context window. Optimised for short, focused tasks. |
| drise-glm-5.2-short-fast | 200,000 tokens | no | Reasoning disabled, smaller context. Fastest variant. |
| drise-vision | - | - | GLM 5.2 with vision capabilities. Send images plus text. |
Quantisation
All five variants are FP8-quantised. That keeps token costs and latency low without meaningful quality loss. You do not have to do anything to enable FP8 - every plan ships with it.
Choosing a model
- Default to `drise-glm-5.2` for general coding work that benefits from reasoning.
- Switch to `drise-glm-5.2-fast` for high-throughput bulk tasks where you do not need the chain-of-thought.
- Use `drise-glm-5.2-short` and `drise-glm-5.2-short-fast` for short, focused passes where a million-token context is not needed.
- Send images and text together with `drise-vision`.
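The selection rules above can be sketched as a small helper. This is only an illustration: `choose_model`, its parameters, and the `CONTEXT_TOKENS` mapping are hypothetical names, not part of any Drise SDK, and the context figures are copied from the variants table. The caller is assumed to supply its own token estimate.

```python
from typing import Optional

# Context windows as listed in the variants table.
# drise-vision has no published figure there, so it is recorded as None.
CONTEXT_TOKENS: dict[str, Optional[int]] = {
    "drise-glm-5.2": 1_000_000,
    "drise-glm-5.2-fast": 1_000_000,
    "drise-glm-5.2-short": 200_000,
    "drise-glm-5.2-short-fast": 200_000,
    "drise-vision": None,
}


def choose_model(
    needs_reasoning: bool = True,
    short_task: bool = False,
    has_images: bool = False,
    prompt_tokens: int = 0,
) -> str:
    """Pick a model ID following the guidance in this section (hypothetical helper).

    prompt_tokens is the caller's own estimate. The context window also has
    to hold the model's output, so treat the comparison as a rough check.
    """
    if has_images:
        return "drise-vision"

    # A "short" task that does not fit in 200,000 tokens falls back to the
    # long-context variant instead of failing.
    use_short = short_task and prompt_tokens <= CONTEXT_TOKENS["drise-glm-5.2-short"]
    model = "drise-glm-5.2-short" if use_short else "drise-glm-5.2"
    if not needs_reasoning:
        model += "-fast"

    limit = CONTEXT_TOKENS[model]
    if limit is not None and prompt_tokens > limit:
        raise ValueError(f"{prompt_tokens} tokens exceeds the {limit}-token window of {model}")
    return model
```

With no arguments the helper returns the default `drise-glm-5.2`; passing `needs_reasoning=False, short_task=True` yields `drise-glm-5.2-short-fast`.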
Using a model
Pass the model ID in the `model` field of any OpenAI-compatible request:
curl https://platform.drise.ai/v1/chat/completions \
  -H "Authorization: Bearer $DRISE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "drise-glm-5.2",
    "messages": [{"role": "user", "content": "write a Python script that lists files recursively"}]
  }'

See the API Reference for the full request shape, and Pricing for the plan that fits your throughput.
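For callers who prefer Python over curl, the same request might be built with the standard library alone. This is a minimal sketch: the endpoint and headers are taken from the curl example, while `build_request`, `image_message`, and `send` are hypothetical helper names. The image message uses the OpenAI-style `image_url` content part; whether `drise-vision` accepts exactly that shape is an assumption to check against the API Reference.

```python
import json
import urllib.request

# Endpoint taken from the curl example in this section.
API_URL = "https://platform.drise.ai/v1/chat/completions"


def build_request(model: str, messages: list, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for the given model ID."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def image_message(text: str, image_url: str) -> dict:
    """User message with text plus one image.

    Assumption: drise-vision accepts OpenAI-style content parts.
    """
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


# Usage (performs a network call, so it is left commented out):
#   import os
#   req = build_request(
#       "drise-glm-5.2",
#       [{"role": "user", "content": "write a Python script that lists files recursively"}],
#       os.environ["DRISE_KEY"],
#   )
#   reply = send(req)
```

Keeping request construction separate from sending makes it easy to swap the model ID per call, or to inspect the payload before it goes over the wire.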