
Models

The GLM 5.2 model variants available on Drise, their context windows, reasoning modes, and quantisation.

Drise exposes five GLM 5.2 model variants, all FP8-quantised for cheaper, faster inference without meaningful loss in output quality.

Model variants

ID                        Context            Reasoning   Notes
drise-glm-5.2             1,000,000 tokens   yes         Full GLM 5.2 with reasoning. Best for complex coding and deep work.
drise-glm-5.2-fast        1,000,000 tokens   no          Same model, reasoning disabled. Lower latency and lower token consumption.
drise-glm-5.2-short       200,000 tokens     yes         Reasoning retained, smaller context window. Optimised for short, focused tasks.
drise-glm-5.2-short-fast  200,000 tokens     no          Reasoning disabled, smaller context window. Fastest variant.
drise-vision              -                  -           GLM 5.2 with vision capabilities. Accepts images alongside text.

Quantisation

All five variants are FP8-quantised. This keeps token costs and latency low without meaningful quality loss. FP8 is enabled on every plan, so there is nothing to configure.

Choosing a model

  • Default to drise-glm-5.2 for general coding work that benefits from reasoning.
  • Switch to drise-glm-5.2-fast for high-throughput bulk tasks where you do not need the chain-of-thought.
  • Use drise-glm-5.2-short (with reasoning) or drise-glm-5.2-short-fast (without) for short, focused passes that fit within a 200,000-token context.
  • Send images and text together with drise-vision.
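The guidance above can be sketched as a small shell helper. This is a minimal heuristic under stated assumptions, not Drise tooling: pick_drise_model and its task kinds (general, bulk, short, short-bulk, vision) are hypothetical names, and the token argument is your own rough estimate of prompt plus expected output. It encodes only the context limits from the table, falling back from a short variant to the 1,000,000-token variant with the same reasoning mode when a request will not fit.

```shell
# Hypothetical helper (not part of Drise): pick a model ID from this page's guidance.
#   $1  task kind: general | bulk | short | short-bulk | vision
#   $2  rough token estimate for prompt plus expected output (assumption: your own count)
pick_drise_model() {
  kind=$1
  tokens=$2

  # No context window is listed for drise-vision, so it is not size-checked here.
  if [ "$kind" = vision ]; then
    echo "drise-vision"
    return 0
  fi

  # The largest listed context window is 1,000,000 tokens.
  if [ "$tokens" -gt 1000000 ]; then
    echo "error: $tokens tokens exceeds the 1,000,000-token context window" >&2
    return 1
  fi

  # Short variants fit 200,000 tokens; above that, keep the same reasoning
  # mode but move to the 1,000,000-token variant.
  if [ "$tokens" -gt 200000 ]; then
    case "$kind" in
      short)      kind=general ;;
      short-bulk) kind=bulk ;;
    esac
  fi

  case "$kind" in
    bulk)       echo "drise-glm-5.2-fast" ;;
    short)      echo "drise-glm-5.2-short" ;;
    short-bulk) echo "drise-glm-5.2-short-fast" ;;
    *)          echo "drise-glm-5.2" ;;  # default: full model with reasoning
  esac
}
```

For example, pick_drise_model short-bulk 350000 prints drise-glm-5.2-fast, because that request no longer fits the 200,000-token window of drise-glm-5.2-short-fast.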

Using a model

Pass the model ID in the model field of any OpenAI-compatible request:

curl https://platform.drise.ai/v1/chat/completions \
  -H "Authorization: Bearer $DRISE_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "drise-glm-5.2",
    "messages": [{"role": "user", "content": "write a Python script that lists files recursively"}]
  }'
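A drise-vision request is not shown on this page. The sketch below assumes, without confirmation from this page, that drise-vision accepts the standard OpenAI chat content-part format: a content array mixing a "text" part and an "image_url" part. The image URL and the send_vision_request function name are hypothetical placeholders; check the API Reference for the exact shape before relying on it.

```shell
# ASSUMPTION: drise-vision accepts OpenAI-style content parts (text + image_url).
# The image URL is a placeholder; replace it with a publicly reachable image.
VISION_PAYLOAD='{
  "model": "drise-vision",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "describe the bug shown in this screenshot"},
      {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}}
    ]
  }]
}'

# Hypothetical helper: POST the payload to the same endpoint as the example above.
send_vision_request() {
  curl https://platform.drise.ai/v1/chat/completions \
    -H "Authorization: Bearer $DRISE_KEY" \
    -H "Content-Type: application/json" \
    -d "$VISION_PAYLOAD"
}
```

With DRISE_KEY exported, call send_vision_request to send the request. Defining the body separately from the curl call makes it easy to inspect before anything is sent.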

See the API Reference for the full request shape, and Pricing for the plan that fits your throughput.
