the AI bench
VERIFIED JUNE 2026
All models

MODEL · ALIBABA · 122B TOTAL / 10B ACTIVE

Qwen 3.5 122B-A10B

The 128 GB-tier daily driver — 122B parameters with only 10B active per token, native multimodal (vision + text), and a community-measured 60.6 tok/s on M5 Max 128 GB at 4-bit MLX. The realistic flagship for buyers of M5 Max 128 GB MBPs, DGX Spark, and Framework Desktop.

License: Apache 2.0 · Context: 262K native, extendable to ~1M via YaRN · Released: February 16, 2026

The decision in five lines

The call
Skip for local — for coding
Best for
coding · chat · docs · agents
Runs on
4 hardware picks fit (cheapest: Framework Desktop (Ryzen AI Max+ 395) · $1,999)
Watch out
Anything under 96 GB unified or 80 GB discrete — at 4-bit MLX it needs ~70 GB on disk plus headroom.
Evidence
Estimated · last verified April 2026

122B total
PARAMETERS
MOE
TYPE
262K
CONTEXT
~70 GB (4-bit MLX) / ~62 GB (Q4_K_M GGUF)
VRAM AT Q4

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

CODING ·
Qwen 3.5 122B-A10B (4-bit MLX, multimodal)60.6 tok/s calibrated on M5 Max 128 GB; native multimodal; Apache 2.0. Mac 96 GB needs sysctl wired-memory tweak; M5 Max 128 GB and DGX Spark run it without.
CHAT ·
Qwen 3.5 122B-A10B (4-bit MLX, multimodal)Native multimodal — handles text + image + video input. 10B active, 60.6 tok/s calibrated on M5 Max 128 GB. Apache 2.0.
DOCS ·
Qwen 3.5 122B-A10B (262K native, extensible to 1M via YaRN)Biggest reliable-context model for long-doc synthesis at this tier. 10B active keeps 262K responsive. Multimodal for diagrams + tables. Apache 2.0.
AGENTS ·
Qwen 3.5 122B-A10B (multimodal agent, 262K)Multimodal + 10B active for tool-use latency. 262K context for long agentic loops. Apache 2.0.

The call

The 128 GB-tier daily driver — 122B parameters with only 10B active per token, native multimodal (vision + text), and a community-measured 60.6 tok/s on M5 Max 128 GB at 4-bit MLX. The realistic flagship for buyers of M5 Max 128 GB MBPs, DGX Spark, and Framework Desktop.

When not to use: Anything under 96 GB unified or 80 GB discrete — at 4-bit MLX it needs ~70 GB on disk plus headroom. For 24-32 GB rigs use the 35B-A3B sibling, which has the same architecture style at 5× smaller footprint.

Runner notes

`mlx-community/Qwen3.5-122B-A10B-4bit` (69.6 GB) is the canonical Mac path. GGUF builds via `unsloth/Qwen3.5-122B-A10B-Instruct-GGUF` for llama.cpp. Vision-enabled variants (`spicyneuron/Qwen3.5-122B-A10B-MLX-vision-4.7-bit`) require an MLX vision server. Native Ollama tag may take 1-2 weeks to land post-release.

License
Apache 2.0
Released
February 16, 2026
Maker
Alibaba

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.

Next step

Find-by-model — see what hardware runs this