Qwen 3.5 122B-A10B

The 128 GB-tier daily driver — 122B parameters with only 10B active per token, native multimodal (vision + text), and a community-measured 60.6 tok/s on M5 Max 128 GB at 4-bit MLX. The realistic flagship for buyers of M5 Max 128 GB MBPs, DGX Spark, and Framework Desktop.

License: Apache 2.0 · Context: 262K native, extendable to ~1M via YaRN · Released: February 16, 2026

The decision in five lines

The call: Skip for local — for coding
Best for: coding · chat · docs · agents
Runs on: 4 hardware picks fit (cheapest: Framework Desktop (Ryzen AI Max+ 395) · $1,999)
Watch out: Anything under 96 GB unified or 80 GB discrete — at 4-bit MLX it needs ~70 GB on disk plus headroom.
Evidence: Estimated · last verified April 2026

122B total: PARAMETERS
MOE: TYPE
262K: CONTEXT
~70 GB (4-bit MLX) / ~62 GB (Q4_K_M GGUF): VRAM AT Q4

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

CODING ·

Qwen 3.5 122B-A10B (4-bit MLX, multimodal)60.6 tok/s calibrated on M5 Max 128 GB; native multimodal; Apache 2.0. Mac 96 GB needs sysctl wired-memory tweak; M5 Max 128 GB and DGX Spark run it without.

CHAT ·

Qwen 3.5 122B-A10B (4-bit MLX, multimodal)Native multimodal — handles text + image + video input. 10B active, 60.6 tok/s calibrated on M5 Max 128 GB. Apache 2.0.

DOCS ·

Qwen 3.5 122B-A10B (262K native, extensible to 1M via YaRN)Biggest reliable-context model for long-doc synthesis at this tier. 10B active keeps 262K responsive. Multimodal for diagrams + tables. Apache 2.0.

AGENTS ·

Qwen 3.5 122B-A10B (multimodal agent, 262K)Multimodal + 10B active for tool-use latency. 262K context for long agentic loops. Apache 2.0.

The call

The 128 GB-tier daily driver — 122B parameters with only 10B active per token, native multimodal (vision + text), and a community-measured 60.6 tok/s on M5 Max 128 GB at 4-bit MLX. The realistic flagship for buyers of M5 Max 128 GB MBPs, DGX Spark, and Framework Desktop.
When not to use: Anything under 96 GB unified or 80 GB discrete — at 4-bit MLX it needs ~70 GB on disk plus headroom. For 24-32 GB rigs use the 35B-A3B sibling, which has the same architecture style at 5× smaller footprint.

Runner notes

`mlx-community/Qwen3.5-122B-A10B-4bit` (69.6 GB) is the canonical Mac path. GGUF builds via `unsloth/Qwen3.5-122B-A10B-Instruct-GGUF` for llama.cpp. Vision-enabled variants (`spicyneuron/Qwen3.5-122B-A10B-MLX-vision-4.7-bit`) require an MLX vision server. Native Ollama tag may take 1-2 weeks to land post-release.

License: Apache 2.0
Released: February 16, 2026
Maker: Alibaba
Model card: huggingface.co/Qwen/Qwen3.5-122B-A10B →

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.

Next step

Find-by-model — see what hardware runs this→