MODEL · GOOGLE · 31B DENSE / 26B TOTAL + 3.8B ACTIVE (MOE) / 12B DENSE
Gemma 4 (31B dense + 26B A4B MoE + 12B multimodal)
Google's April 2026 refresh — Arena top 5 in its first week, 256K context native, vision + audio multimodal, and the move to Apache 2.0 from the custom Gemma Terms. On June 3, 2026 Google added Gemma 4 12B: a ~12B dense, encoder-free unified multimodal variant (text + image + audio + video in) that runs locally on 16 GB of VRAM or unified memory while nearing the 26B MoE on benchmarks. The 12B is the laptop-tier multimodal pick; the 31B dense is the current Apache-2.0 "best dense under 70B".
License: Apache 2.0 (moved off Gemma Terms) · Context: 256K · Released: April 2, 2026 (31B/26B); June 3, 2026 (12B)
The decision in five lines
- The call: Buy, for chat
- Best for: chat · docs
- Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
- Watch out: Tight VRAM budgets under 8 GB. The 12B variant still wants ~8 GB at Q4 before context.
- Evidence: Estimated

- Parameters: 31B dense
- Type: Dense + MoE
- Context: 256K
- VRAM at Q4: ~18 GB (31B dense) / ~15 GB (26B MoE) / ~8 GB (12B dense)
Where we recommend this
Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.
The call
When not to use: Tight VRAM budgets under 8 GB — the 12B variant still wants ~8 GB at Q4 before context. For 6 GB and under, Qwen 3.5 4B fits better.
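To see where a figure like "~8 GB at Q4" comes from, here is a rough, weights-only estimate. It is a sketch under one loud assumption: that a 4-bit quant averages about 4.5 bits per weight once scales and metadata are counted. The `weights_gb` helper and that 4.5 figure are illustrative, not how this page's numbers were produced.

```python
# Rough weights-only memory estimate for a quantized model.
# ASSUMPTION: ~4.5 bits per weight stands in for a typical 4-bit quant.
# Real Q4 variants differ, and KV cache / runtime overhead is NOT included.

def weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Return approximate weight storage in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# Parameter counts as listed on this page. The MoE is counted at its
# total size (26B), since all experts must sit in memory even though
# only ~3.8B parameters are active per token.
variants = {"31B dense": 31.0, "26B MoE": 26.0, "12B dense": 12.0}

for name, size in variants.items():
    print(f"{name}: ~{weights_gb(size):.1f} GB of weights")
```

Under this assumption the estimates land a little below the ~18 / ~15 / ~8 GB quoted on this page, which is the expected direction if the quoted figures also carry some runtime overhead. Context (KV cache) comes on top of either number.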
Runner notes
Ollama tags: `gemma4:31b`, `gemma4:26b`, and `gemma4:12b`. Ollama may lag on the audio modality path, so build llama.cpp from its latest source (HEAD) if you need full multimodal support. The 12B is encoder-free (vision and audio inputs flow straight into the backbone) and ships MTP (multi-token prediction) drafters for lower decode latency. On the 26B MoE, routing overhead can hurt vLLM concurrency relative to dense equivalents under heavy batching.
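As a minimal sketch of using those tags, the snippet below builds a non-streaming request for Ollama's local `/api/generate` endpoint. It assumes Ollama is running on its default address (`localhost:11434`) and that the tag has already been pulled. `build_request` and `generate` are hypothetical helper names for illustration.

```python
import json
import urllib.request

# Tags as listed in the runner notes above.
GEMMA4_TAGS = ("gemma4:31b", "gemma4:26b", "gemma4:12b")

# ASSUMPTION: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(tag: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming text generation request for one model tag."""
    if tag not in GEMMA4_TAGS:
        raise ValueError(f"unknown tag: {tag}")
    payload = {"model": tag, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(tag: str, prompt: str) -> str:
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_request(tag, prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("gemma4:12b", "Summarize this paragraph: ...")` would return the model's reply as a string. This covers text only; it does not exercise the audio path mentioned above.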
Hardware that fits
Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.
- Intel Arc B580 12 GB · Good · 1.2× 12 GB · $249–$299
- NVIDIA RTX 3060 12 GB · Good · 1.2× 12 GB · $280–$400
- Minisforum UM890 Pro · Perfect · 2.5× 32 GB DDR5 (shared) · $463–$580 all-in
- RTX 5060 Ti 16 GB · Perfect · 1.6× 16 GB · $560–$610
- AMD Radeon RX 9070 XT · Perfect · 1.6× 16 GB · $649–$779
- Mac Mini M4 16 GB · Tight · 1.1× 16 GB unified · $799 (new floor) / $499–$599 (eBay/residuals)
- AMD Radeon RX 7900 XTX · Perfect · 2.5× 24 GB · $810 used / ~$1,340 new
- NVIDIA RTX 3090 (used, single) · Perfect · 2.5× 24 GB · $950–$1,200
- NVIDIA RTX 5070 Ti · Perfect · 1.6× 16 GB · $980–$1,300
- NVIDIA RTX 5080 · Perfect · 1.6× 16 GB · $999–$1,400
- MacBook Air M5 24 GB · Perfect · 1.6× 24 GB unified · $1,299–$1,699
- Mac Mini M4 Pro 24 GB · Perfect · 1.6× 24 GB unified · $1,399
- Dual RTX 3090 (used) · Perfect · 4.9× 48 GB · $1,800–$2,500 all-in
- Framework Desktop (Ryzen AI Max+ 395) · Perfect · 8.8× 128 GB unified · $1,999–$2,851
- NVIDIA RTX 4090 · Perfect · 2.5× 24 GB · $2,200–$2,800
- M5 Pro MacBook Pro 48 GB · Perfect · 3.3× 48 GB unified · $2,599–$3,099
- Mac Studio M4 Max 64 GB · Perfect · 4.4× 64 GB unified · $3,199
- NVIDIA RTX 5090 · Perfect · 3.3× 32 GB · $3,500–$4,300
- NVIDIA RTX A6000 (48 GB, used) · Perfect · 4.9× 48 GB ECC · $3,500–$4,500
- Mac Studio M3 Ultra 96 GB · Perfect · 6.6× 96 GB unified · $3,999
- M5 Max MacBook Pro 64 GB · Perfect · 4.4× 64 GB unified · $4,499
- NVIDIA DGX Spark · Perfect · 8.8× 128 GB unified · $4,699
- Dual RTX 5090 · Perfect · 6.6× 64 GB (2×32) · $8,500–$10,500
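The "fits, sorted cheapest-first" idea can be sketched in a few lines. The snippet below takes a handful of rows from the list above and filters them against the Q4 footprints quoted on this page. It is an illustration only: the `Pick` type and `picks_that_fit` helper are hypothetical, the comparison ignores context memory, and it does not reproduce the Good / Tight / Perfect ratings or the multipliers, whose formula is not stated here.

```python
from typing import NamedTuple


class Pick(NamedTuple):
    name: str
    memory_gb: int
    low_price_usd: int  # low end of the listed price range


# A few rows copied from the list above, deliberately out of order.
PICKS = [
    Pick("NVIDIA RTX 5090", 32, 3500),
    Pick("Intel Arc B580 12 GB", 12, 249),
    Pick("NVIDIA RTX 3090 (used, single)", 24, 950),
    Pick("RTX 5060 Ti 16 GB", 16, 560),
    Pick("Minisforum UM890 Pro", 32, 463),
]

# Q4 footprints quoted on this page, in GB, before context.
Q4_GB = {"31B dense": 18, "26B MoE": 15, "12B dense": 8}


def picks_that_fit(variant: str, picks=PICKS) -> list:
    """Return picks with enough memory for the variant, cheapest first."""
    need = Q4_GB[variant]
    fits = [p for p in picks if p.memory_gb >= need]
    return sorted(fits, key=lambda p: p.low_price_usd)


for variant in Q4_GB:
    names = [p.name for p in picks_that_fit(variant)]
    print(f"{variant}: {names}")
```

One caveat on the sketch: it treats shared or unified memory as fully available to the model, which is optimistic, since the OS and other apps take their share. Leave headroom for context on top of the raw Q4 figure.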