the AI bench
VERIFIED JUNE 2026

MODEL · GOOGLE · 31B DENSE / 26B TOTAL, 3.8B ACTIVE (MOE) / 12B DENSE

Gemma 4 (31B dense + 26B A4B MoE + 12B multimodal)

Google's April 2026 refresh reached the Arena top 5 in its first week. It brings native 256K context, vision and audio input, and a move from the custom Gemma Terms to Apache 2.0. On June 3, 2026 Google added Gemma 4 12B: a ~12B dense, encoder-free unified multimodal variant (text, image, audio, and video input) that runs locally on 16 GB of VRAM or unified memory while nearing the 26B MoE on benchmarks. The 12B is the laptop-tier multimodal pick; the 31B dense is the current Apache-2.0 "best dense under 70B".

License: Apache 2.0 (moved off Gemma Terms) · Context: 256K · Released: April 2, 2026 (31B/26B); June 3, 2026 (12B)

The decision in five lines

The call: Buy, for chat
Best for: chat · docs
Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out: Tight VRAM budgets under 8 GB; the 12B variant still wants ~8 GB at Q4 before context.
Evidence: Estimated · last verified June 2026

Parameters: 31B dense
Type: Dense + MoE
Context: 256K
VRAM at Q4: ~18 GB (31B dense) / ~15 GB (26B MoE) / ~8 GB (12B dense)
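
The Q4 figures above are close to what back-of-envelope arithmetic on weight size predicts. The sketch below is a rough estimate only: the 4.5 bits-per-weight average is an illustrative assumption for a 4-bit quantization, not a figure from this page, and `weight_gb` is a hypothetical helper. KV cache and runtime overhead are deliberately left out.

```python
# Rough weight-only memory estimate for a quantized model.
# ASSUMPTION: ~4.5 bits per weight on average for a 4-bit quant (illustrative).
# KV cache, activations, and runtime overhead are NOT included.


def weight_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
    """Return the approximate weight size in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Total parameter counts as listed on this page. A MoE model has to hold all
# 26B parameters in memory even though only 3.8B are active per token.
VARIANTS = {"31B dense": 31.0, "26B MoE": 26.0, "12B dense": 12.0}

if __name__ == "__main__":
    for name, params in VARIANTS.items():
        print(f"{name}: ~{weight_gb(params):.1f} GB of weights")
```

The estimates come out somewhat below the page's ~18 / ~15 / ~8 GB figures. That gap is plausible, since real quantized files usually keep some tensors at higher precision and the runtime adds its own overhead, but treat the exact numbers as approximate.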

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

CHAT · TOP
Gemma 4 31B Dense: Google's April 2, 2026 release; Arena top 5, 256K context, native vision and audio; Apache 2.0.
CHAT · HIGH
Gemma 4 26B MoE (3.8B active): Open Arena top 10 at 3.8B active compute; calm and fast.
DOCS · TOP
Gemma 4 31B (256K context): 256K context with vision and audio; calmer long-context behaviour than the 35B-A3B MoE on dense retrieval prompts.
DOCS · HIGH
Gemma 4 31B (256K context): 31B dense with 256K context; Apache 2.0 (moved off Gemma Terms); Arena top 5.

The call

When not to use: VRAM budgets under 8 GB. The 12B variant still needs about 8 GB at Q4 before any context is allocated. At 6 GB and under, Qwen 3.5 4B fits better.
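
The fit rule above can be sketched as a simple check. The Q4 figures are the ones quoted on this page; the 2 GB default headroom for context and runtime is a placeholder assumption, and `variants_that_fit` is a hypothetical helper, not part of the planner's code.

```python
# Which Gemma 4 variants fit a given VRAM budget at Q4?
# Q4 figures are the approximate values quoted on this page (weights only,
# before context). The headroom default is an ASSUMPTION for illustration.

Q4_VRAM_GB = {"31B dense": 18.0, "26B MoE": 15.0, "12B dense": 8.0}


def variants_that_fit(vram_gb: float, headroom_gb: float = 2.0) -> list[str]:
    """Return variants whose Q4 weights fit after reserving headroom."""
    budget = vram_gb - headroom_gb
    return [name for name, need in Q4_VRAM_GB.items() if need <= budget]


if __name__ == "__main__":
    for card_gb in (8, 12, 16, 24):
        print(f"{card_gb} GB: {variants_that_fit(card_gb) or 'nothing fits'}")
```

With the default headroom, an 8 GB card fits none of the variants, which matches the warning above. Long contexts need more headroom than 2 GB, so raise `headroom_gb` when planning for anything near the 256K limit.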

Runner notes

Ollama tags: `gemma4:31b`, `gemma4:26b`, and `gemma4:12b`. Ollama may lag on the audio modality path; for full multimodal support, build llama.cpp from the latest source (HEAD). The 12B is encoder-free (vision and audio inputs flow straight into the backbone) and ships MTP (multi-token prediction) drafters for lower decode latency. On the 26B MoE, expert-routing overhead can hurt vLLM concurrency relative to dense equivalents under heavy batching.
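
As a minimal sketch of using these tags, the snippet below builds a request for Ollama's local HTTP API (`POST /api/generate` on the default port 11434). The tag names are taken from this page; whether they are published under exactly these names is an assumption, and `build_generate_request` and `generate` are hypothetical helpers. Text-only prompts are shown, since the audio path may not be available in Ollama.

```python
import json
import urllib.request

# Default local Ollama endpoint; change this if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Tags as listed on this page (ASSUMPTION: these match the published names).
GEMMA4_TAGS = ("gemma4:31b", "gemma4:26b", "gemma4:12b")


def build_generate_request(tag: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for one of the Gemma 4 tags."""
    if tag not in GEMMA4_TAGS:
        raise ValueError(f"unknown tag: {tag}")
    payload = {"model": tag, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(tag: str, prompt: str) -> str:
    """Send the request to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_generate_request(tag, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires a running Ollama server with the model already pulled):
#   print(generate("gemma4:12b", "Summarize this page in one sentence."))
```

Building the request separately from sending it keeps the payload easy to inspect without a server running.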

License: Apache 2.0 (moved off Gemma Terms)
Released: April 2, 2026 (31B/26B); June 3, 2026 (12B)
Maker: Google

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.
