Gemma 4 (31B dense + 26B A4B MoE + 12B multimodal)

Google's April 2026 refresh — Arena top 5 in its first week, 256K context native, vision + audio multimodal, and the move to Apache 2.0 from the custom Gemma Terms. On June 3, 2026 Google added Gemma 4 12B: a ~12B dense, encoder-free unified multimodal variant (text + image + audio + video in) that runs locally on 16 GB of VRAM or unified memory while nearing the 26B MoE on benchmarks. The 12B is the laptop-tier multimodal pick; the 31B dense is the current Apache-2.0 "best dense under 70B".

License: Apache 2.0 (moved off Gemma Terms) · Context: 256K · Released: April 2, 2026 (31B/26B); June 3, 2026 (12B)

The decision in five lines

The call: Buy — for chat
Best for: chat · docs
Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out: Tight VRAM budgets under 8 GB — the 12B variant still wants ~8 GB at Q4 before context.
Evidence: Estimated · last verified June 2026

Parameters: 31B dense
Type: Dense + MoE
Context: 256K
VRAM at Q4: ~18 GB (31B dense) / ~15 GB (26B MoE) / ~8 GB (12B dense)
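
The Q4 VRAM figures can be sanity-checked with back-of-envelope arithmetic: parameter count times bits per weight, divided by eight. The sketch below assumes an average of 4.5 bits per weight for a 4-bit quant (weights plus scales). That value is an assumption, not a number from this page, and the function name `q4_weight_gb` is hypothetical. It estimates weights only, so it should land a little under the quoted figures, which presumably include some runtime overhead.

```python
def q4_weight_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes) for a quantized model.

    bits_per_weight=4.5 is an assumed average for a 4-bit quant including
    scales and metadata; it is not a figure taken from this page.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


# Parameter counts as listed on this page. For the MoE, all 26B weights must
# be resident in memory even though only ~3.8B are active per token.
for name, params in [("31B dense", 31), ("26B MoE", 26), ("12B dense", 12)]:
    print(f"{name}: ~{q4_weight_gb(params):.1f} GB (weights only, before context)")
```

KV cache for long contexts comes on top of this, which is why the page quotes its numbers "before context".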

Where we recommend this

Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.

CHAT · TOP

Gemma 4 31B Dense: Google's April 2, 2026 release; Arena top 5, 256K context, native vision + audio; Apache 2.0.

CHAT · HIGH

Gemma 4 26B MoE (3.8B active): Open Arena top 10 at 3.8B active compute; calm and fast.

DOCS · TOP

Gemma 4 31B (256K context): 256K context with vision + audio; calmer long-context behaviour than the 35B-A3B MoE on dense retrieval prompts.

DOCS · HIGH

Gemma 4 31B (256K context): 31B dense with 256K context; Apache 2.0 license (moved off the Gemma Terms); Arena top 5.

The call

When not to use: Tight VRAM budgets under 8 GB — the 12B variant still wants ~8 GB at Q4 before context. For 6 GB and under, Qwen 3.5 4B fits better.
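
To make the VRAM guidance concrete, here is a minimal sketch of a variant picker. It uses the Q4 footprints quoted on this page and the Ollama tags from the runner notes. The helper `pick_variant` and the 2 GB `context_headroom_gb` default are hypothetical: the headroom is an assumed allowance for KV cache and runtime overhead, not a measured value, and "largest that fits" is only a rough proxy for "best".

```python
from typing import Optional

# Q4 footprints as quoted on this page (GB, before context).
Q4_VRAM_GB = {
    "gemma4:31b": 18.0,
    "gemma4:26b": 15.0,
    "gemma4:12b": 8.0,
}


def pick_variant(vram_gb: float, context_headroom_gb: float = 2.0) -> Optional[str]:
    """Return the largest variant whose Q4 weights plus headroom fit, else None.

    context_headroom_gb is an assumed allowance for KV cache and runtime
    overhead; raise it if you plan to use long contexts.
    """
    fitting = [
        (need_gb, tag)
        for tag, need_gb in Q4_VRAM_GB.items()
        if need_gb + context_headroom_gb <= vram_gb
    ]
    return max(fitting)[1] if fitting else None


for budget in (6, 12, 16, 24):
    choice = pick_variant(budget)
    print(f"{budget} GB -> {choice or 'no Gemma 4 variant fits; consider a smaller model'}")
```

With these assumptions a 16 GB machine lands on the 12B, and anything at 8 GB or below returns `None`, which matches the "when not to use" advice above.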

Runner notes

The Ollama tags are `gemma4:31b`, `gemma4:26b`, and `gemma4:12b`. Ollama may lag on the audio modality path; for full multimodal support, use a current llama.cpp build from HEAD. The 12B is encoder-free (vision and audio inputs flow straight into the backbone) and ships MTP (multi-token prediction) drafters for lower decode latency. On the 26B MoE, routing overhead can hurt vLLM concurrency relative to dense equivalents under heavy batching.
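
As a rough illustration of calling one of these tags from code, the sketch below posts a text prompt to Ollama's local REST endpoint (`/api/generate` on the default port 11434) using only the standard library. It assumes the Ollama server is running locally and that the tag has already been pulled. The helper names `build_request` and `generate` are hypothetical, and the example covers text only, not the image or audio paths.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(tag: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the given model tag."""
    body = json.dumps({"model": tag, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(tag: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(tag, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example call, assuming `gemma4:12b` is available locally:
#   print(generate("gemma4:12b", "Summarize the Apache 2.0 license in two sentences."))
```

Swapping the tag for `gemma4:31b` or `gemma4:26b` is the only change needed to target the larger variants.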

License: Apache 2.0 (moved off Gemma Terms)
Released: April 2, 2026 (31B/26B); June 3, 2026 (12B)
Maker: Google
Model card: huggingface.co/google/gemma-4-31B-it

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.
