IBM Granite 4.1

IBM's refreshed open-weights enterprise family: three dense decoder-only sizes under Apache 2.0, trained on ~15T tokens with progressive annealing toward technical, scientific, and mathematical data plus instruction-following. The 8B instruct is claimed to match the prior Granite 4.0 32B-A9B MoE flagship on IBM's own benchmarks; cross-vendor comparisons (vs Qwen/Gemma/Mistral) were unverified at time of publication.

License: Apache 2.0 · Context: 128K default; 512K via late-training context-extension stage · Released: April 29, 2026

The decision in five lines

The call: Consider — runnable locally, family reference
Best for: Local evaluation and family reference
Runs on: 23 hardware picks fit (cheapest: Intel Arc B580 12 GB · $249)
Watch out: Not a fit for frontier reasoning or coding-agent workflows, where Qwen3-Coder-30B-A3B or GLM-5.1 are the established daily drivers.
Evidence: Estimated · last verified June 2026

PARAMETERS: 3B
TYPE: DENSE
CONTEXT: 128K
VRAM AT Q4: ~2.1 GB (3B) / ~5.3 GB (8B) / ~17 GB (30B), Ollama Q4_K_M
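
As a rough sanity check on the Q4 figures above, dividing each footprint by the nominal parameter count gives an effective bits-per-weight number. This is a minimal sketch that assumes decimal gigabytes (10^9 bytes) and treats the nominal sizes (3B, 8B, 30B) as exact parameter counts; the page confirms neither, and the helper name is hypothetical.

```python
# Approximate Q4_K_M footprints in GB, taken from the stat block above.
# Assumption: GB means 10**9 bytes.
Q4_FOOTPRINT_GB = {"3b": 2.1, "8b": 5.3, "30b": 17.0}

# Assumption: the nominal size labels are treated as exact parameter counts.
NOMINAL_PARAMS_BILLIONS = {"3b": 3, "8b": 8, "30b": 30}


def effective_bits_per_weight(size: str) -> float:
    """Return footprint bits divided by nominal parameter count."""
    total_gigabits = Q4_FOOTPRINT_GB[size] * 8
    # Both values are in units of 10**9, so the scale factor cancels.
    return total_gigabits / NOMINAL_PARAMS_BILLIONS[size]


for size in Q4_FOOTPRINT_GB:
    print(f"{size}: ~{effective_bits_per_weight(size):.1f} bits/weight")
```

Under those assumptions every size lands above a flat 4 bits per weight, which is what you would expect from a quant that keeps some tensors at higher precision. Treat the numbers as an estimate rather than a measurement.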

Where we recommend this

This model isn’t currently in an active planner slot. See the runner notes below if you’re running it anyway.

The call

When not to use: Frontier reasoning or coding-agent workflows where Qwen3-Coder-30B-A3B or GLM-5.1 are the established daily drivers. Granite 4.1 is positioned as enterprise-deploy-friendly (clean Apache 2.0, IBM support story, traceable training data) rather than benchmark-chasing.

Runner notes

Ollama tags went live on release day: `granite4.1:3b`, `granite4.1:8b`, `granite4.1:30b`. The default tags ship at 128K context; 512K requires the extended-context variants (from the late-training context-extension stage) at `huggingface.co/ibm-granite`. Q4_K_M is the Ollama default quant; Q8_0 is available via tags such as `:8b-q8_0`. Hugging Face hosts the weights at `ibm-granite/granite-4.1-*-instruct`.
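
If you are scripting against a local Ollama server, the tags above can be assembled and sent to its `/api/generate` endpoint. This is a minimal sketch, not an official client. It assumes Ollama is listening on its default port (11434) and that the model has already been pulled. It also assumes the `-q8_0` suffix applies to the 3B and 30B tags, because the notes only spell out `:8b-q8_0`. The function names and the `num_ctx` default are illustrative choices.

```python
import json
import urllib.request
from typing import Optional

SIZES = ("3b", "8b", "30b")  # sizes listed in the runner notes


def ollama_tag(size: str, quant: Optional[str] = None) -> str:
    """Build a tag such as 'granite4.1:8b' or 'granite4.1:8b-q8_0'.

    Assumption: quant suffixes follow the ':8b-q8_0' pattern for every size.
    """
    if size not in SIZES:
        raise ValueError(f"unknown size: {size}")
    suffix = f"-{quant}" if quant else ""
    return f"granite4.1:{size}{suffix}"


def build_generate_request(
    size: str,
    prompt: str,
    num_ctx: int = 8192,  # illustrative; raise toward 128K only if memory allows
    host: str = "http://localhost:11434",  # Ollama's default address
) -> urllib.request.Request:
    """Prepare (but do not send) a non-streaming generate request."""
    payload = {
        "model": ollama_tag(size),
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(size: str, prompt: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(size, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Example, with a server running and the model pulled:
#   print(generate("8b", "Summarize Apache 2.0 in one sentence."))
```

Keeping request construction separate from the network call makes it easy to inspect the payload before anything is sent. Note that context length is a per-request option here, so memory use grows with whatever `num_ctx` you choose.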

License: Apache 2.0
Released: April 29, 2026
Maker: IBM Research
Model card: huggingface.co/collections/ibm-granite/granite-41-language-models

Hardware that fits

Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.
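
The fit-then-sort rule described above can be sketched as a filter on memory followed by a sort on price. This is an illustration only: the page does not state its exact fit rule, so the 1.5 GB headroom for KV cache and runtime overhead is an assumption. Apart from the Intel Arc B580, the hardware entries are made-up placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class HardwarePick:
    name: str
    memory_gb: float
    price_usd: int


def fitting_picks(
    picks: Iterable[HardwarePick],
    model_gb: float,
    headroom_gb: float = 1.5,  # assumption: spare memory for KV cache and runtime
) -> List[HardwarePick]:
    """Keep picks whose memory covers model plus headroom, cheapest first."""
    needed = model_gb + headroom_gb
    return sorted(
        (p for p in picks if p.memory_gb >= needed),
        key=lambda p: p.price_usd,
    )


PICKS = [
    HardwarePick("Hypothetical 24 GB card", 24, 900),  # placeholder entry
    HardwarePick("Intel Arc B580 12 GB", 12, 249),     # listed on this page
    HardwarePick("Hypothetical 16 GB card", 16, 450),  # placeholder entry
]

# Q4_K_M footprints from this page: ~2.1 GB (3B), ~5.3 GB (8B), ~17 GB (30B).
for label, size_gb in (("3B", 2.1), ("8B", 5.3), ("30B", 17.0)):
    names = [p.name for p in fitting_picks(PICKS, size_gb)]
    print(label, "->", names)
```

With these placeholder entries, the 3B and 8B models fit all three cards with the B580 first, while the 30B model fits only the 24 GB card. A larger context window would call for more headroom than the default assumed here.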
