MODEL · META · 70B DENSE
Llama 3.3 70B Instruct
The community-standard 70B dense model for local inference. It is reliable, well supported across llama.cpp, vLLM, TensorRT-LLM, and MLX, and a proven daily driver on 48 GB+ of discrete VRAM or 96 GB+ of unified memory. Qwen 3.5 has no 70B dense model (its lineup jumps from 27B to 122B-A10B), so Llama 3.3 still owns this slot in 2026.
License: Llama 3.3 Community License (custom — not Apache; commercial OK with attribution + 700M MAU cap) · Context: 128K · Released: December 6, 2024
The decision in five lines
- The call: Skip for local (for coding)
- Best for: coding · chat · docs · agents
- Runs on: 8 hardware picks fit (cheapest: Dual RTX 3090 (used) · $1,800)
- Watch out: Tight VRAM budgets. At Q4 it needs ~46 GB total with KV at 32K context.
- Evidence: Estimated
- Parameters: 70B dense
- Type: Dense
- Context: 128K
- VRAM: ~40 GB at Q4_K_M; ~70 GB at 8-bit on DGX Spark (full BF16 weights are ~140 GB, i.e. 70B parameters × 2 bytes)
Where we recommend this
Every tier slot in the planner where this model is a top or alternate pick. Pulled live from planner.js — when the planner refreshes, this table stays current.
The call
When not to use: tight VRAM budgets. At Q4 it needs ~46 GB total, including the KV cache at 32K context, so a single 32 GB RTX 5090 cannot fit Q4 (it would need an IQ2 quant, with a real quality cost). Also skip it for multimodal work: Llama 3.3 is text-only.
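The memory figure above can be sanity-checked with a short calculation. The sketch below is only an estimate: the layer count, KV-head count, and head dimension are Llama 3 70B's published architecture values, the 40 GB weight size is this page's own Q4_K_M figure, and the KV cache element widths (2 bytes for FP16, roughly 1 byte for an 8-bit cache) are assumptions, since the page does not say which cache precision its ~46 GB total uses.

```python
# Rough memory estimate: quantized weights plus KV cache for one sequence.
# Architecture constants are Llama 3 70B's published config; the element
# widths passed in are assumptions, not figures from this page.

N_LAYERS = 80      # transformer layers
N_KV_HEADS = 8     # grouped-query attention uses 8 KV heads
HEAD_DIM = 128     # dimension per attention head


def kv_cache_gb(context_tokens: int, bytes_per_element: float) -> float:
    """KV cache size in GB (10^9 bytes) for a single sequence."""
    # Factor of 2: one key vector and one value vector per layer per KV head.
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_element
    return per_token * context_tokens / 1e9


def total_gb(weights_gb: float, context_tokens: int, bytes_per_element: float) -> float:
    """Weights plus KV cache; ignores runtime buffers and activations."""
    return weights_gb + kv_cache_gb(context_tokens, bytes_per_element)


if __name__ == "__main__":
    weights = 40.0  # the page's ~40 GB Q4_K_M figure
    for label, width in (("FP16 KV", 2.0), ("8-bit KV", 1.0)):
        print(f"{label}: {total_gb(weights, 32 * 1024, width):.1f} GB at 32K context")
```

Under these assumptions an 8-bit KV cache lands near the ~46 GB quoted above, while an FP16 cache comes out closer to 51 GB. Runtime buffers add more on top, which is why 48 GB setups are rated tight.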
Runner notes
Ollama tag: `llama3.3:70b` (defaults to Q4_K_M). On the 128 GB DGX Spark, an 8-bit build (~70 GB) fits with room to spare; full BF16 weights are roughly 140 GB (70B parameters × 2 bytes) and do not fit. M5 Max 128 GB: ~22 tok/s at Q4. M4 Max 96 GB: 8-15 tok/s with the sysctl wired-memory tweak. AMD ROCm path: the HIP build of llama.cpp is reliable at this size.
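For readers scripting against the Ollama tag above, here is a minimal sketch of a request to Ollama's local REST endpoint (`/api/generate` on the default port 11434). The helper names `build_request` and `generate` and the `num_ctx` default of 8192 are illustrative assumptions, not settings recommended by this page.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "llama3.3:70b"                          # tag from the runner notes


def build_request(prompt: str, num_ctx: int = 8192) -> urllib.request.Request:
    """Build a non-streaming generate request (hypothetical helper)."""
    payload = {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,                  # one JSON object instead of streamed chunks
        "options": {"num_ctx": num_ctx},  # context window; larger values grow the KV cache
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt and return the generated text; needs a running Ollama server."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `generate("...")` only works after `ollama pull llama3.3:70b` has completed and the server is running. Raising `num_ctx` toward 32K increases KV cache memory accordingly, so on tight setups it is worth starting small.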
Hardware that fits
Every hardware pick whose memory fits this model at the quant we recommend. Sorted cheapest-first — the top row is your best-value fit. Click through for the full buyer’s guide.
- Dual RTX 3090 (used) · Tight · 1.1× 48 GB · $1,800–$2,500 all-in
- Framework Desktop (Ryzen AI Max+ 395) · Perfect · 2.0× 128 GB unified · $1,999–$2,851
- Mac Studio M4 Max 64 GB · Requires tweak · 1.3× 64 GB unified · $3,199
- NVIDIA RTX A6000 (48 GB, used) · Tight · 1.1× 48 GB ECC · $3,500–$4,500
- Mac Studio M3 Ultra 96 GB · Perfect · 1.5× 96 GB unified · $3,999
- M5 Max MacBook Pro 64 GB · Requires tweak · 1.3× 64 GB unified · $4,499
- NVIDIA DGX Spark · Perfect · 2.0× 128 GB unified · $4,699
- Dual RTX 5090 · Perfect · 1.5× 64 GB (2×32) · $8,500–$10,500