Q: What quantization of Qwen 2.5 72B should I use on an Apple M1?

With 16 GB of VRAM (unified memory) on the Apple M1, the Q5_K_M variant is the best fit, although at 50 GB it runs only in hybrid CPU+GPU mode. Estimated speed is ~1 token/sec.

Q: How fast does Qwen 2.5 72B run on an Apple M1?

Roughly 1 token/sec for Q5_K_M. Real speed depends on context length, backend (Ollama, llama.cpp, LM Studio), and KV cache size.
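
To give a feel for why context length and KV cache size matter, here is a rough sketch of the usual KV cache estimate (one key and one value vector per layer, KV head, and token). The model shape used below (80 layers, 8 KV heads, head dimension 128) and the 16-bit cache are assumptions not stated on this page; check them against the model card before relying on the numbers.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Approximate KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


# Assumed Qwen 2.5 72B shape (not taken from this page).
QWEN25_72B = dict(n_layers=80, n_kv_heads=8, head_dim=128)


def kv_cache_gib(context_tokens, bytes_per_value=2):
    """KV cache size in GiB for the assumed model shape."""
    size = kv_cache_bytes(
        context_tokens=context_tokens,
        bytes_per_value=bytes_per_value,
        **QWEN25_72B,
    )
    return size / 2**30


if __name__ == "__main__":
    # The two context sizes listed in the table: 8K and 128K tokens.
    for ctx in (8 * 1024, 128 * 1024):
        print(f"{ctx:>7} tokens: {kv_cache_gib(ctx):.1f} GiB")
```

Under these assumptions the cache is about 2.5 GiB at 8K tokens and about 40 GiB at 128K tokens, on top of the model weights, which is why long contexts push memory needs well past the file size.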

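Q: What if the Apple M1 is not enough for Qwen 2.5 72B?

Consider upgrading to an Apple M1 Max (64 GB VRAM), which fits the recommended 58 GB target. Alternatively, pick a smaller quantization to stay on your current hardware, keeping in mind that even Q2_K needs at least 29 GB and will still run in hybrid CPU+GPU mode.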

Question 1

Can I run Qwen 2.5 72B on an Apple M1?

Accepted Answer

Sort of. The Apple M1 can run Qwen 2.5 72B (Q5_K_M) only in hybrid CPU + GPU mode, spilling layers to RAM: its 16 GB of VRAM is below the 52 GB minimum, but 64 GB of RAM is sufficient to hold the model. Expect significantly slower inference than a full GPU fit.
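
As a back-of-the-envelope illustration of layer spilling, the sketch below estimates how many layers fit in fast memory if every layer is assumed to be the same size. The 80-layer count, the equal-size simplification, and the `reserve_gb` headroom for the OS and KV cache are all assumptions for illustration, not figures from this page.

```python
import math


def layers_that_fit(memory_budget_gb, model_file_gb, n_layers, reserve_gb=0.0):
    """Estimate how many layers fit in the memory budget.

    Assumes all layers are the same size (model_file_gb / n_layers) and
    sets aside reserve_gb for everything that is not model weights.
    """
    per_layer_gb = model_file_gb / n_layers
    usable_gb = max(memory_budget_gb - reserve_gb, 0.0)
    return min(n_layers, math.floor(usable_gb / per_layer_gb))


if __name__ == "__main__":
    # Q5_K_M file size from this page (50 GB); 80 layers is an assumption.
    on_gpu = layers_that_fit(16, 50, 80, reserve_gb=4)
    print(f"{on_gpu} of 80 layers in fast memory, {80 - on_gpu} spilled")
```

With llama.cpp, a number like this is what you would pass to `--n-gpu-layers` (`-ngl`) as a starting point, then adjust downward if the backend runs out of memory.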

Quantization	File Size	Min VRAM	Rec VRAM	Context	Verdict	Estimated tok/s
Q2_K	27 GB	29 GB	36 GB	8K / 128K	Hybrid CPU+GPU	~1
Q3_K_M	35 GB	37 GB	44 GB	8K / 128K	Hybrid CPU+GPU	~1
Q4_K_M	42 GB	44 GB	48 GB	8K / 128K	Hybrid CPU+GPU	~1
Q5_K_M (best fit)	50 GB	52 GB	58 GB	8K / 128K	Hybrid CPU+GPU	~1
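
The table can be turned into a small lookup for checking a given memory size against each quantization. This is a minimal sketch: the sizes are copied from the table, while the `verdict` thresholds and the "fits" labels are an assumed reading of the Min VRAM and Rec VRAM columns, and the `Quant` type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Quant(NamedTuple):
    name: str
    file_gb: int
    min_gb: int
    rec_gb: int


# Values copied from the table above.
TABLE = [
    Quant("Q2_K", 27, 29, 36),
    Quant("Q3_K_M", 35, 37, 44),
    Quant("Q4_K_M", 42, 44, 48),
    Quant("Q5_K_M", 50, 52, 58),
]


def verdict(quant: Quant, memory_gb: float) -> str:
    """Classify one quantization against the available memory."""
    if memory_gb >= quant.rec_gb:
        return "fits (recommended)"
    if memory_gb >= quant.min_gb:
        return "fits (minimum)"
    return "hybrid CPU+GPU"


def largest_fitting(memory_gb: float) -> Optional[str]:
    """Name of the largest quantization whose minimum is met, or None."""
    fitting = [q for q in TABLE if memory_gb >= q.min_gb]
    return max(fitting, key=lambda q: q.file_gb).name if fitting else None


if __name__ == "__main__":
    for memory_gb in (16, 64):
        print(memory_gb, "GB ->", largest_fitting(memory_gb))
        for q in TABLE:
            print("   ", q.name, verdict(q, memory_gb))
```

For 16 GB no row meets its minimum, matching the "Hybrid CPU+GPU" verdict on every line of the table; for 64 GB every row clears its recommended size, including Q5_K_M at 58 GB.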
