Social proof
Based on 982 anonymous compatibility checks, 61% of scanned PCs run Phi-4 Reasoning Plus 14B fully on GPU, and 768 keep at least some of the work on the GPU.
General-purpose local model brief
Quantization tip: Benchmark at least two quantizations and validate against a task-specific eval set before production use.
New to local models? Smaller quantization variants are easier to run; larger ones can improve output quality at the cost of more memory.
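The size-versus-memory tradeoff follows from simple arithmetic: file size is roughly parameter count times bits per weight. A minimal sketch, using nominal bit widths as an assumption (real GGUF files mix tensor types, so actual sizes differ slightly):

```python
# Rough GGUF size estimate: params (billions) * nominal bits-per-weight / 8.
# The nominal bit widths below are an illustrative assumption, not exact
# per-scheme averages.
NOMINAL_BITS = {"Q4_K_M": 4, "Q5_K_M": 5, "Q8_0": 8, "FP16": 16}

def est_file_size_gb(params_b: float, quant: str) -> float:
    """Approximate file size in GB for a model with params_b billion weights."""
    return params_b * NOMINAL_BITS[quant] / 8

# For a 14B model this lands close to the variant sizes listed below.
for quant in NOMINAL_BITS:
    print(f"{quant}: ~{est_file_size_gb(14, quant):.1f} GB")
```

VRAM needs run higher than file size because the KV cache and runtime overhead sit on top of the weights.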
| Quantization | File Size | Min VRAM | Recommended VRAM | Min RAM | Context |
|---|---|---|---|---|---|
| Q4_K_M | 7 GB | 8 GB | 9.1 GB | 11 GB | 8K / 8K |
| Q5_K_M | 8.8 GB | 10.1 GB | 11.4 GB | 14 GB | 8K / 8K |
| Q8_0 | 14 GB | 16.1 GB | 18.2 GB | 21 GB | 8K / 8K |
| FP16 | 28 GB | 32.2 GB | 36.4 GB | 42 GB | 8K / 8K |
These GPUs meet the recommended 9.1 GB VRAM for the Q4_K_M quantization. Estimated speeds are approximate and assume full GPU offloading.
| Pick | GPU | VRAM | Est. Speed | Why |
|---|---|---|---|---|
| Budget | NVIDIA GeForce RTX 3080 10GB | 10 GB | ~86.9 tok/s | Lowest cost that meets the recommended VRAM |
| Fastest | NVIDIA GeForce RTX 5090 | 32 GB | ~204.8 tok/s | Highest estimated throughput |
| Best Value | NVIDIA GeForce RTX 3080 Ti | 12 GB | ~104.2 tok/s | Best speed per dollar of VRAM |

Need a detailed comparison? See all GPU rankings for Phi-4 Reasoning Plus 14B.
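The selection behind these picks can be sketched as a filter-and-sort over the listed cards (the VRAM threshold and speeds are the page's numbers; the selection logic is my assumption):

```python
# Listed GPUs as (name, VRAM in GB, approximate tok/s) from the picks above.
GPUS = [
    ("NVIDIA GeForce RTX 3080 10GB", 10, 86.9),
    ("NVIDIA GeForce RTX 3080 Ti", 12, 104.2),
    ("NVIDIA GeForce RTX 5090", 32, 204.8),
]
RECOMMENDED_VRAM_GB = 9.1  # Q4_K_M recommendation from the table

# Keep cards that meet the recommendation, fastest first.
eligible = sorted(
    (g for g in GPUS if g[1] >= RECOMMENDED_VRAM_GB),
    key=lambda g: g[2],
    reverse=True,
)
for name, vram, tps in eligible:
    print(f"{name}: {vram} GB, ~{tps} tok/s")
```

All three listed cards clear the 9.1 GB bar, which is why the budget pick is chosen on price rather than eligibility.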