Q4_K_M
17.5 GBMin VRAM: 20.1 GB
Recommended VRAM: 22.8 GB
Min RAM: 27 GB
Context: 8K / 8K
Loading model details...
Fetching variants, compatibility details, and metadata.
Model Detail
Share Qwen3.5 35B A3B with someone who is deciding what to run locally.
Social proof
32% of 1,029 scanned PCs run Qwen3.5 35B A3B fully on GPU.
598 keep at least some work on GPU. Based on anonymous compatibility checks.
Best Qwen-family first pick for high-end local rigs
Best for
Consider alternatives if
Quantization tip: Treat 10 tok/s as the minimum comfort bar and reduce context before downgrading the model.
New to local models? Smaller quantization variants are easier to run, while larger ones can improve quality at the cost of more memory.
Q4_K_M
17.5 GBMin VRAM: 20.1 GB
Recommended VRAM: 22.8 GB
Min RAM: 27 GB
Context: 8K / 8K
Q5_K_M
21.9 GBMin VRAM: 25.2 GB
Recommended VRAM: 28.5 GB
Min RAM: 33 GB
Context: 8K / 8K
Q8_0
35 GBMin VRAM: 40.3 GB
Recommended VRAM: 45.5 GB
Min RAM: 53 GB
Context: 8K / 8K
FP16
70 GBMin VRAM: 80.5 GB
Recommended VRAM: 91 GB
Min RAM: 105 GB
Context: 8K / 8K
| Quantization | File Size | Min VRAM | Recommended VRAM | Min RAM | Context |
|---|---|---|---|---|---|
| Q4_K_M | 17.5 GB | 20.1 GB | 22.8 GB | 27 GB | 8K / 8K |
| Q5_K_M | 21.9 GB | 25.2 GB | 28.5 GB | 33 GB | 8K / 8K |
| Q8_0 | 35 GB | 40.3 GB | 45.5 GB | 53 GB | 8K / 8K |
| FP16 | 70 GB | 80.5 GB | 91 GB | 105 GB | 8K / 8K |
These GPUs meet the recommended 22.8 GB VRAM for the Q4_K_M quantization. Estimated speeds are approximate and assume full GPU offloading.
Budget Pick
NVIDIA GeForce RTX 409024 GB VRAM · ~46.1 tok/s
Lowest cost that meets recommended VRAM
Rent on RunPodFastest Pick
NVIDIA GeForce RTX 509032 GB VRAM · ~81.9 tok/s
Highest estimated throughput
Check price on AmazonBest Value
NVIDIA GeForce RTX 3090 Ti24 GB VRAM · ~46.1 tok/s
Best speed per dollar of VRAM
Check price on AmazonNeed a detailed comparison? See all GPU rankings for Qwen3.5 35B A3B.