Low VRAM picks
Best local LLMs for 8 GB VRAM
Models that fit tighter consumer GPUs without feeling toy-sized.
48 models in this collection.
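Whether a model "fits" in 8 GB comes down to simple arithmetic: weight size at your quantization width, plus headroom for the KV cache and runtime buffers. A minimal sketch of that estimate (the 1.5 GB overhead figure is an assumption, not a measurement; real usage varies with context length and runtime):

```python
def estimate_vram_gb(params_billions: float, bits_per_weight: float,
                     overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate: weights at the given quantization width,
    plus a flat allowance for KV cache and runtime buffers (assumed)."""
    weights_gb = params_billions * bits_per_weight / 8
    return weights_gb + overhead_gb

# An 8B model at 4-bit: 8 * 0.5 + 1.5 = 5.5 GB -- comfortable on 8 GB.
print(round(estimate_vram_gb(8, 4), 1))
# A 14B model at 4-bit: 14 * 0.5 + 1.5 = 8.5 GB -- already over budget.
print(round(estimate_vram_gb(14, 4), 1))
```

This is why the collection tops out around 14B: anything larger needs heavier quantization or CPU offload to run on an 8 GB card.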
Qwen3 0.6B
Qwen · 0.6B
Nomic Embed Text v1.5
Nomic · 0.137B
BGE Large EN v1.5
BGE · 0.335B
mxbai-embed-large
Mixedbread · 0.335B
Qwen 2.5 0.5B
Qwen · 0.5B
Qwen3 1.7B
Qwen · 1.7B
Snowflake Arctic Embed L
Snowflake · 0.335B
Gemma 3n E2B
Gemma · 2B
Gemma 3 1B
Gemma · 1B
Llama 3.2 1B
Llama · 1.24B
SmolLM3 3B
SmolLM · 3B
DeepSeek R1 Distill Qwen 1.5B
DeepSeek · 1.5B
Qwen 2.5 1.5B
Qwen · 1.5B
StableLM 2 1.6B
StableLM · 1.6B
Gemma 3n E4B
Gemma · 4B
Qwen3 4B
Qwen · 4B
Qwen3.5 4B
Qwen · 4B
Gemma 2 2B
Gemma · 2B
Stable Code 3B
StableCode · 3B
StarCoder2 3B
StarCoder · 3B
Llama 3.2 3B
Llama · 3.21B
Phi-3 Mini 3.8B
Phi · 3.8B
Phi-4 Mini 3.8B
Phi · 3.8B
Best for low-VRAM devices
Qwen 2.5 3B
Qwen · 3B
Gemma 3 4B
Gemma · 4B
Gemma 4 E2B
Gemma · 2.3B
Nemotron Mini 4B
Nemotron · 4B
Mistral 7B v0.3
Mistral · 7B
Best for baseline chat
Llama 3.1 8B
Llama · 8B
Best for general use
Qwen 2.5 7B
Qwen · 7B
Best for general multilingual assistants
Yi 1.5 6B
Yi · 6B
Qwen3 8B
Qwen · 8B
CodeLlama 7B
CodeLlama · 7B
Gemma 4 E4B
Gemma · 4.5B
Best for on-device multimodal assistants
StarCoder2 7B
StarCoder · 7B
Qwen3.5 9B
Qwen · 9B
Best for upgraded general local assistants
Aya Expanse 8B
Aya · 8B
Command R7B
Command · 7B
DeepSeek R1 Distill Llama 8B
DeepSeek · 8B
DeepSeek R1 Distill Qwen 7B
DeepSeek · 7B
Best for reasoning tasks
Hermes 3 Llama 3.1 8B
Hermes · 8B
InternLM 2.5 7B
InternLM · 7B
Qwen 2.5 Coder 7B
Qwen · 7B
Best for code generation
Gemma 2 9B
Gemma · 9B
Best for general local assistants
Yi 1.5 9B
Yi · 9B
Phi-4 Reasoning 14B
Phi · 14B
Phi-4 Reasoning Plus 14B
Phi · 14B
Qwen3 14B
Qwen · 14B