Best local LLMs for 8 GB VRAM

Nomic Embed Text v1.5

Nomic · 0.137B

BGE Large EN v1.5

BGE · 0.335B

mxbai-embed-large

Mixedbread · 0.335B

Qwen 2.5 0.5B

Qwen · 0.5B

Qwen3 1.7B

Qwen · 1.7B

0.9 GB

chatsmalledgereasoning

VRAM1 GB

RAM2 GB

Snowflake Arctic Embed L

Snowflake · 0.335B

Gemma 3n E2B

Gemma · 2B

1 GB

chatsmalledgemultimodal

VRAM1.2 GB

RAM2 GB

Gemma 3 1B

Gemma · 1B

Llama 3.2 1B

Llama · 1.24B

0.75 GB

chatsmalledgeinstruct

VRAM1.5 GB

RAM2 GB

chatsmallreasoningtool-use+1

SmolLM3 3B

SmolLM · 3B

1.5 GB

VRAM1.7 GB

RAM3 GB

reasoningdistillsmallchat

DeepSeek R1 Distill Qwen 1.5B

DeepSeek · 1.5B

1 GB

VRAM2 GB

chatsmallreasoningtool-use+2

Qwen 2.5 1.5B

Qwen · 1.5B

StableLM 2 1.6B

StableLM · 1.6B

Gemma 3n E4B

Gemma · 4B

Qwen3 4B

Qwen · 4B

Qwen3.5 4B

Qwen · 4B

2 GB

VRAM2.3 GB

RAM3 GB

Gemma 2 2B

Gemma · 2B

Stable Code 3B

StableCode · 3B

StarCoder2 3B

StarCoder · 3B

Llama 3.2 3B

Llama · 3.21B

2 GB

chatsmalledgeinstruct

VRAM3 GB

Phi-3 Mini 3.8B

Phi · 3.8B

Phi-4 Mini 3.8B

Phi · 3.8B

2.3 GB

Best for Low-VRAM devices

chatsmallreasoningcoding

VRAM3 GB

chatsmalledgemultimodal+2

Qwen 2.5 3B

Qwen · 3B

Gemma 3 4B

Gemma · 4B

Gemma 4 E2B

Gemma · 2.3B

2.7 GB

VRAM3.5 GB

Nemotron Mini 4B

Nemotron · 4B

Mistral 7B v0.3

Mistral · 7B

3.5 GB

Best for Baseline chat

Llama 3.1 8B

Llama · 8B

Best for General use

Qwen 2.5 7B

Qwen · 7B

3.7 GB

Best for General multilingual assistants

chatgeneralmultilingual

VRAM4.5 GB

QuantQ3_K_M

Yi 1.5 6B

Yi · 6B

3.7 GB

chatgeneralmultilingual

VRAM4.5 GB

chatgeneralreasoningmultilingual

Qwen3 8B

Qwen · 8B

4 GB

VRAM4.6 GB

chatsmallmultimodalreasoning+1

CodeLlama 7B

CodeLlama · 7B

Gemma 4 E4B

Gemma · 4.5B

4.1 GB

Best for On-device multimodal assistants

VRAM5 GB

chatgeneralreasoningtool-use+2

StarCoder2 7B

StarCoder · 7B

Qwen3.5 9B

Qwen · 9B

4.5 GB

Best for Upgraded general local assistant

VRAM5.2 GB

RAM7 GB

chatgeneralfunction-calling

Aya Expanse 8B

Command · 8B

Command R7B

Command · 7B

DeepSeek R1 Distill Llama 8B

DeepSeek · 8B

DeepSeek R1 Distill Qwen 7B

DeepSeek · 7B

4.7 GB

Best for Reasoning tasks

Hermes 3 Llama 3.1 8B

Hermes · 8B

4.9 GB

VRAM5.5 GB

RAM8 GB

chatgeneraltool-usemultilingual

InternLM 2.5 7B

InternLM · 7B

4.7 GB

VRAM5.5 GB

RAM8 GB

Qwen 2.5 Coder 7B

Qwen · 7B

4.7 GB

Best for Code generation

Gemma 2 9B

Gemma · 9B

5.5 GB

Best for General local assistants

Yi 1.5 9B

Yi · 9B

5.5 GB

chatgeneralmultilingual

VRAM6.5 GB

RAM8 GB

chatreasoningcodingfrontier

Phi-4 Reasoning 14B

Phi · 14B

Phi-4 Reasoning Plus 14B

Phi · 14B

7 GB

VRAM8 GB

RAM11 GB

chatgeneralreasoningmultilingual

Qwen3 14B

Qwen · 14B

7 GB

VRAM8 GB

RAM11 GB