Compatibility Check
Can I Run Llama 3.1 70B on NVIDIA GeForce GTX 1660 Ti?
Sort of. The NVIDIA GeForce GTX 1660 Ti can run Llama 3.1 70B (Q5_K_M) only by spilling layers into system RAM, so generation will be slow.
Estimated ~2 tokens/sec on the Q5_K_M quantization.
Hybrid CPU+GPU
Best variant: Q5_K_M
CPU + GPU hybrid: there is not enough VRAM (6 GB vs. the 50 GB minimum), but 64 GB of system RAM is enough to hold the spilled layers. Expect significantly slower inference.
| Spec | Value |
|---|---|
| GPU VRAM | 6 GB |
| Min VRAM (best fit) | 50 GB |
| Recommended VRAM | 56 GB |
| Estimated tok/s | ~2 |
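In practice, a hybrid split like this is usually run with llama.cpp's layer offloading: the `-ngl` / `--n-gpu-layers` flag controls how many transformer layers go to the GPU, and the rest stay in system RAM. A sketch, assuming a local GGUF file (the path and layer count here are illustrative, not a tested recommendation for this card):

```shell
# Offload as many layers as fit in 6 GB of VRAM; the remainder
# runs on the CPU from system RAM. Tune -ngl down if you see
# out-of-memory errors, up if VRAM headroom remains.
llama-cli \
  -m ./llama-3.1-70b-q5_k_m.gguf \
  -ngl 6 \
  -c 8192 \
  -p "Hello"
```

Fewer offloaded layers means more CPU work per token, which is why the estimated throughput stays around ~2 tok/s on this GPU.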
Every Llama 3.1 70B quantization on NVIDIA GeForce GTX 1660 Ti
Each row runs the compatibility engine against your VRAM, RAM, and the model's requirements.
| Quantization | File Size | Min VRAM | Rec VRAM | Context | Verdict | Estimated tok/s |
|---|---|---|---|---|---|---|
| Q2_K | 25 GB | 27 GB | 32 GB | 8K / 128K | Hybrid CPU+GPU | ~3 |
| Q3_K_M | 33 GB | 35 GB | 40 GB | 8K / 128K | Hybrid CPU+GPU | ~2 |
| Q4_K_M | 40 GB | 42 GB | 48 GB | 8K / 128K | Hybrid CPU+GPU | ~2 |
| Q5_K_M (best fit) | 48 GB | 50 GB | 56 GB | 8K / 128K | Hybrid CPU+GPU | ~2 |
| Q8_0 | 74 GB | 76 GB | 80 GB | 8K / 128K | Can't Run | — |
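The verdict column above follows a simple rule, which can be sketched as below. This is a minimal illustration, not the site's actual compatibility engine; the `verdict` helper and the ~2 GB overhead added on top of the file size are assumptions inferred from the gap between the File Size and Min VRAM columns.

```python
def verdict(file_size_gb, vram_gb=6, ram_gb=64, overhead_gb=2):
    """Rough compatibility check mirroring the table above.

    Assumption: the minimum memory needed is the quantized file
    size plus a small overhead for KV cache and buffers (~2 GB,
    matching the Min VRAM column).
    """
    min_needed = file_size_gb + overhead_gb
    if vram_gb >= min_needed:
        return "Fits in VRAM"
    if vram_gb + ram_gb >= min_needed:
        return "Hybrid CPU+GPU"  # spill layers to system RAM
    return "Can't Run"

# Reproduce the table's verdicts for a 6 GB GPU with 64 GB RAM:
for quant, size in [("Q2_K", 25), ("Q4_K_M", 40), ("Q5_K_M", 48), ("Q8_0", 74)]:
    print(quant, verdict(size))
```

With these defaults, Q5_K_M needs 50 GB, which exceeds 6 GB of VRAM but fits within 6 + 64 GB combined, hence the hybrid verdict; Q8_0 needs 76 GB and exceeds even the combined total.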
Upgrade options that fit Llama 3.1 70B better
Rent GPU instead of buying one
If the local fit is weak, a cloud GPU gets you running today without a hardware upgrade.