Can NVIDIA GeForce RTX 5090 run DeepSeek R1 70B?
A 70B-parameter LLM on 32GB GDDR7
Barely: requires CPU/RAM offloading
Speed: very slow, expect ~1-3 tokens/sec with offloading
Quality: fine, but the speed makes it impractical for interactive use
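Where does the 1-3 tok/s estimate come from? During decoding, every generated token streams the full set of weights once, so throughput is capped by memory bandwidth on the slowest tier. A back-of-envelope sketch, assuming a ~40GB Q4 model with ~8GB spilled to system RAM at roughly 80 GB/s (both the spill size and the DDR5 bandwidth are assumptions, not measurements):

```bash
# Upper bound = 1 / (time to stream VRAM-resident weights + time to stream RAM-resident weights)
awk 'BEGIN { printf "upper bound: %.1f tok/s\n", 1 / (32/1792 + 8/80) }'
# prints ~8.5 tok/s; CPU compute on the offloaded layers typically drags
# real-world throughput down to the 1-3 tok/s range
```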
VRAM Requirements
DeepSeek R1 70B is a 70B-parameter model. At full precision (FP16) it requires 140GB of VRAM. Your NVIDIA GeForce RTX 5090 has only 32GB, which falls short even at the heaviest common quantization, as the breakdown below shows.
FP16 (full precision): 140GB (need 108GB more). Maximum quality, no quantization.
Q8 (8-bit): 70GB (need 38GB more). Near-lossless, ~50% size reduction.
Q4 (4-bit): 40GB (need 8GB more). Good quality, ~75% size reduction.
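These figures follow a simple rule of thumb: weight memory ≈ parameter count × bytes per weight, with the KV cache and activations adding roughly 10-20% on top. A quick sketch (treating Q4_K_M as ~4.5 bits per weight, which is an approximation):

```bash
# Weight memory for a 70B model at 2, 1, and ~0.56 bytes per weight
awk 'BEGIN { p = 70e9; printf "FP16: %.0fGB  Q8: %.0fGB  Q4_K_M: %.0fGB\n", p*2/1e9, p/1e9, p*4.5/8/1e9 }'
```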
Your GPU VRAM: 32GB GDDR7 at 1792 GB/s bandwidth
Recommended system RAM: 64GB DDR5 (2x GPU VRAM minimum for model overflow)
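Before installing anything, it's worth confirming what the system actually has. Two standard checks on Linux:

```bash
free -h                                                 # total and available system RAM
nvidia-smi --query-gpu=name,memory.total --format=csv   # GPU model and VRAM
```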
How to Set It Up
Step 1: Install Ollama
```bash
curl -fsSL https://ollama.com/install.sh | sh
```

Ollama is the easiest way to run local LLMs. It works on Linux, macOS, and Windows.
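To confirm the install succeeded:

```bash
ollama --version   # prints the installed version
```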
Step 2: Download and run DeepSeek R1 70B
```bash
ollama run deepseek-r1:70b
```

This downloads the model (~40GB for the default Q4_K_M quantization of this tag). The first run takes a while as the download completes.
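With only 32GB of VRAM, Ollama automatically offloads the layers that don't fit to system RAM. To tune the split yourself, set the `num_gpu` option (the number of layers placed on the GPU) per request via the local API. A sketch; 40 of the model's 80 layers is a starting point to experiment with, not a verified optimum:

```bash
# Pin ~40 of the 80 transformer layers to the GPU; Ollama keeps the rest in system RAM
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:70b",
  "prompt": "Explain KV caching in one paragraph.",
  "stream": false,
  "options": { "num_gpu": 40 }
}'
```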
Step 3: Verify GPU is being used
```bash
nvidia-smi
```

Check that VRAM usage increases when the model loads. On a 32GB card, usage should climb to near the 32GB cap, with the layers that don't fit held in system RAM.
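Ollama can also report the split directly:

```bash
ollama ps   # the PROCESSOR column shows the CPU/GPU split, e.g. "20%/80% CPU/GPU"
```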
NVIDIA GeForce RTX 5090 Specs
VRAM: 32GB GDDR7
Memory Bandwidth: 1792 GB/s
TDP: 575W
CUDA Cores: 21,760
Street Price: ~$2800
AI Rating: 10/10
About DeepSeek R1 70B
A strong reasoning model; the 70B release is a distillation onto the Llama architecture (DeepSeek-R1-Distill-Llama-70B), so it sits in the same hardware tier as Llama 70B.
Category: LLM · Parameters: 70B · CUDA required: No (runs via llama.cpp/GGUF)
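Since the model is distributed as GGUF, llama.cpp is a workable alternative to Ollama. A minimal sketch; the filename is illustrative, and `-ngl` sets how many layers are offloaded to the GPU:

```bash
# llama.cpp route: -m points at a local GGUF file (name here is hypothetical),
# -ngl controls GPU layer offload, -n caps the number of generated tokens
./llama-cli -m DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf -ngl 40 -p "Hello" -n 128
```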