Can NVIDIA RTX 5000 Ada run DeepSeek R1 70B?
70B parameter LLM model on 32GB GDDR6 ECC
Barely — requires CPU/RAM offloading
~1-3 tok/s (offload)
Speed: Very slow — expect 1-3 tokens/sec
Quality: Fine, but the speed makes it impractical for interactive use
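The 1-3 tok/s figure is consistent with a bandwidth-bound back-of-envelope estimate: the roughly 8GB of Q4 weights that don't fit in VRAM must be streamed to the GPU for every generated token. Assuming a hypothetical ~25 GB/s effective PCIe 4.0 x16 transfer rate (an assumption, not a measured number):

```shell
# Back-of-envelope: ~8GB of Q4 weights offloaded to system RAM must be
# streamed over PCIe once per token, so transfer rate caps throughput.
# Assumed effective PCIe throughput: ~25 GB/s (hypothetical figure).
awk 'BEGIN { printf "~%.1f tok/s upper bound\n", 25 / 8 }'
# prints: ~3.1 tok/s upper bound
```

Real throughput lands below this ceiling because compute and RAM bandwidth add further overhead, which matches the observed 1-3 tok/s range.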
VRAM Requirements
DeepSeek R1 70B is a 70B parameter model. At full precision (FP16), it requires 140GB of VRAM. Your NVIDIA RTX 5000 Ada only has 32GB — not enough even at maximum compression.
FP16 (Full Precision): 140GB (need 108GB more)
Maximum quality, no quantization
Q8 (8-bit): 70GB (need 38GB more)
Near-lossless, ~50% size reduction
Q4 (4-bit): 40GB (need 8GB more)
Good quality, ~75% size reduction
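The figures above follow from parameters × bits per weight ÷ 8. A quick sketch (the 4.6 bits/weight used for Q4 is an approximate effective rate for Q4_K_M-style quants, not an exact spec):

```shell
# Rough weight-storage estimate for a dense LLM: params * bits / 8.
# Real usage is higher: KV cache and activations add overhead on top.
params_b=70   # model size in billions of parameters

for name_bits in "FP16:16" "Q8:8" "Q4:4.6"; do
  name=${name_bits%%:*}
  bits=${name_bits##*:}
  gb=$(awk -v p="$params_b" -v b="$bits" 'BEGIN { printf "%.0f", p * b / 8 }')
  echo "$name: ~${gb} GB"
done
# prints: FP16: ~140 GB, Q8: ~70 GB, Q4: ~40 GB
```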
Your GPU VRAM: 32GB GDDR6 ECC at 576 GB/s bandwidth
Recommended system RAM: 64GB DDR5 (2x GPU VRAM minimum for model overflow)
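A quick way to check whether your machine meets the 64GB system RAM recommendation (Linux only, reading `/proc/meminfo`):

```shell
# Read total system RAM in GiB from /proc/meminfo (Linux only).
mem_gib=$(awk '/^MemTotal/ { printf "%.0f", $2 / 1048576 }' /proc/meminfo)
echo "System RAM: ${mem_gib} GiB"
if [ "$mem_gib" -ge 64 ]; then
  echo "OK: meets the 64GB recommendation for model offloading"
else
  echo "Below the 64GB recommendation; offloaded layers may hit swap"
fi
```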
How to Set It Up
Step 1: Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
Ollama is the easiest way to run local LLMs. Works on Linux, macOS, and Windows.
Step 2: Download and run DeepSeek R1 70B
ollama run deepseek-r1:70b
This downloads the model (~40GB at the default Q4_K_M quantization). The first run takes a while. Ollama automatically offloads whatever doesn't fit in VRAM to system RAM.
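With only 32GB of VRAM, part of the model will be offloaded regardless, but you can ease the pressure by capping the context window, since the KV cache grows with context length. A sketch using Ollama's Modelfile `num_ctx` parameter (the `r1-70b-4k` name is just an example):

```shell
# Hypothetical Modelfile: cap the context window to shrink the KV cache,
# which reduces VRAM pressure when layers are offloaded to system RAM.
cat > Modelfile <<'EOF'
FROM deepseek-r1:70b
PARAMETER num_ctx 4096
EOF
# Build and run the smaller-context variant:
#   ollama create r1-70b-4k -f Modelfile
#   ollama run r1-70b-4k
```

The trade-off is that the model can only attend to 4096 tokens of history; raise `num_ctx` if you have RAM to spare.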
Step 3: Verify GPU is being used
nvidia-smi
Check that VRAM usage climbs when the model loads. With a ~40GB model on a 32GB card, expect usage near the 32GB limit, with the remainder held in system RAM.
NVIDIA RTX 5000 Ada Specs
VRAM: 32GB GDDR6 ECC
Memory Bandwidth: 576 GB/s
TDP: 250W
CUDA Cores: 12,800
Street Price: ~$3800
AI Rating: 9/10
About DeepSeek R1 70B
Strong reasoning model. Same tier as Llama 70B for hardware requirements.
Category: LLM · Parameters: 70B · CUDA required: No (runs via llama.cpp/GGUF)