NVIDIA · Data Center
NVIDIA A100 40GB
$4,500–$10,000 MSRP
The NVIDIA A100 40GB is the more affordable variant of NVIDIA's legendary data center GPU. With 40GB of HBM2e at 1,555 GB/s of bandwidth, it runs 70B models at Q4 quantization and any model up to 32B at full precision. Used prices around $4,500 make it competitive with a new RTX 5090 while offering more VRAM and higher memory bandwidth. It requires a server chassis or a PCIe adapter; it is not a plug-and-play consumer card.
Best For: High-VRAM AI inference at data center performance levels
Verdict: Competitive with the RTX 5090 on the used market: more VRAM, more bandwidth, but no gaming.
AI: 10/10
Gaming: 1/10
Specifications
VRAM: 40GB HBM2e
Memory Bandwidth: 1,555 GB/s
CUDA Cores: 6,912
Boost Clock: 1,410 MHz
TDP: 250W
Power Connector: 1x 8-pin
Length: 267mm
Form Factor: Dual Slot
Release Year: 2020
AI Capabilities
Unrivaled: 40GB VRAM
Run 70B+ models, no compromises. The AI power user's dream.
Can run (Q4 quantized)
Llama 3.1 70B, Llama 3.1 8B, Qwen 2.5 32B, Qwen 2.5 14B, Mistral 7B, DeepSeek R1 70B, FLUX.1 Dev, Stable Diffusion XL, Stable Diffusion 3.5 Large, HunyuanVideo, CogVideoX-5B, Mochi 1, LTX Video, Stable Video Diffusion, Wan Video 14B, Codestral 22B, Qwen 2.5 Coder 32B, LLaVA 1.6 34B, AlphaFold 2, ESMFold (ESM-2 15B), ESM-2 3B, scGPT, RFdiffusion, Fine-tune Llama 8B, Fine-tune Llama 70B, Train SDXL LoRA, Train FLUX LoRA
Tight fit (may need CPU offload)
Qwen 2.5 72B (42GB Q4)
Recommended system RAM for AI: 80GB+ (2x GPU VRAM for model overflow)
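The "tight fit" call above can be sanity-checked with back-of-envelope sizing: quantized weights take roughly params × bits-per-weight / 8 bytes, with KV cache and runtime overhead on top. A minimal sketch, where the ~4.7 bits/weight figure for a Q4-class quant and the function names are assumptions for illustration:

```python
def quantized_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone (no KV cache or overhead)."""
    return params_billions * bits_per_weight / 8.0

# Qwen 2.5 72B at ~4.7 bits/weight (Q4-class quant, assumed figure):
size = quantized_size_gb(72, 4.7)   # roughly 42 GB of weights
# A tight fit against 40 GB of VRAM, hence the CPU-offload caveat.

# The "2x GPU VRAM" system-RAM rule from above:
recommended_ram_gb = 2 * 40         # 80 GB+
```

Anything over the 40GB budget spills to system RAM, which is why the recommendation doubles the VRAM figure.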
Performance Estimates
Estimated tokens/sec for LLM inference, derived from the card's 1,555 GB/s memory bandwidth; these are estimates, not hardware benchmarks.
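Bandwidth-derived estimates like these come from the fact that each generated token must stream the active weights from VRAM, so decode speed is capped at bandwidth divided by model size. A minimal sketch; the ~40 GB weight size for a 70B Q4 model and the 0.67–0.82 efficiency factors are assumptions chosen to illustrate the range, not measured values:

```python
def tok_per_sec(bandwidth_gb_s: float, model_gb: float, efficiency: float) -> float:
    """Upper-bound decode speed: bandwidth / model size, scaled by a
    real-world efficiency factor for kernel and scheduling overhead."""
    return bandwidth_gb_s * efficiency / model_gb

# Llama 3.1 70B at Q4 (~40 GB of weights, assumed) on 1,555 GB/s:
low  = tok_per_sec(1555, 40, 0.67)   # about 26 tok/s
high = tok_per_sec(1555, 40, 0.82)   # about 32 tok/s
```

Smaller models divide the same bandwidth by fewer gigabytes, which is why the 7B FP16 rows land several times faster than the 70B Q4 rows.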
Llama 3.1 70B | Q4 | ~26-32 tok/s | Usable
Llama 3.1 8B | FP16 | ~58-72 tok/s | Fast
Qwen 2.5 72B | Offload | ~1-3 tok/s | Very slow
Qwen 2.5 32B | Q8 | ~31-38 tok/s | Fast
Qwen 2.5 14B | FP16 | ~33-41 tok/s | Fast
Mistral 7B | FP16 | ~66-82 tok/s | Excellent
DeepSeek R1 70B | Q4 | ~26-32 tok/s | Usable
Codestral 22B | Q8 | ~45-55 tok/s | Fast
Qwen 2.5 Coder 32B | Q8 | ~31-38 tok/s | Fast
Pros
- 40GB HBM2e with massive bandwidth
- Runs 70B models at Q4
- Great used prices for the capability
- NVLink support
Cons
- No display output
- Needs a server chassis or PCIe adapter
- No gaming at all
- Older Ampere architecture