
How Much VRAM Do You Actually Need?

The honest, data-driven answer. No marketing fluff, no "it depends" cop-outs — real numbers for every workload.

The Short Answer

8GB — You can experiment with 7-8B parameter models (Mistral, Llama 8B) and run Stable Diffusion. Fine for learning, not enough for serious work.

16GB — The minimum for real AI work. Runs 14B models at Q8, handles SDXL and AlphaFold for medium proteins. The entry point for scientific computing.

24GB — The sweet spot. Runs 32B models at Q4, handles virtually every image gen model, and comfortably runs AlphaFold on long proteins. This is what most professionals should target.

32GB+ — Almost no compromises. 32B models at Q8, large video generation, ESMFold at full precision. 70B models at high quality still need 48GB+, but everything below that ceiling runs at top quality.

Why VRAM Is the Only Spec That Matters for AI

GPU marketing focuses on TFLOPS, clock speeds, and ray tracing cores. For AI workloads, none of that matters as much as VRAM. Here's why:

An AI model is essentially a giant matrix of numbers — its "weights." To run the model, every single weight must be loaded into VRAM. If the model doesn't fit, it either won't load at all, or it spills into system RAM which is 10-50x slower. A model running from system RAM generates 1-3 tokens per second. The same model fully in VRAM generates 30-80 tokens per second.

This is a binary cliff, not a gradient. Either the model fits in VRAM and runs fast, or it doesn't and the experience is unusable for interactive work. There is no "almost fits" — you're either above the line or below it.

For scientific computing, the same principle applies differently. AlphaFold's VRAM usage scales with protein sequence length — the attention matrices grow quadratically. scGPT's VRAM scales with dataset size. The model weights are small, but the working memory during computation is the bottleneck.

The practical implication: buy the most VRAM you can afford. A slower GPU with more VRAM will serve you better than a faster GPU with less. An RTX 3090 (24GB, older architecture) is more useful for AI than an RTX 4070 Ti (12GB, newer architecture) despite being a generation behind.

The Complete VRAM Compatibility Matrix

Every AI model we track, at every VRAM level. Each cell shows the best way the model runs at that tier: full precision (FP16), quantized (Q8/Q4), partially offloaded to system RAM (Offload), or not at all (No).

| Model | 8GB | 12GB | 16GB | 24GB | 32GB | 48GB | 80GB |
|---|---|---|---|---|---|---|---|
| Llama 3.1 70B | No | No | No | No | Offload | Q4 | Q8 |
| Llama 3.1 8B | Q8 | Q8 | FP16 | FP16 | FP16 | FP16 | FP16 |
| Qwen 2.5 72B | No | No | No | No | Offload | Q4 | Q8 |
| Qwen 2.5 32B | No | No | Offload | Q4 | Q8 | Q8 | FP16 |
| Qwen 2.5 14B | Offload | Q4 | Q8 | Q8 | FP16 | FP16 | FP16 |
| Mistral 7B | Q8 | Q8 | FP16 | FP16 | FP16 | FP16 | FP16 |
| DeepSeek R1 70B | No | No | No | No | Offload | Q4 | Q8 |
| FLUX.1 Dev (12B) | Offload | Q4 | Q8 | Q8 | FP16 | FP16 | FP16 |
| Stable Diffusion XL (6.6B) | Q8 | Q8 | FP16 | FP16 | FP16 | FP16 | FP16 |
| Stable Diffusion 3.5 Large (8B) | Q4 | Q8 | Q8 | FP16 | FP16 | FP16 | FP16 |
| HunyuanVideo (13B) | No | Offload | Q4 | Q8 | Q8 | FP16 | FP16 |
| CogVideoX-5B | Q4 | Q8 | Q8 | FP16 | FP16 | FP16 | FP16 |
| Mochi 1 (10B) | Offload | Q4 | Q8 | Q8 | FP16 | FP16 | FP16 |
| LTX Video (2B) | Q8 | FP16 | FP16 | FP16 | FP16 | FP16 | FP16 |
| Stable Video Diffusion (1.5B) | FP16 | FP16 | FP16 | FP16 | FP16 | FP16 | FP16 |
| Wan Video 14B | Offload | Q4 | Q4 | Q8 | FP16 | FP16 | FP16 |
| Codestral 22B | No | Offload | Q4 | Q8 | Q8 | FP16 | FP16 |
| Qwen 2.5 Coder 32B | No | No | Offload | Q4 | Q8 | Q8 | FP16 |
| LLaVA 1.6 34B | No | No | Offload | Q4 | Q4 | Q8 | FP16 |
| AlphaFold 2 (93M) | Q4 | Q8 | FP16 | FP16 | FP16 | FP16 | FP16 |
| ESMFold (ESM-2 15B) | Offload | Q4 | Q8 | Q8 | FP16 | FP16 | FP16 |
| ESM-2 3B | FP16 | FP16 | FP16 | FP16 | FP16 | FP16 | FP16 |
| scGPT (50M) | Q8 | FP16 | FP16 | FP16 | FP16 | FP16 | FP16 |
| RFdiffusion (200M) | Q4 | Q8 | FP16 | FP16 | FP16 | FP16 | FP16 |
| Fine-tune Llama 8B | Q4 | Q4 | Q8 | Q8 | Q8 | FP16 | FP16 |
| Fine-tune Llama 70B | No | No | No | No | Offload | Q4 | Q8 |
| Train SDXL LoRA (6.6B) | Q4 | Q8 | Q8 | FP16 | FP16 | FP16 | FP16 |
| Train FLUX LoRA (12B) | No | Offload | Q4 | Q8 | Q8 | FP16 | FP16 |

FP16 = full precision. Q8 = 8-bit quantized. Q4 = 4-bit quantized. Offload = partially in system RAM (very slow).

VRAM Tier Breakdown

8GB: The Experimenter's Tier

8GB gets you started but hits walls quickly. You can run 7-8B parameter LLMs like Mistral 7B and Llama 3.1 8B at Q4-Q8 quantization. These are capable for basic chat, simple coding help, and text processing, but they're noticeably less intelligent than larger models.

For image generation, 8GB handles Stable Diffusion XL at Q8 and the lightweight video models (LTX Video, Stable Video Diffusion). You won't run FLUX.1 or the larger video generators.

For scientific computing, 8GB is enough for ESM-2 3B (protein embeddings), short protein AlphaFold predictions, and small single-cell experiments with scGPT. It's a viable entry point for learning, but you'll outgrow it fast.

Who should buy 8GB: Students learning AI/ML, hobbyists experimenting, anyone on a strict budget who wants to get started rather than wait.

12GB: The Stepping Stone

12GB is an underrated sweet spot for budget builders. The RTX 3060 12GB is the cheapest NVIDIA card with enough VRAM for meaningful AI work, and it's widely available used for under $200.

You can run 14B models at Q4 (Qwen 2.5 14B is excellent at this size), 8B models at full FP16 precision, and most image generation models. AlphaFold runs medium-length proteins, and scGPT handles datasets up to ~30K cells.

The key limitation: 70B-class models (the frontier) are completely out of reach. You're capped at the "smart but not brilliant" tier of language models. For many use cases — coding assistance, quick queries, image generation, small-scale scientific computing — that's perfectly fine.

Who should buy 12GB: Budget-conscious AI experimenters, scientists running smaller analyses, anyone who wants more than 8GB without spending $500+.

16GB: The Minimum for Serious Work

16GB is where AI transitions from "toy" to "tool." You can run 14B models at Q8 (near-lossless quality), Qwen 2.5 32B with partial offloading, FLUX.1 Dev at Q8 for state-of-the-art image generation, and ESMFold at Q8 for fast protein structure predictions.

For scientific computing, 16GB is the practical minimum. AlphaFold handles most single-chain proteins at full precision. scGPT runs medium-scale experiments. RFdiffusion runs most protein design tasks. You're not limited to toy problems anymore.

The 70B models still don't fit — even at Q4, Llama 3.1 70B needs ~40GB. But the 14-32B models are remarkably capable in 2026. Qwen 2.5 14B at Q8 handles most coding, writing, and analysis tasks that people used to need 70B models for just a year ago.

Who should buy 16GB: Scientists running AlphaFold and single-cell analysis, developers wanting local code completion, AI enthusiasts who want real capability without the premium price.

24GB: The Sweet Spot (Our Recommendation)

24GB is the magic number in 2026. Qwen 2.5 32B runs at Q4 with room for context, putting one of the best open models on your desk. Every image generation model runs comfortably, and most video generation models fit. The 70B models still need ~40GB even at Q4, so they remain a tier away, but the 32B class now covers most of what 70B used to be needed for.

For scientific computing, 24GB is professional-grade. AlphaFold handles long sequences and multimer predictions without VRAM anxiety. ESMFold runs at Q8 for most proteins. scGPT handles atlas-scale datasets. RFdiffusion runs complex multi-chain protein design. You can do real research, not just demos.

The RTX 3090 (24GB) and RTX 4090 (24GB) are the two most popular GPUs in the AI community for good reason. The 3090 is available used for ~$900 — making it the best VRAM-per-dollar consumer card available. The 4090 is faster but costs over twice as much.

Who should buy 24GB: Most people reading this guide. Researchers running production workloads, developers who want the best local AI models, scientists who need AlphaFold and ESMFold at full capability.

32GB: The No-Compromise Tier

32GB removes nearly every consumer-level constraint. 32B models run at Q8, the quality sweet spot, and 70B models become workable with partial offloading. FLUX.1 Dev runs at full FP16 precision. HunyuanVideo and the larger video generators become practical. ESMFold runs at full precision for typical proteins.

The RTX 5090 is currently the only consumer card at 32GB, and its 1,792 GB/s memory bandwidth means the models that fit don't just load — they run fast. If you can stomach the price premium over a 24GB card, this is the "set it and forget it" tier.

Who should buy 32GB: Power users who want the biggest models at high quality, video generation enthusiasts, researchers who need ESMFold at full precision for large proteins, anyone who doesn't want to think about VRAM limits.

48-80GB: Workstation & Data Center

At 48GB+ you're in workstation and data center territory. The RTX 6000 Ada (48GB), A100 (40/80GB), and H100 (80GB) serve organizations that need to run 70B+ models at high precision, train models, or process massive scientific datasets.

Llama 70B at Q8 (70GB) fits on a single A100 80GB or H100. At FP16 (140GB), you need multi-GPU setups. For scientific computing, 80GB lets you run AlphaFold on the longest protein complexes, train custom ESM models, and process single-cell datasets with millions of cells.

The used market for data center cards is interesting: the Tesla P40 (24GB) goes for ~$300, and A100 40GB cards are becoming accessible at ~$4,500. These lack display output and need server cooling, but for pure compute, the VRAM per dollar is unbeatable.

Who should buy 48GB+: Research labs, companies deploying AI at scale, scientists working with very large datasets, anyone training (not just running) models.

VRAM Requirements by Workload

Local LLMs (ChatGPT-style Models)

Running large language models locally is the most VRAM-hungry consumer AI workload. The rule of thumb: parameters x 2 = GB at FP16, parameters x 1 = GB at Q8, parameters x 0.6 = GB at Q4. A 70B model needs ~140GB at full precision or ~40GB at Q4.

| Model | FP16 | Q8 | Q4 |
|---|---|---|---|
| Llama 3.1 70B | 140GB | 70GB | 40GB |
| Llama 3.1 8B | 16GB | 8GB | 5GB |
| Qwen 2.5 72B | 144GB | 72GB | 42GB |
| Qwen 2.5 32B | 64GB | 32GB | 20GB |
| Qwen 2.5 14B | 28GB | 14GB | 9GB |
| Mistral 7B | 14GB | 7GB | 4.5GB |
| DeepSeek R1 70B | 140GB | 70GB | 40GB |
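The rule of thumb translates into a small helper. This is a rough sketch: real loads add KV cache and runtime overhead on top of the weights, which is why the table's rounded figures differ slightly.

```python
def weights_gb(params_b: float, precision: str) -> float:
    """VRAM needed for the model weights alone, per the rule of thumb.

    Bytes per parameter: FP16 = 2, Q8 = 1, Q4 ~= 0.6 (a bit more than
    0.5 because quantized formats also store per-group scale factors).
    """
    bytes_per_param = {"fp16": 2.0, "q8": 1.0, "q4": 0.6}[precision.lower()]
    return params_b * bytes_per_param

print(weights_gb(70, "fp16"))  # 140.0
print(weights_gb(70, "q4"))    # 42.0 (the table rounds this to ~40GB)
print(weights_gb(8, "q4"))     # 4.8
```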

The practical reality: In 2026, the 14-32B models have caught up to where 70B models were a year ago. Qwen 2.5 14B at Q8 (14GB VRAM) handles most everyday tasks — coding, writing, analysis — that used to require 70B. The 70B models are still better for complex reasoning and nuanced work, but the gap is narrowing fast. Don't overspec if 16GB solves your actual use case.

Image Generation (Stable Diffusion, FLUX)

Image generation models have more forgiving VRAM requirements than LLMs. SDXL runs on 8GB at Q8. FLUX.1 Dev — the current state-of-the-art — needs 16GB for comfortable operation. The VRAM bottleneck for image gen is usually resolution and batch size, not the model itself.

| Model | FP16 | Q8 | Q4 |
|---|---|---|---|
| FLUX.1 Dev | 32GB | 16GB | 10GB |
| Stable Diffusion XL | 16GB | 8GB | 6GB |
| Stable Diffusion 3.5 Large | 18GB | 10GB | 7GB |

Key insight: If you only want image generation, 12-16GB is plenty. If you want image gen and local LLMs, you need enough VRAM for the LLM — the image gen model will fit by default. This is why we recommend buying for your most demanding workload.

Video Generation

Video generation is the most VRAM-hungry creative workload. The best models (HunyuanVideo, Wan Video 14B) need 16-24GB for usable quality. Lightweight options exist (LTX Video runs on 4GB) but the quality difference is stark.

| Model | FP16 | Q8 | Q4 |
|---|---|---|---|
| HunyuanVideo | 40GB | 22GB | 14GB |
| CogVideoX-5B | 18GB | 10GB | 7GB |
| Mochi 1 | 30GB | 16GB | 10GB |
| LTX Video | 10GB | 6GB | 4GB |
| Stable Video Diffusion | 8GB | 5GB | 3.5GB |
| Wan Video 14B | 32GB | 18GB | 11GB |

Scientific Computing (AlphaFold, Single-Cell, Protein Design)

Scientific computing VRAM requirements are different from AI models. The model weights are often small, but working memory during computation scales with input size. AlphaFold's attention grows quadratically with sequence length. scGPT's memory scales with cell count. The numbers below represent typical workloads — your specific needs depend on your data.

| Tool | Large | Medium | Small |
|---|---|---|---|
| AlphaFold 2 | 16GB | 12GB | 8GB |
| ESM-2 3B | 6GB | 3GB | 2GB |
| scGPT | 12GB | 8GB | 4GB |
| RFdiffusion | 16GB | 10GB | 8GB |

AlphaFold 2: The model weights are only ~200MB. VRAM usage is dominated by the MSA (Multiple Sequence Alignment) attention matrices. Short proteins (<500 residues) fit on 8GB. Long proteins (>1000 residues) or multimer predictions need 16GB+. ColabFold with MMseqs2 reduces memory by avoiding the full BFD database.
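The quadratic growth is easy to quantify. A minimal sketch, assuming FP32 values and a 128-channel pair representation (AlphaFold's pair channel width); the full pipeline holds many such tensors plus attention buffers, so peak VRAM is a large multiple of one tensor:

```python
def pair_tensor_gb(seq_len: int, channels: int = 128, bytes_per_val: int = 4) -> float:
    """Memory for one L x L x C pair-representation tensor, in GB.

    The L x L shape is what makes memory quadratic in sequence
    length: double the protein, quadruple the tensor.
    """
    return seq_len ** 2 * channels * bytes_per_val / 1e9

print(pair_tensor_gb(500))   # 0.128 GB per tensor
print(pair_tensor_gb(1000))  # 0.512 GB: 4x the memory for 2x the length
```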

ESMFold: Unlike AlphaFold, ESMFold uses a single-sequence approach — no MSA needed. The trade-off is that the model itself is 15B parameters (30GB at FP16). But predictions complete in seconds, not minutes. If you're screening hundreds of proteins, ESMFold on a 24GB+ card is dramatically faster than AlphaFold.

scGPT: The model is small (50M parameters), but VRAM scales with your dataset. 10,000 cells at training time needs ~4-6GB. 100,000 cells needs ~12GB. If you're working with atlas-scale datasets (500K+ cells), you need 24GB+ or gradient checkpointing.

Code Completion & Generation

Local code models are increasingly competitive with cloud services. Qwen 2.5 Coder 32B is the current open-source leader, but it needs 20-32GB VRAM. Codestral 22B fits on 16GB at Q4. For IDE integration, speed matters — you want the model fully in VRAM for sub-second completions.

| Model | FP16 | Q8 | Q4 |
|---|---|---|---|
| Codestral 22B | 44GB | 22GB | 13GB |
| Qwen 2.5 Coder 32B | 64GB | 32GB | 20GB |

Training & Fine-tuning (The VRAM Multiplier)

Everything above covers inference — running pre-trained models. Training needs 2-4x more VRAM because the GPU must store not just the model weights, but also optimizer states (Adam uses 2x the model size), gradients, and activations for backpropagation. A model that runs on 16GB might need 40GB to train.
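The multiplier can be sketched with simple arithmetic. This is an upper-bound estimate that assumes weights, gradients, and both Adam moment buffers are stored at the same precision, and it ignores activations as well as memory-saving tricks (gradient checkpointing, 8-bit optimizers) that pull real numbers below it:

```python
def full_finetune_gb(params_b: float, bytes_per_param: float = 2.0) -> float:
    """Upper-bound training footprint, before activations.

    Weights + gradients + two Adam moment buffers = 4x the weights
    (the optimizer state alone is 2x the model size, as noted above).
    """
    weights = params_b * bytes_per_param
    gradients = params_b * bytes_per_param
    adam_states = 2 * params_b * bytes_per_param
    return weights + gradients + adam_states

print(full_finetune_gb(8))  # 64.0 GB before activations, vs 16GB to just run at FP16
```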

The breakthrough: LoRA and QLoRA make training accessible on consumer hardware. Instead of updating all parameters (full fine-tune), LoRA trains small adapter matrices — typically 1-5% of total parameters. QLoRA goes further by loading the base model at 4-bit precision. This slashes VRAM from "needs a data center" to "fits on your desktop GPU."
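A quick way to see why LoRA is so cheap is to count the adapter parameters. The config below is a hypothetical 8B-class model; the hidden size, layer count, and which matrices get adapters all vary by model and training setup:

```python
def lora_trainable_fraction(d_model: int, n_layers: int, rank: int,
                            matrices_per_layer: int, total_params: float) -> float:
    """Fraction of parameters a LoRA adapter actually trains.

    Each adapted d x d projection gains two thin matrices,
    A (d x r) and B (r x d), i.e. 2 * d * r extra parameters.
    """
    adapter_params = n_layers * matrices_per_layer * 2 * d_model * rank
    return adapter_params / total_params

# Hypothetical 8B-class config: hidden size 4096, 32 layers, rank 16,
# adapting the four attention projections (q, k, v, o).
frac = lora_trainable_fraction(4096, 32, 16, 4, 8e9)
print(f"{frac:.2%} of parameters trained")  # about 0.21%
```

Higher ranks and adapting the MLP layers as well push this toward the 1-5% range cited above, but the gradients and optimizer states are only kept for that sliver, which is where the VRAM savings come from.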

| Workload | Full FT | LoRA | QLoRA |
|---|---|---|---|
| Fine-tune Llama 8B | 40GB | 16GB | 8GB |
| Fine-tune Llama 70B | 300GB | 80GB | 40GB |
| Train SDXL LoRA | 24GB | 12GB | 8GB |
| Train FLUX LoRA | 40GB | 24GB | 16GB |

For LLM fine-tuning: QLoRA with Unsloth is the sweet spot. Fine-tune Llama 8B on 8GB, or Llama 70B on 40GB. Quality is within 1-2% of full fine-tuning for most tasks. Training takes hours, not days.

For image model training: LoRA is the standard — train custom styles, characters, and concepts in 15-60 minutes. SDXL LoRAs need 8-12GB. FLUX LoRAs need 16-24GB but produce dramatically better results.

The rule: If you plan to both run and train models, buy for the training requirement — it's always higher. A 24GB card that trains comfortably also runs everything.

VRAM vs System RAM — Can You Just Use More RAM?

Short answer: technically yes, practically no.

Tools like llama.cpp support "partial offloading" — keeping some model layers in system RAM and some in VRAM. This lets you run models that don't fully fit. The problem is speed: system RAM (DDR5-5600) has about 45 GB/s bandwidth. VRAM (GDDR6X on an RTX 4090) has 1,008 GB/s. That's a 20x difference.

In practice, a model that's 50% offloaded to system RAM runs roughly 3-5x slower than one fully in VRAM. A model that's 80% offloaded is essentially unusable for interactive work — you'll get 1-3 tokens per second instead of 30-80.
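The slowdown follows from bandwidth arithmetic: single-stream generation must read every weight once per token, so bandwidth divided by model size gives a hard ceiling on tokens per second (a simplification that ignores compute and KV-cache reads):

```python
def decode_tps_ceiling(model_gb: float, bandwidth_gbps: float) -> float:
    """Upper bound on tokens/sec for single-stream decoding.

    Every generated token streams all weights through the chip once,
    so throughput cannot exceed memory bandwidth / model size.
    """
    return bandwidth_gbps / model_gb

# 40GB model (a 70B at Q4): RTX 4090 VRAM vs. DDR5 system RAM.
print(round(decode_tps_ceiling(40, 1008), 1))  # ceiling ~25 tok/s in VRAM
print(round(decode_tps_ceiling(40, 45), 1))    # ceiling ~1.1 tok/s from system RAM
```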

The rule: system RAM is for overflow, not a substitute. Budget for at least 2x your GPU's VRAM in system RAM (e.g., 48-64GB DDR5 for a 24GB GPU). This gives the operating system, model loading, and other processes enough headroom without stealing from the GPU.

For scientific computing, system RAM matters more. AlphaFold's MSA database processing happens in CPU RAM before the GPU computation. Single-cell datasets are loaded into CPU memory first, then batched to the GPU. Budget 64-128GB for serious scientific workloads.

Common Mistakes When Buying for VRAM

Buying a fast GPU with less VRAM over a slower one with more

An RTX 4070 Ti Super (16GB) is faster than an RTX 3090 (24GB) at everything — except running models that need more than 16GB. If your workload needs 24GB, the 3090 is infinitely faster because it can actually run the model. Speed doesn't matter if the model doesn't load.

Assuming you can upgrade VRAM later

VRAM is soldered to the GPU. You cannot add more. The only upgrade path is buying a new card. Buy for your target workload, not your current one — the frontier models next year will be bigger than today's.

Buying AMD for AI because the specs look good

AMD GPUs have competitive specs on paper, but CUDA — NVIDIA's compute platform — is required by most AI frameworks. ROCm (AMD's alternative) is improving but still unreliable on consumer cards. Only buy AMD if gaming is your primary use case. For scientific computing, this is even more critical — AlphaFold, ESM, and scGPT all assume CUDA.

Thinking 8GB is "enough to start"

8GB was enough in 2023. In 2026, even entry-level models benefit from 12-16GB. The RTX 3060 12GB is available used for under $200 — that extra 4GB is worth more than the $50-100 savings of an 8GB card.

Focusing on MSRP instead of street price

GPU street prices can be 20-50% above MSRP due to demand. Our pricing uses real street prices, not manufacturer suggestions. A card with great $/GB at MSRP might be mediocre at street price.

How Much VRAM Will You Need in 2027?

Models are getting bigger, but they're also getting more efficient. Two competing trends:

Models grow: The frontier moves from 70B to 100B+ parameters. Video generation models are getting larger and longer. Scientific models are being trained on more data with larger context windows.

Efficiency improves: Better quantization methods (GGUF Q4_K_M is dramatically better than naive Q4 from two years ago). Mixture-of-experts architectures only activate a fraction of total parameters. Speculative decoding and other tricks reduce effective memory needs.

The practical answer: Buy one tier above what you need today. If 16GB solves your current workloads, buy 24GB. If 24GB works, 32GB gives you a year of headroom. The extra cost now is cheaper than replacing the entire GPU next year.

Quick Decision Framework

Budget: <$300 → Used RTX 3060 12GB. <$600 → Used RTX 3090 24GB if you can find one. <$1000 → RTX 4060 Ti 16GB or RTX 3090 24GB.
LLMs: 7-8B models → 8GB. 14B models → 16GB. 32B models → 24GB. 70B models → 48GB (Q4) or 80GB (Q8).
Image Gen: SDXL → 8GB. FLUX.1 → 16GB. Comfortable workflow → 16-24GB.
Video Gen: LTX Video → 8GB. HunyuanVideo/Wan → 16-24GB. High quality → 24GB+.
Science: ESM-2 3B → 8GB. AlphaFold → 16GB. ESMFold/scGPT → 16-24GB. Production → 24GB+.
Coding: Codestral 22B → 16GB (Q4). Qwen Coder 32B → 24GB (Q4). Best quality → 32GB.