If you can do it in GPU memory, do it in GPU memory.

If it takes quantization + buying a 1 TB RAM server ($4k of RAM + parts), do that in memory with the raw tensors (see the sketch at the end) and shed a small tear -- partly for the cost, partly out of joy for all the pain you are saving yourself, your team, and everyone around you.

If you need more, then tread lightly and extremely carefully. Very few mega LLM pretraining datasets are even on this order of magnitude, though some reach a few terabytes IIRC. If you are exceeding this, your business use case is likely specialized indeed.

This message brought to you by the "cost reduction by not adding dumb complexity" group. I try to maintain a reputation for aggressively fighting unnecessary complexity, which IMO is the true cost measure of any system.
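To make "raw tensors in memory" concrete, here is a minimal Python sketch; the file name, shape, and int8 quantization are placeholder assumptions, not a prescription:

    import numpy as np

    # Hypothetical file/shape -- substitute your own quantized dump.
    PATH = "embeddings_int8.bin"
    N_VECTORS, DIM = 1_000_000_000, 1024  # ~1 TB at int8

    # Read-only memory map: the OS pages the file into RAM on demand,
    # and on a 1 TB box the whole thing just stays resident.
    embeddings = np.memmap(PATH, dtype=np.int8, mode="r",
                           shape=(N_VECTORS, DIM))

    # Lookups are plain array indexing -- no database, no RPC, no cache tier.
    batch = embeddings[[42, 7, 123_456]]

The entire "serving stack" is an array index, which is the whole point.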