What's interesting is how much emphasis there is on high-end video cards, which are prohibitively expensive for most people, when many of the newer models, once quantized, run perfectly well on CPUs. Instead of chasing speed with money, seeing what runs decently on the hardware people already have could end up having a much bigger impact on far more people.<p>As an experiment, I've been running llama.cpp on an old 2012 AMD Bulldozer system with 64 GB of memory (Bulldozer being what most people consider AMD's equivalent of Intel's Pentium 4), and with newer models it's surprisingly usable, if not entirely practical. In my opinion, it's much more usable than spending energy trying to squeeze everything into the smaller amounts of VRAM on more modest GPUs.<p>It certainly shows that people shouldn't be dissuaded from playing around just because they have an older GPU and/or a GPU without much VRAM.