As someone who is new to running language models, I am struggling to understand the infrastructure needed to run them effectively, and I would greatly appreciate any advice. Could you please help me with the following questions?

1. What hardware specifications would you recommend for running language models?

2. What building options are available for language models, and which is the easiest to set up?

3. Is it better to rent or buy hardware for running language models?

4. What are some cost-saving strategies that have worked for you when running language models?
The best of the best right now is probably Vicuna 13B. The 30B and 65B LLaMA models score better on benchmarks, but there isn't a compelling instruct fine-tuned version of those yet, so they require a lot of prompt engineering.

If you want to run Vicuna without quantization you need about 25GB of VRAM, which exceeds pretty much all consumer GPUs. Vicuna in 4-bit GPTQ is decent, though I personally notice a quality difference when comparing it to 16-bit.

CPU is also an option: you can run pretty much any model that fits in your RAM, although performance will obviously suffer. llama.cpp has gotten very popular for this.
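For concreteness, here is a minimal sketch of CPU inference using llama-cpp-python (the Python bindings for llama.cpp). The model path and prompt template are placeholders, not a specific recommended checkpoint; substitute whatever 4-bit quantized Vicuna file you actually have:

```python
# Minimal CPU-inference sketch with llama-cpp-python.
# Rough memory math behind the numbers above: 13B params * 2 bytes (fp16)
# ~= 26 GB, hence the ~25GB VRAM figure for unquantized Vicuna 13B.
# At 4 bits, 13B * 0.5 bytes ~= 6.5 GB plus overhead, which fits in
# ordinary system RAM.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/vicuna-13b-q4_0.bin",  # hypothetical local path
    n_ctx=2048,    # context window
    n_threads=8,   # tune to your CPU core count
)

output = llm(
    "### Human: Explain quantization in one sentence.\n### Assistant:",
    max_tokens=128,
)
print(output["choices"][0]["text"])
```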
On a slight tangent, I feel like this is one major ace up Apple’s sleeve, if they can zoom into it. With the awesome performance of Apple Silicon and how lots of the big leaps are seen with video rendering, they just need to focus some more on the ML capabilities. They’ve been skirting around ML with some optimizing on popular ML libraries, but it’s mostly focused on inference, but hopefully with recent pytorch 2.0 optimizations and co, they can meet open source libraries halfway, and do more.<p>I think the current proliferation of AI and general awareness of LLM can be a major selling point if they make sure their neural engine is well optimized for it. Will put them right at the center of the conversation, especially since one of the current concern is the cost of training these models.
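For what it's worth, PyTorch already exposes Apple's GPU through the MPS backend (PyTorch >= 1.12). A minimal sketch; the Linear layer is just a stand-in for whatever model you'd actually run:

```python
# Sketch of running inference on Apple Silicon via PyTorch's
# MPS (Metal Performance Shaders) backend, falling back to CPU.
import torch

device = (
    torch.device("mps")
    if torch.backends.mps.is_available()
    else torch.device("cpu")
)

# Stand-in model: any nn.Module moves to the device the same way.
model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(1, 4096, device=device)

with torch.no_grad():
    y = model(x)
print(y.shape, y.device)
```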
<a href="https://stateofart.ai/" rel="nofollow">https://stateofart.ai/</a><p>Disclosure: I am the author of the website, and it's extremely light on content currently.
1. The best hardware you can afford.

2. Take a look at huggingface.co.

3. Rent for short periods, buy if you need it for a long time. You can do the maths (rough break-even sketch below).

4. Smaller models, quantization, and running on CPU when the speed and increased energy usage aren't a problem.
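On point 3, the break-even arithmetic is simple. All prices here are made-up placeholders; substitute real quotes for your hardware and cloud provider:

```python
# Hypothetical rent-vs-buy break-even sketch; every figure is a placeholder.
BUY_PRICE = 1600.0           # e.g. a used 24GB GPU, USD (placeholder)
RENT_PER_HOUR = 0.60         # cloud GPU hourly rate, USD (placeholder)
POWER_COST_PER_HOUR = 0.05   # electricity for the local box (placeholder)

# Renting wins until cumulative rental cost exceeds the purchase price
# plus what you'd spend powering your own machine for the same hours.
break_even_hours = BUY_PRICE / (RENT_PER_HOUR - POWER_COST_PER_HOUR)
print(f"Break-even after ~{break_even_hours:,.0f} GPU-hours")
# With these placeholder numbers: ~2,909 hours, i.e. roughly four months
# of 24/7 use. Occasional experimentation favours renting; sustained
# daily use favours buying.
```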