The research team at Predibase analyzed 700+ fine-tuning experiments spanning 13 of the most popular open-source models, GPT-3.5/4/4o, and 31 distinct datasets and tasks. We limited the open-source models to a maximum of 7B parameters so that any organization can train them on low-end GPUs. We evaluated with task-specific metrics, including accuracy, ROUGE, and HumanEval.

Key Takeaways

- LoRA fine-tuned models outperform GPT-4 on specialized tasks
- LoRA fine-tuned models are fast and cheap to train and serve, averaging less than $8 each (a minimal training setup is sketched below)
- Specialized tasks are ideal for LoRA fine-tuning
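To make the training recipe concrete, here is a minimal sketch of LoRA fine-tuning a sub-7B open-source model with Hugging Face PEFT and transformers. The model name, dataset, and hyperparameters are illustrative assumptions, not the exact configuration used in these experiments.

```python
# Minimal LoRA fine-tuning sketch (illustrative; not the exact Predibase setup).
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_name = "mistralai/Mistral-7B-v0.1"  # assumed example; any <=7B base model works
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# LoRA: freeze the base weights and train small low-rank adapter matrices instead.
lora_config = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()         # typically <1% of the base model's parameters

# Illustrative instruction dataset; swap in the task-specific data.
dataset = load_dataset("tatsu-lab/alpaca", split="train[:1000]")

def tokenize(example):
    text = example["instruction"] + "\n" + example["output"]
    return tokenizer(text, truncation=True, max_length=512)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="lora-out",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("lora-adapter")      # only the small adapter weights are saved
```

Because only the low-rank adapter weights are trained and saved, a run like this fits on a single commodity GPU and produces an adapter that is a small fraction of the base model's size, which is what keeps per-model training cost low.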