TL;DR: I compared fine-tuning two small LLMs, Llama 3.1-8B-Instruct and GPT-3.5-Turbo, on a fun dataset of GPT-4o-generated conversations in the style of Larry David. All the code for this project and the evaluation results are openly accessible.

My key takeaway is that fine-tuning small LLMs is easily within reach and practically free for independent developers. This approach can offer major cost, latency, and privacy advantages for anyone interested in small models for specialized use cases.