> The training code, dataset and weights for this model are open sourced so that developers can reproduce the model and help train other SLMs and LLMs.<p>Wow, an actual open source language model (first of its kind [from a larger company] maybe even?), includes all you need to be able to recreate it from scratch. Thanks AMD!<p>Available under this funky GitHub organization it seems: <a href="https://github.com/AMD-AIG-AIMA/AMD-LLM">https://github.com/AMD-AIG-AIMA/AMD-LLM</a>
Now this is the beginning of real innovation in AI. With AMD coming in (albeit late and slowly) and Meta improving Llama, we will see real adoption and development over the next few thousand days. At this moment, I see OpenAI as the Yahoo of the pre-Google era.
Looks like they are using sixteen $13k GPUs [1] (around $210k hardware) for 6 days of training.<p>Anyone know the recommended cloud provider and equivalent rental price?<p>[1] <a href="https://www.wiredzone.com/shop/product/10025451-supermicro-gpu-amdmi250-oam-0029h-graphics-processing-unit-gpu-instinct-mi250-128gb-hbm2e-amd-100-300000029h-10725?srsltid=AfmBOoqud_fxtvuGVSjxigEx4DSMbozywAE5-dI9GfBsBWNDzLE9-wN2" rel="nofollow">https://www.wiredzone.com/shop/product/10025451-supermicro-g...</a>
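The back-of-envelope math works out to a fairly modest number of GPU-hours. A quick sketch, where the per-GPU-hour rental rate is a hypothetical placeholder (actual MI250 cloud pricing varies by provider and isn't quoted above):

```python
# Rough cost estimate for the training run described above.
num_gpus = 16                  # sixteen MI250 accelerators, per the comment
hardware_cost = num_gpus * 13_000   # ~$208k up front
days = 6

gpu_hours = num_gpus * days * 24
print(gpu_hours)               # 2304 GPU-hours total

hypothetical_rate = 2.50       # $/GPU-hour -- assumption, not a quoted price
rental_estimate = gpu_hours * hypothetical_rate
print(f"${rental_estimate:,.0f}")
```

At any plausible hourly rate, renting for one run comes in far below the ~$210k purchase price; the hardware only pays for itself over many runs.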
The section on speculative decoding is interesting.
"This approach allows each forward pass to generate multiple tokens without compromising performance, thereby significantly reducing memory access consumption, and enabling several orders of magnitude speed improvements."<p>Does anyone know if the "several orders of magnitude speed improvement" is accurate? I'm doubtful.<p>Very interesting though! I'll be playing around with this on the weekend!
Since most people can’t run these LLMs locally, I wonder what a system would look like with hyper-tuned models for specific purposes, i.e. a model for code, a model for prose, etc. You have a director model that interprets which downstream model should be used and then runs it. That way you can run the models locally without needing beefy GPUs. It’s a trade-off: more disk space in exchange for less VRAM.
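The director idea can be sketched in a few lines. Here the keyword router and the echo "models" are placeholders for a real classifier and real quantized local models; only the routing pattern is the point:

```python
# Minimal sketch of a "director" routing prompts to specialist models,
# so only one small model needs to be resident in VRAM at a time.

def code_model(prompt):
    return f"[code model] {prompt}"       # placeholder for a code-tuned LLM

def prose_model(prompt):
    return f"[prose model] {prompt}"      # placeholder for a prose-tuned LLM

SPECIALISTS = {"code": code_model, "prose": prose_model}

def director(prompt):
    """Pick a downstream specialist. A real director could itself be a
    tiny classifier model kept permanently in memory."""
    code_hints = ("def ", "class ", "bug", "compile", "function")
    route = "code" if any(h in prompt.lower() for h in code_hints) else "prose"
    return SPECIALISTS[route](prompt)

print(director("Fix the bug in this function"))
print(director("Write a short story about autumn"))
```

The design cost is exactly the trade-off described above: each specialist lives on disk, but only the director plus one specialist occupies memory at any moment, and a misroute by the director caps overall quality.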
It's always encouraging to see wider hardware platform competition for AI inference and training. Access to affordable and capable hardware for consumers will only benefit (I imagine) from increasing competition.
> [1] The training code for AMD-135M is based on TinyLlama, utilizing multi-node distributed training with PyTorch FSDP.<p>I thought PyTorch didn't work well on AMD hardware, and have read that many people use JAX instead?
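PyTorch has shipped ROCm builds for a while now, and on those builds the familiar `torch.cuda.*` API maps to AMD GPUs, so FSDP code runs unchanged. A hedged sketch of what a multi-node launch might look like (node counts, hostnames, the script name, and its flags are all hypothetical placeholders, not AMD's actual invocation):

```shell
# Hypothetical multi-node FSDP launch via torchrun on a ROCm cluster.
# head-node, train.py, and the script flags are illustrative only.
torchrun \
  --nnodes=4 --nproc-per-node=4 \
  --rdzv-backend=c10d --rdzv-endpoint=head-node:29500 \
  train.py
```

The main friction people report on AMD is with custom CUDA kernels and some fused-attention libraries, not with core PyTorch or FSDP itself.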