Everyone should go through this <i>rite of passage</i> work and get to the "Attention Is All You Need" implementation. It's a world where engineering and the academic papers are very close and reproducible, and doing this work is a must if you want to progress in the field.<p>(See also Andrej Karpathy's "Neural Networks: Zero to Hero" series on YouTube; it's very good and similar to this work.)
I did a similar thing for Julia:
Llama2.jl contains vanilla Julia code [1] for training small Llama2-style models on the CPU.<p>[1] <a href="https://github.com/cafaxo/Llama2.jl/tree/master/src/training">https://github.com/cafaxo/Llama2.jl/tree/master/src/training</a>
Very well written. AD is like magic and this is a good exposition of the basic building block.<p>I quite like Jeremy's approach: <a href="https://nbviewer.org/github/fastai/fastbook/blob/master/17_foundations.ipynb" rel="nofollow">https://nbviewer.org/github/fastai/fastbook/blob/master/17_f...</a><p>It shows a very simple "Pythonic" approach to assembling the gradient of a composition of functions from the gradients of its components.
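For anyone who wants to see the chain-rule idea in a few lines, here is a minimal sketch for scalar functions. It is my own illustration, not code from the notebook: `compose`, `square`, and `sine` are hypothetical names, and each component is assumed to be a plain `(function, derivative)` pair.

```python
import math

def compose(*parts):
    """Chain (f, df) pairs, applied left to right, into one (f, df) pair."""
    def f(x):
        for g, _ in parts:
            x = g(x)
        return x

    def df(x):
        grad = 1.0
        for g, dg in parts:
            grad *= dg(x)  # local derivative, evaluated at this component's input
            x = g(x)       # the output becomes the next component's input
        return grad

    return f, df

# Two components, each carrying its own derivative (illustrative choices).
square = (lambda x: x * x, lambda x: 2.0 * x)
sine = (math.sin, math.cos)

# f(x) = sin(x**2), so the chain rule gives df(x) = 2*x * cos(x**2).
f, df = compose(square, sine)
```

The gradient of the whole is just the product of the local gradients, each evaluated at the value flowing into its component. Real autodiff libraries do the same bookkeeping for tensors, usually in reverse order.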
As a chronic premature optimizer, my first reaction was, "Is this even possible in vanilla Python???" Obviously it's <i>possible</i>, but can you train an LLM before the heat death of the universe? A perceptron, sure, of course. A deep learning model, plausible if it's not too deep. But a <i>large</i> language model? I.e. the kind of LLM necessary for "from vanilla python to functional coding assistant."<p>But obviously the author already thought of that. The source repo has a great motto: "It don't go fast but it do be goin'" [1]<p>I love the idea of the project and I'm curious to see what the endgame runtime will be.<p>[1] <a href="https://github.com/bclarkson-code/Tricycle">https://github.com/bclarkson-code/Tricycle</a>
Is there an existing SLM that resembles an LLM in architecture and includes the code for training it?<p>I realize the cost and time to train may be prohibitive, and that quality on general English might be very limited, but is the code itself available?
The only problem is that it's implemented in Python. For one thing, I hate installing Python on my machine, and I don't know how to manage dependencies. I had to upgrade macOS just to install the native stuff. Such a hell.