TechEcho

7 comments

itissid, over 1 year ago

Everyone should go through this rite-of-passage work and get to the "Attention Is All You Need" implementation. It's a world where engineering and the academic papers are very close and reproducible, and it's a must if you want to progress in the field. (See also Andrej Karpathy's "Zero to Hero" NN series on YouTube; it's very good and similar to this work.)

cafaxo, over 1 year ago

I did a similar thing for Julia: Llama2.jl contains vanilla Julia code [1] for training small Llama2-style models on the CPU.

[1] https://github.com/cafaxo/Llama2.jl/tree/master/src/training

andxor_, over 1 year ago

Very well written. AD is like magic, and this is a good exposition of the basic building block.

I quite like Jeremy's approach: https://nbviewer.org/github/fastai/fastbook/blob/master/17_foundations.ipynb

It shows a very simple "Pythonic" approach to assembling the gradient of a composition of functions from the gradients of its components.
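As a rough illustration of that idea (a minimal sketch, not code from the fastbook notebook or the article), each component below returns its output together with a backward closure that turns an upstream gradient into a downstream one. Running the closures in reverse order applies the chain rule to the whole composition. The helpers `square`, `sine` and `forward_backward` are hypothetical names made up for this example, and it assumes scalar inputs only.

```python
import math


# Each component returns (output, backward), where backward maps
# d(loss)/d(output) to d(loss)/d(input) using the component's local derivative.
def square(x):
    return x * x, lambda g: g * 2 * x


def sine(x):
    return math.sin(x), lambda g: g * math.cos(x)


def forward_backward(x, *fns):
    """Evaluate fns in order on x; return (value, derivative of value w.r.t. x)."""
    backs = []
    for f in fns:
        x, back = f(x)        # forward pass, remembering each local backward
        backs.append(back)
    g = 1.0                   # d(output)/d(output)
    for back in reversed(backs):
        g = back(g)           # chain rule: multiply by each local derivative
    return x, g
```

For example, `forward_backward(1.5, square, sine)` evaluates sin(1.5²) and returns its derivative cos(2.25) · 2 · 1.5, assembled purely from the derivatives of `square` and `sine`.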

asgraham, over 1 year ago

As a chronic premature optimizer, my first reaction was, "Is this even possible in vanilla Python???" Obviously it's possible, but can you train an LLM before the heat death of the universe? A perceptron, sure, of course. A deep learning model, plausible if it's not too deep. But a large language model? I.e., the kind of LLM necessary for "from vanilla python to functional coding assistant."

But obviously the author already thought of that. The source repo has a great motto: "It don't go fast but it do be goin'" [1]

I love the idea of the project, and I'm curious to see what the endgame runtime will be.

[1] https://github.com/bclarkson-code/Tricycle
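For a sense of what "vanilla Python" autodiff looks like at toy scale, here is a minimal scalar reverse-mode sketch in the spirit of micrograd from the Zero to Hero series. It is not the Tricycle implementation; the `Value` class, the `train` function, and the toy data y = 2x + 1 are all hypothetical. Every arithmetic operation allocates a Python object per scalar, which hints at why this style is comfortable for a perceptron but painfully slow at LLM scale.

```python
class Value:
    """A scalar that remembers its parents and local derivatives for backprop."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # Values this one was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    __rmul__ = __mul__

    def __sub__(self, other):
        return self + (-1.0) * other

    def backward(self):
        # Topologically sort the graph so a node's grad is complete before it is used.
        order, seen = [], set()

        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            for p, local in zip(v._parents, v._local_grads):
                p.grad += local * v.grad  # chain rule, accumulated over all paths


def train(steps=200, lr=0.05):
    """Fit w, b to hypothetical toy data y = 2x + 1 with plain gradient descent."""
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [2 * x + 1 for x in xs]
    w, b = Value(0.0), Value(0.0)
    for _ in range(steps):
        loss = Value(0.0)
        for x, y in zip(xs, ys):
            err = w * x + b - y
            loss = loss + err * err      # sum of squared errors
        w.grad = b.grad = 0.0            # reset before each backward pass
        loss.backward()
        w.data -= lr * w.grad
        b.data -= lr * b.grad
    return w.data, b.data
```

Calling `train()` should move `w` and `b` close to 2 and 1. The same `Value` graph works for any expression built from `+`, `-` and `*`, but the per-scalar object overhead is what a real framework avoids by operating on whole arrays at once.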

nqzero, over 1 year ago

Is there an existing SLM (small language model) that resembles an LLM in architecture and includes the code for training it?

I realize the cost and time to train may be prohibitive, and that quality on general English might be very limited, but is the code itself available?

revskill, over 1 year ago

The only problem is that it's implemented in Python. One reason is that I hate installing Python on my machine, and I don't know how to manage dependencies. macOS required me to upgrade just to install native stuff. Such a hell.

Building an LLM from Scratch: Automatic Differentiation (2023)