
Building an LLM from Scratch: Automatic Differentiation (2023)

355 points by netwrt over 1 year ago

7 comments

itissid over 1 year ago
Everyone should go through this rite-of-passage work and get to the "Attention Is All You Need" implementation. It's a world where engineering and the academic papers are very close and reproducible, and a must if you want to progress in the field.

(See also Andrej Karpathy's Neural Networks: Zero to Hero series on YouTube; it's very good and similar to this work.)
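For anyone wondering what the target of that rite of passage looks like, below is a minimal NumPy sketch of scaled dot-product attention, the core operation of "Attention Is All You Need." This is an illustrative sketch, not code from the article; the Q/K/V names and shapes are just the paper's standard convention.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)   # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ V                             # weighted sum of values

# Toy example: 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(attention(Q, K, V).shape)  # (4, 8)
```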
cafaxo over 1 year ago
I did a similar thing for Julia: Llama2.jl contains vanilla Julia code [1] for training small Llama2-style models on the CPU.

[1] https://github.com/cafaxo/Llama2.jl/tree/master/src/training
andxor_ over 1 year ago
Very well written. AD feels like magic, and this is a good exposition of its basic building block.

I quite like Jeremy's approach: https://nbviewer.org/github/fastai/fastbook/blob/master/17_foundations.ipynb

It shows a very simple "Pythonic" approach to assembling the gradient of a composition of functions from the gradients of the components.
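To make that concrete, here is a minimal sketch of the same idea; this is not Jeremy's notebook code, just an illustrative reverse-mode AD scalar (in the spirit of Karpathy's micrograd) where each operation records a closure holding its local derivative, so the chain rule assembles the gradient of the composition automatically.

```python
class Value:
    """A scalar that records the operations applied to it so gradients
    can be accumulated by the chain rule (reverse-mode AD)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # local gradient rule, set by each op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad   # d(a+b)/da = 1
            other.grad += out.grad  # d(a+b)/db = 1
        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def backward():
            self.grad += other.data * out.grad  # d(a*b)/da = b
            other.grad += self.data * out.grad  # d(a*b)/db = a
        out._backward = backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply each local rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

x = Value(3.0)
y = x * x + x   # y = x^2 + x
y.backward()
print(x.grad)   # dy/dx = 2x + 1 = 7.0
```

Each component's gradient lives in its own closure; calling backward() on the output composes them, which is the assemble-from-components idea described above.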
asgraham over 1 year ago
As a chronic premature optimizer, my first reaction was, "Is this even possible in vanilla Python???" Obviously it's possible, but can you train an LLM before the heat death of the universe? A perceptron, sure, of course. A deep learning model, plausible if it's not too deep. But a large language model? I.e., the kind of LLM necessary for "from vanilla Python to functional coding assistant."

But obviously the author already thought of that. The source repo has a great motto: "It don't go fast but it do be goin'" [1]

I love the idea of the project and I'm curious to see what the endgame runtime will be.

[1] https://github.com/bclarkson-code/Tricycle
nqzero over 1 year ago
Is there an existing SLM that resembles an LLM in architecture and includes the code for training it?

I realize the cost and time to train may be prohibitive, and that quality on general English might be very limited, but is the code itself available?
revskill over 1 year ago
The only problem is that it's implemented in Python. For one, I hate installing Python on my machine, and I don't know how to manage its dependencies. macOS required an upgrade just to install the native stuff. Such a hell.
ESOLprof over 1 year ago
Amazing! Thank you.