TechEcho

A tech news platform built with Next.js, providing global tech news and discussions.

© 2025 TechEcho. All rights reserved.

Llama 3 implemented in pure NumPy

476 points by orixilus about 1 year ago

13 comments

ffriend about 1 year ago
It's also worth mentioning that the original implementation by Meta is only 300 lines of very readable code [1].

[1]: https://github.com/meta-llama/llama3/blob/main/llama/model.py
joennlae about 1 year ago
Trainable Llama-like transformer (with backpropagation) in numpy only (~600 lines)

https://github.com/joennlae/tensorli
buildbot about 1 year ago
Cool, instant CUDA acceleration via CuPy! `import cupy as np`
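A minimal sketch of the drop-in idea, assuming CuPy is installed with a working CUDA device whenever it imports successfully. The `xp` alias, the `softmax` and `to_numpy` helpers, and the fallback to NumPy are illustrative choices, not part of llama3.np. The swap only holds as long as the code sticks to functions CuPy actually implements.

```python
try:
    import cupy as xp  # GPU arrays; mirrors much of the NumPy API
except ImportError:
    import numpy as xp  # CPU fallback when CuPy is not installed


def softmax(x, axis=-1):
    # Subtract the row max before exponentiating for numerical stability
    e = xp.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def to_numpy(a):
    # CuPy arrays need an explicit device-to-host copy; NumPy arrays pass through
    return xp.asnumpy(a) if hasattr(xp, "asnumpy") else a


scores = xp.asarray([[1.0, 2.0, 3.0]])
probs = to_numpy(softmax(scores))
```

The same `softmax` source runs on either backend; only the import decides where the arrays live.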
lnyan about 1 year ago
`import jax.numpy as np`, and then we also get a JAX implementation after certain modifications, e.g. removing in-place index assignment, replacing unsupported functions, etc.
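In-place index assignment is the usual sticking point, since JAX arrays are immutable. A sketch of the two styles side by side, assuming a hypothetical KV-cache layout of `(batch, max_seq_len, dim)`; `write_kv_numpy` and `write_kv_jax` are illustrative names, not functions from llama3.np. The JAX half only runs if JAX is installed.

```python
import numpy as np

try:
    import jax.numpy as jnp  # optional: only needed for the JAX variant
except ImportError:
    jnp = None


def write_kv_numpy(cache, new_kv, pos):
    # NumPy style: slice assignment mutates `cache` in place
    cache[:, pos:pos + new_kv.shape[1]] = new_kv
    return cache


def write_kv_jax(cache, new_kv, pos):
    # JAX arrays are immutable, so `cache[...] = ...` raises a TypeError;
    # .at[...].set(...) returns a new array with the slice replaced
    return cache.at[:, pos:pos + new_kv.shape[1]].set(new_kv)


# Hypothetical cache of shape (batch=1, max_seq_len=4, dim=2)
cache_np = np.zeros((1, 4, 2))
step_kv = np.ones((1, 1, 2))
cache_np = write_kv_numpy(cache_np, step_kv, pos=2)

if jnp is not None:
    cache_jx = write_kv_jax(jnp.zeros((1, 4, 2)), jnp.ones((1, 1, 2)), pos=2)
    assert np.allclose(np.asarray(cache_jx), cache_np)
```

Under `jax.jit`, slicing with a traced `pos` fails because slice bounds must be static; `jax.lax.dynamic_update_slice` is the usual replacement in that case.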
rhdunn about 1 year ago
From the TinyStories dataset card [1], the dataset is generated by GPT-3.5 and GPT-4. Reading the discussions in the community tab [2], it looks like there are a lot of incomplete or misspelled words, incorrect grammar, and even Chinese characters in the dataset.

As such, I'd be wary of using that dataset to train or evaluate models.

[1] https://huggingface.co/datasets/roneneldan/TinyStories

[2] https://huggingface.co/datasets/roneneldan/TinyStories/discussions
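If one did use it anyway, a rough first pass is to flag suspect stories before training. A minimal sketch, assuming plain-text stories; `flag_story` is a hypothetical helper, and both checks (characters in the CJK Unified Ideographs block U+4E00 to U+9FFF, and a missing final punctuation mark as a hint of truncation) are heuristics that will not catch misspellings or bad grammar.

```python
import re

# CJK Unified Ideographs block, U+4E00 to U+9FFF (heuristic, not exhaustive)
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")
# Characters a complete English story would plausibly end with
END_CHARS = (".", "!", "?", '"')


def flag_story(text):
    """Return a list of reasons why `text` looks suspect (empty if none)."""
    reasons = []
    if CJK_PATTERN.search(text):
        reasons.append("cjk")
    if not text.rstrip().endswith(END_CHARS):
        reasons.append("possibly truncated")
    return reasons


stories = ["Tom saw a dog. He was happy.", "Lily found a 猫.", "Once upon a time, there"]
suspect = [s for s in stories if flag_story(s)]
```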
dang about 1 year ago
We changed the URL from https://github.com/likejazz/llama3.np to the article it points to, which gives more background.
AI_hacker about 1 year ago
How does the performance of llama3.np compare to other implementations, especially considering it's a pure NumPy implementation?
johndough about 1 year ago
What is the difference to the llama.np repository credited in the README? https://github.com/hscspring/llama.np
kolinko about 1 year ago
Obligatory Recmo’s Llama1 implementation in numpy :)

https://github.com/recmo/cria
Scene_Cast2 about 1 year ago
The rotary embeddings bit is neat. I wonder if a complex representation would simplify vs complexify things (readability, performance, expressive power).
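For what it's worth, the two forms compute the same thing: rotating a pair `(x0, x1)` by an angle equals multiplying the complex number `x0 + i*x1` by `e^{i*angle}`, and Meta's reference model.py takes the complex route (`torch.polar`, `torch.view_as_complex`). A NumPy sketch of both, assuming the interleaved (even/odd feature) pairing and a `(seq_len, head_dim)` input; the helper names are hypothetical, and the `theta=10000.0` default is an assumption (Llama 3 checkpoints use a base of 500000).

```python
import numpy as np


def rope_angles(head_dim, seq_len, theta=10000.0):
    # angles[m, i] = m * theta**(-2i / head_dim), in radians
    inv_freq = 1.0 / theta ** (np.arange(0, head_dim, 2) / head_dim)
    return np.outer(np.arange(seq_len), inv_freq)  # (seq_len, head_dim // 2)


def rope_real(x, angles):
    # Rotate each (even, odd) feature pair with an explicit 2x2 rotation
    x0, x1 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x0 * cos - x1 * sin
    out[:, 1::2] = x0 * sin + x1 * cos
    return out


def rope_complex(x, angles):
    # Same rotation: view each pair as x0 + i*x1 and multiply by e^{i*angle}
    z = (x[:, 0::2] + 1j * x[:, 1::2]) * np.exp(1j * angles)
    out = np.empty_like(x)
    out[:, 0::2] = z.real
    out[:, 1::2] = z.imag
    return out


x = np.random.default_rng(0).standard_normal((5, 8))
angles = rope_angles(head_dim=8, seq_len=5)
```

The complex version is shorter to read; the real version avoids a complex intermediate array and keeps everything in plain float arithmetic.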
threatripper about 1 year ago
> np.sin(freqs)

Didn't we drop 2 pi somewhere?
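On the 2π question: assuming `freqs` holds the products of position and inverse frequency, as in the standard RoPE formulation, the values are already angles in radians, so no factor is missing. With head dimension $d$ and base $b$:

$$
\theta_i = b^{-2i/d}, \qquad i = 0, 1, \dots, \tfrac{d}{2} - 1
$$

$$
\phi_{m,i} = m\,\theta_i \ \text{(radians)}, \qquad \lambda_i = \frac{2\pi}{\theta_i} = 2\pi\, b^{2i/d} \ \text{(tokens per full turn)}
$$

The 2π only appears when converting to a wavelength: the fastest pair ($i = 0$) turns by exactly 1 radian per token and repeats every $2\pi \approx 6.28$ tokens. Writing $\sin(2\pi m \theta_i)$ would instead treat $\theta_i$ as cycles per token, which differs from the RoPE definition.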
xchip about 1 year ago
Nice but the tricky part is the training data.
ulam2 about 1 year ago
I'll consider superintelligence achieved if AI can do such work faithfully.