
Llama 3 implemented in pure NumPy

476 points by orixilus · about 1 year ago

13 comments

ffriend · about 1 year ago

It's also worth mentioning that the original implementation by Meta is only 300 lines of very readable code [1].

[1]: https://github.com/meta-llama/llama3/blob/main/llama/model.py
joennlae · about 1 year ago

Trainable Llama-like transformer (with backpropagation) in NumPy only (~600 lines):

https://github.com/joennlae/tensorli
buildbot · about 1 year ago

Cool, instant CUDA acceleration via CuPy! `import cupy as np`
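A minimal sketch of the drop-in swap the comment describes: CuPy mirrors most of the NumPy API, so pure-NumPy model code can often run on the GPU by changing only the import. The `softmax` helper here is a hypothetical stand-in for the model code, not part of llama3.np.

```python
# CuPy exposes a NumPy-compatible API backed by GPU arrays, so the same
# code can run on either backend. Fall back to NumPy if CuPy is absent.
try:
    import cupy as np  # GPU arrays, NumPy-compatible API
except ImportError:
    import numpy as np  # CPU fallback

def softmax(x, axis=-1):
    # Identical source runs on CPU (NumPy) or GPU (CuPy).
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

probs = softmax(np.asarray([[1.0, 2.0, 3.0]]))
```

The catch is that only operations CuPy actually implements transfer this way; anything relying on NumPy-specific corners (object dtypes, some fancy indexing) needs adjustment.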
lnyan · about 1 year ago

`import jax.numpy as np`, then we also get a JAX implementation after certain modifications: e.g. removing in-place index assignment, replacing unsupported functions, etc.
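The in-place index assignment the comment mentions is the main mechanical change: JAX arrays are immutable, so NumPy-style `arr[i] = v` becomes a functional `.at[i].set(v)` update. A minimal sketch, using a hypothetical KV-cache-shaped array rather than llama3.np's actual code:

```python
import jax.numpy as jnp

# NumPy-style in-place update (raises on JAX arrays):
#   cache[:, pos] = new_kv
# JAX equivalent: .at[...].set(...) returns a new array instead of
# mutating the old one, which keeps the code purely functional.
cache = jnp.zeros((2, 4, 8))        # hypothetical (heads, seq, dim) cache
new_kv = jnp.ones((2, 8))
cache = cache.at[:, 1].set(new_kv)  # write position 1, rebind the name
```

Under `jax.jit`, these functional updates are typically compiled into true in-place writes, so the idiom costs less than it looks.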
rhdunn · about 1 year ago

From the TinyStories dataset card [1], the dataset is generated by GPT-3.5 and GPT-4. Reading the discussions in the community tab [2], it looks like there are a lot of incomplete or misspelled words, incorrect grammar, and even Chinese characters in the dataset.

As such, I'd be wary of using that dataset to train or evaluate models.

[1] https://huggingface.co/datasets/roneneldan/TinyStories

[2] https://huggingface.co/datasets/roneneldan/TinyStories/discussions
dang · about 1 year ago

We changed the URL from https://github.com/likejazz/llama3.np to the article it points to, which gives more background.
AI_hacker · about 1 year ago

How does the performance of llama3.np compare to other implementations, especially considering it's a pure NumPy implementation?
johndough · about 1 year ago

What is the difference from the llama.np repository credited in the README? https://github.com/hscspring/llama.np
kolinko · about 1 year ago

Obligatory mention of Recmo's Llama 1 implementation in NumPy :)

https://github.com/recmo/cria
Scene_Cast2 · about 1 year ago

The rotary embeddings bit is neat. I wonder if a complex representation would simplify or complicate things (readability, performance, expressive power).
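For context, Meta's reference implementation does use a complex representation (the `freqs_cis` tensor): each adjacent pair of dimensions is treated as one complex number and rotated by a position-dependent unit phasor. A minimal NumPy sketch of that idea, with hypothetical shapes (not llama3.np's actual code):

```python
import numpy as np

def apply_rope_complex(x, pos, theta=10000.0):
    """Rotary embeddings via complex multiplication.

    x:   (seq_len, head_dim) activations, head_dim even
    pos: (seq_len,) integer positions
    """
    d = x.shape[-1]
    freqs = 1.0 / theta ** (np.arange(0, d, 2) / d)  # (d/2,) angular freqs
    angles = np.outer(pos, freqs)                    # (seq, d/2)
    rot = np.exp(1j * angles)                        # unit-modulus rotations
    xc = x[..., 0::2] + 1j * x[..., 1::2]            # pair dims -> complex
    xr = xc * rot                                    # rotate each pair
    out = np.empty_like(x)
    out[..., 0::2] = xr.real
    out[..., 1::2] = xr.imag
    return out

x = np.random.randn(5, 8)
y = apply_rope_complex(x, np.arange(5))
```

Because the rotations have unit modulus, the transform preserves vector norms and leaves position 0 unchanged, which is a handy sanity check on any RoPE implementation.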
threatripper · about 1 year ago

> np.sin(freqs)

Didn't we drop a 2π somewhere?
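One way to see why no 2π is needed, assuming the standard RoPE frequency definition: the `freqs` are angular frequencies in radians per position, fed straight into sin/cos, so position p and frequency f rotate by angle p·f directly. A 2π factor would only appear if f were a cyclic frequency (cycles per position). A quick numeric check:

```python
import numpy as np

# Standard RoPE frequencies: theta ** (-2i / d) for pair index i.
# These are already in radians per position, so sin/cos take them as-is.
dim, theta = 8, 10000.0
freqs = 1.0 / theta ** (np.arange(0, dim, 2) / dim)  # lowest pair: exactly 1.0
angle = 3 * freqs[0]        # rotation angle at position 3, lowest pair
point = np.exp(1j * angle)  # stays on the unit circle for any angle
```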
xchip · about 1 year ago

Nice, but the tricky part is the training data.
ulam2 · about 1 year ago

I'll consider superintelligence achieved if AI can do such work faithfully.