Tech Echo

13 comments

ffriend, about 1 year ago

It's also worth mentioning that the original implementation by Meta is only 300 lines of very readable code [1].

[1]: https://github.com/meta-llama/llama3/blob/main/llama/model.py
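Part of why that file stays short is that its building blocks are small. As an illustration, here is a NumPy sketch of RMS normalization in the spirit of the `RMSNorm` module there. This is not a port of that file; the function name `rms_norm` and the `eps` default are assumptions of this sketch.

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Scale each vector along the last axis by the inverse of its root-mean-square,
    # then apply a learned per-channel weight. Unlike LayerNorm there is no
    # mean subtraction and no bias term.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

# Usage: a (batch, seq_len, dim) activation tensor with unit weights.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 8))
out = rms_norm(x, np.ones(8))
```

With unit weights, every output vector has a root-mean-square of (almost exactly) 1; `eps` only guards against division by zero.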

joennlae, about 1 year ago

Trainable Llama-like transformer (with backpropagation) in NumPy only (~600 lines): https://github.com/joennlae/tensorli
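Training in pure NumPy means deriving every gradient by hand. The sketch below is not tensorli's code; it is a minimal, hypothetical example of one such piece, the softmax cross-entropy gradient with respect to the logits, checked against central finite differences. All function names are made up for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy_and_grad(logits, targets):
    # logits: (batch, vocab); targets: (batch,) integer class ids.
    # Returns the mean loss and d(loss)/d(logits) = (softmax - one_hot) / batch.
    batch = logits.shape[0]
    probs = softmax(logits)
    loss = -np.mean(np.log(probs[np.arange(batch), targets]))
    grad = probs.copy()
    grad[np.arange(batch), targets] -= 1.0
    return loss, grad / batch

def numerical_grad(f, x, h=1e-6):
    # Central finite differences, perturbing one element at a time.
    g = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + h
        f_plus = f(x)
        x[idx] = old - h
        f_minus = f(x)
        x[idx] = old
        g[idx] = (f_plus - f_minus) / (2 * h)
    return g

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 5))
targets = np.array([0, 2, 4, 1])
loss, grad = cross_entropy_and_grad(logits, targets)
num = numerical_grad(lambda l: cross_entropy_and_grad(l, targets)[0], logits.copy())
```

Gradient checks like this are slow (one forward pass per parameter) but are a cheap way to catch sign or indexing mistakes in hand-written backward passes.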

buildbot, about 1 year ago

Cool, instant CUDA acceleration via CuPy! `import cupy as np`
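A slightly more defensive variant of that one-line swap is to pick the array module at import time and write the math against it. This is a sketch, assuming CuPy's NumPy-compatible API covers the functions you use; the helper names `to_host` and `attention_scores` are hypothetical. Note that it only falls back when CuPy is not installed; an installed CuPy without a working GPU will still fail later.

```python
import math

import numpy

try:
    import cupy as xp  # GPU arrays with a largely NumPy-compatible API
except ImportError:
    xp = numpy  # fall back to CPU NumPy when CuPy is not installed

def to_host(a):
    # cupy.asnumpy copies a GPU array back to a NumPy array;
    # when xp is NumPy the array is already on the host.
    return xp.asnumpy(a) if hasattr(xp, "asnumpy") else a

def attention_scores(q, k):
    # Scaled dot-product scores, written once against the `xp` namespace.
    return (q @ k.T) / math.sqrt(q.shape[-1])

out = to_host(attention_scores(xp.ones((2, 4)), xp.ones((3, 4))))
```

Each score here is 4 / sqrt(4) = 2, regardless of which backend ran it.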

lnyan, about 1 year ago

`import jax.numpy as np`, and then we also get a JAX implementation after certain modifications: e.g. removing in-place index assignment, replacing unsupported functions, etc.
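As an example of the in-place assignment issue: JAX arrays are immutable, so `a[i] = v` has to become `a = a.at[i].set(v)`, or the code can be rewritten to avoid mutation altogether. The sketch below shows a causal attention mask built both ways; it assumes an additive mask convention and is not taken from llama3.np. The function names are hypothetical, and the functional version is only exercised with NumPy here.

```python
import numpy as np

def causal_mask(seq_len, xp=np):
    # Additive attention mask: 0 where query i may attend to key j (j <= i),
    # -inf strictly above the diagonal. No in-place index assignment, so the
    # same code should work with xp=numpy or xp=jax.numpy.
    return xp.triu(xp.full((seq_len, seq_len), -xp.inf), k=1)

def causal_mask_inplace(seq_len):
    # In-place style: fine for NumPy, rejected by JAX arrays,
    # where it would become mask = mask.at[rows, cols].set(-inf).
    mask = np.zeros((seq_len, seq_len))
    mask[np.triu_indices(seq_len, k=1)] = -np.inf
    return mask

m = causal_mask(3)
```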

rhdunn, about 1 year ago

From the TinyStories dataset card [1], the dataset was generated by GPT-3.5 and GPT-4. Reading the discussions in the community tab [2], it looks like there are a lot of incomplete or misspelled words, incorrect grammar, and even Chinese characters in the dataset.

As such, I'd be wary of using that dataset to train or evaluate models.

[1] https://huggingface.co/datasets/roneneldan/TinyStories

[2] https://huggingface.co/datasets/roneneldan/TinyStories/discussions
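One of those problems, stray Chinese characters, is easy to screen for. A minimal sketch, assuming the stories are already loaded as a list of Python strings; the function name `flag_cjk` is hypothetical, and this catches nothing about misspellings or grammar.

```python
import re

# Matches the CJK Unified Ideographs block (U+4E00 to U+9FFF), where most
# common Chinese characters live; extension blocks are ignored in this sketch.
CJK_RE = re.compile(r"[\u4e00-\u9fff]")

def flag_cjk(stories):
    # Return the indices of stories containing at least one CJK ideograph.
    return [i for i, text in enumerate(stories) if CJK_RE.search(text)]

flagged = flag_cjk(["Once upon a time.", "Lily saw a 猫.", "Tom ran home."])
```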

dang, about 1 year ago

We changed the URL from https://github.com/likejazz/llama3.np to the article it points to, which gives more background.

AI_hacker, about 1 year ago

How does the performance of llama3.np compare to other implementations, especially considering it's a pure NumPy implementation?
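The thread gives no numbers, but throughput is straightforward to measure the same way for any implementation. A sketch, assuming each implementation can be wrapped in a callable that produces one token per call; `tokens_per_second` and the matrix-vector product used as a stand-in step are hypothetical, not llama3.np's API.

```python
import time

import numpy as np

def tokens_per_second(step, n_tokens, warmup=2):
    # `step` is a hypothetical callable producing one token per call.
    # Warm-up calls (caches, lazy allocation) are excluded from timing.
    for _ in range(warmup):
        step()
    start = time.perf_counter()
    for _ in range(n_tokens):
        step()
    elapsed = max(time.perf_counter() - start, 1e-12)  # guard against zero
    return n_tokens / elapsed

# Stand-in "decode step": one float32 matrix-vector product.
w = np.random.default_rng(0).standard_normal((512, 512)).astype(np.float32)
x = np.ones(512, dtype=np.float32)
rate = tokens_per_second(lambda: x @ w, n_tokens=100)
```

For a fair comparison, the same prompt, token count, and dtype should be used across implementations, since NumPy's speed depends heavily on the BLAS library it is linked against.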

johndough, about 1 year ago

What is the difference from the llama.np repository credited in the README? https://github.com/hscspring/llama.np

kolinko, about 1 year ago

Obligatory mention of Recmo's LLaMA 1 implementation in NumPy :) https://github.com/recmo/cria

Scene_Cast2, about 1 year ago

The rotary embeddings bit is neat. I wonder if a complex representation would simplify vs complexify things (readability, performance, expressive power).
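For comparison, the two representations can be written side by side. Meta's reference `model.py` takes the complex route (its `apply_rotary_emb` uses `torch.view_as_complex`); the sketch below is a NumPy illustration of both, assuming adjacent dimensions (2k, 2k+1) are paired, a base `theta` of 10000, and hypothetical function names. Other codebases pair the first and second halves of the head dimension instead.

```python
import numpy as np

def rope_freqs(head_dim, seq_len, theta=10000.0):
    # angles[t, k] = t * theta**(-2k / head_dim), shape (seq_len, head_dim // 2).
    inv_freq = 1.0 / (theta ** (np.arange(0, head_dim, 2) / head_dim))
    return np.outer(np.arange(seq_len), inv_freq)

def rope_real(x, angles):
    # Rotate each adjacent pair (x[2k], x[2k+1]) with explicit cos/sin terms.
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out

def rope_complex(x, angles):
    # Treat each pair as one complex number and multiply by e^{i * angle}.
    z = (x[..., 0::2] + 1j * x[..., 1::2]) * np.exp(1j * angles)
    out = np.empty_like(x)
    out[..., 0::2] = z.real
    out[..., 1::2] = z.imag
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))  # (seq_len, head_dim)
angles = rope_freqs(16, 8)
```

Mathematically the two are identical, since (a + ib)(cos θ + i sin θ) expands to exactly the real-form terms. The complex version is more compact; the real version avoids complex dtypes, which can matter when porting to backends with weaker complex support.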

threatripper, about 1 year ago

> np.sin(freqs)

Didn't we drop 2 pi somewhere?
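For reference, the RoFormer paper, which introduced rotary embeddings, defines the per-pair frequencies and the rotation applied at position $m$ as below (base 10000, 1-indexed pairs; implementations may choose a different base). The angles $m\theta_i$ are used directly in radians with no $2\pi$ factor, the same convention as the sinusoidal encoding of the original Transformer, $\mathrm{PE}(pos, 2i) = \sin\!\left(pos / 10000^{2i/d_{\text{model}}}\right)$. Whether a particular implementation follows this convention has to be checked against its own code.

$$
\theta_i = 10000^{-2(i-1)/d}, \qquad i = 1, \dots, d/2
$$

$$
\begin{pmatrix} x'_{2i-1} \\ x'_{2i} \end{pmatrix}
=
\begin{pmatrix} \cos m\theta_i & -\sin m\theta_i \\ \sin m\theta_i & \cos m\theta_i \end{pmatrix}
\begin{pmatrix} x_{2i-1} \\ x_{2i} \end{pmatrix}
$$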

xchip, about 1 year ago

Nice, but the tricky part is the training data.

ulam2, about 1 year ago

I'll consider superintelligence achieved if AI can do such work faithfully.

Llama 3 implemented in pure NumPy