I was getting into LLMs and picked up some projects. I tried to dive into the code to see what the secret sauce is.<p>But the code is so short that there is almost nothing to read.<p>https://github.com/facebookresearch/llama<p>I then proceeded to check https://github.com/mistralai/mistral-src and surprisingly it's the same.<p>What exactly are these codebases? It feels like all they do is download the models.
Most neural networks are just directed graphs: a ton of matrix multiplies with a nonlinear function at the end of each layer. The libraries to do gradient descent, training, etc. are all already there to use. It is amazing how small the actual code is compared to the amount of compute it takes to do the training.
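To make that concrete, here is a toy sketch of a generic pre-norm transformer block in PyTorch. This is an illustration, not Llama's exact architecture (Llama swaps in RMSNorm, rotary embeddings, and SwiGLU, and the dimensions below are made up). The whole model is basically this repeated a few dozen times; the "secret sauce" lives in the trained weights, not in code like this.<p>

    import torch
    import torch.nn as nn

    class Block(nn.Module):
        """One transformer layer: attention + MLP, each behind a residual connection."""
        def __init__(self, dim=512, n_heads=8):
            super().__init__()
            self.norm1 = nn.LayerNorm(dim)
            self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
            self.norm2 = nn.LayerNorm(dim)
            self.mlp = nn.Sequential(
                nn.Linear(dim, 4 * dim),  # matrix multiply
                nn.GELU(),                # nonlinearity
                nn.Linear(4 * dim, dim),  # matrix multiply
            )

        def forward(self, x):
            h = self.norm1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]  # self-attention
            x = x + self.mlp(self.norm2(x))
            return x

    # A "model" is just this block stacked; the billions of parameters are the weights.
    model = nn.Sequential(*[Block() for _ in range(4)])
    out = model(torch.randn(1, 16, 512))  # (batch, sequence length, embedding dim)
    print(out.shape)  # torch.Size([1, 16, 512])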
These are just the repos that provide the inference code to run the model; they require the weights, which are available via HuggingFace or, in Llama 2's case, from here: <a href="https://ai.meta.com/resources/models-and-libraries/llama-downloads/" rel="nofollow noreferrer">https://ai.meta.com/resources/models-and-libraries/llama-dow...</a>
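Once you've been granted access to the weights, actually running the model is also only a few lines. A minimal sketch using HuggingFace's transformers library (the meta-llama/Llama-2-7b-hf repo is gated, so it assumes an approved account, and the prompt is just a placeholder):<p>

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Gated repo: requires accepting Meta's license on HuggingFace first.
    model_id = "meta-llama/Llama-2-7b-hf"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer("The secret sauce is", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=20)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))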