Llama 3.1

437 points by luiscosio 10 months ago

46 comments

dang 10 months ago

Related ongoing thread:

Open source AI is the path forward - https://news.ycombinator.com/item?id=41046773 - July 2024 (278 comments)

lelag 10 months ago

The 405b model is actually competitive against closed source frontier models.

Quick comparison with GPT-4o:

    +-----------+--------+----------------+
    | Metric    | GPT-4o | Llama 3.1 405B |
    +-----------+--------+----------------+
    | MMLU      | 88.7   | 88.6           |
    | GPQA      | 53.6   | 51.1           |
    | MATH      | 76.6   | 73.8           |
    | HumanEval | 90.2   | 89.0           |
    | MGSM      | 90.5   | 91.6           |
    +-----------+--------+----------------+

zone411 10 months ago

I've just finished running my NYT Connections benchmark on all three Llama 3.1 models. The 8B and 70B models improve on Llama 3 (12.3 -> 14.0, 24.0 -> 26.4), and the 405B model is near GPT-4o, GPT-4 turbo, Claude 3.5 Sonnet, and Claude 3 Opus at the top of the leaderboard.

    GPT-4o                          30.7
    GPT-4 turbo (2024-04-09)        29.7
    Llama 3.1 405B Instruct         29.5
    Claude 3.5 Sonnet               27.9
    Claude 3 Opus                   27.3
    Llama 3.1 70B Instruct          26.4
    Gemini Pro 1.5 0514             22.3
    Gemma 2 27B Instruct            21.2
    Mistral Large                   17.7
    Gemma 2 9B Instruct             16.3
    Qwen 2 Instruct 72B             15.6
    Gemini 1.5 Flash                15.3
    GPT-4o mini                     14.3
    Llama 3.1 8B Instruct           14.0
    DeepSeek-V2 Chat 236B (0628)    13.4
    Nemotron-4 340B                 12.7
    Mixtral-8x22B Instruct          12.2
    Yi Large                        12.1
    Command R Plus                  11.1
    Mistral Small                    9.3
    Reka Core-20240501               9.1
    GLM-4                            9.0
    Qwen 1.5 Chat 32B                8.7
    Phi-3 Small 8k                   8.4
    DBRX                             8.0

foundval 10 months ago

You can chat with these new models at ultra-low latency at groq.com. 8B and 70B API access is available at console.groq.com. 405B API access is for select customers only; GA and 3rd-party speed benchmarks are coming soon.

If you want to learn more, there is a writeup at https://wow.groq.com/now-available-on-groq-the-largest-and-most-capable-openly-available-foundation-model-to-date-llama-3-1-405b/.

(Disclaimer: I am a Groq employee.)

netsec_burn 10 months ago

Today appears to be the day you can run an LLM that is competitive with GPT-4o at home with the right hardware. Incredible for progress and advancement of the technology.

Statement from Mark: https://about.fb.com/news/2024/07/open-source-ai-is-the-path-forward/

meetpateltech 10 months ago

Open Source AI Is the Path Forward - Mark Zuckerberg

https://about.fb.com/news/2024/07/open-source-ai-is-the-path-forward/

ajhai 10 months ago

You can already run these models locally with Ollama (ollama run llama3.1:latest), as well as at places like Hugging Face, Groq, etc.

If you want a playground to test this model locally or want to quickly build some applications with it, you can try LLMStack (https://github.com/trypromptly/LLMStack). I wrote last week about how to configure and use Ollama with LLMStack at https://docs.trypromptly.com/guides/using-llama3-with-ollama.

Disclaimer: I'm the maintainer of LLMStack.
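
For anyone who would rather script against the local model than use the interactive prompt, here is a minimal sketch. It assumes an Ollama server is already running on its default local port (11434) and that the `llama3.1` model has been pulled. The helper names `build_generate_request` and `ask` are made up for this example and are not part of Ollama or LLMStack.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(prompt, model="llama3.1", url=OLLAMA_URL):
    """Build (but do not send) a non-streaming generate request."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="llama3.1"):
    """Send the request to the local server and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, something like `ask("Summarize the Llama 3.1 release in one sentence.")` should return the model's reply as a string. Building the request separately from sending it keeps the payload easy to inspect without a running server.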

primaprashant 10 months ago

I have found Claude 3.5 Sonnet, along with the artifacts feature, really good for coding tasks, and it seems like it's still the king on the coding benchmarks.

CGamesPlay 10 months ago

The LMSys Overall leaderboard (https://chat.lmsys.org/?leaderboard) can tell us a bit more about how these models will perform in real life, rather than in a benchmark context. By comparing the Elo score against the MMLU benchmark score, we can see which models outperform or underperform their benchmark scores relative to other models. A low Elo-per-MMLU-point value indicates that the model is more optimized for the benchmark, while a higher value indicates it's more optimized for real-world examples. Using that, we can make some inferences about the training data used, and then extrapolate how future models might perform. Here's a chart: https://docs.getgrist.com/gV2DtvizWtG7/LLMs/p/5?embed=true

Examples: OpenAI's GPT 4o-mini is second only to 4o on LMSys Overall, but is 6.7 points behind 4o on MMLU. It's "punching above its weight" in real-world contexts. The Gemma series (9B and 27B) are similar, both beating the mean in terms of Elo per MMLU point. Microsoft's Phi series are all below the mean, meaning they have strong MMLU scores but aren't preferred in real-world contexts.

Llama 3 8B previously did substantially better than the mean on LMSys Overall, so hopefully Llama 3.1 8B will be even better! The 70B variant was interestingly right on the mean. Hopefully the 405B variant won't fall below!
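
The "Elo per MMLU point relative to the mean" comparison can be sketched roughly as below. The scores are hypothetical placeholders rather than real leaderboard values, the model names are invented, and the exact formula behind the linked chart is not given in the comment, so a plain ratio is assumed here.

```python
from statistics import mean

# Hypothetical placeholder scores, NOT real leaderboard values.
scores = {
    "model_a": {"elo": 1280, "mmlu": 82.0},
    "model_b": {"elo": 1200, "mmlu": 80.0},
    "model_c": {"elo": 1100, "mmlu": 78.0},
}


def elo_per_mmlu(scores):
    """Arena Elo divided by MMLU score for each model."""
    return {name: s["elo"] / s["mmlu"] for name, s in scores.items()}


def relative_to_mean(scores):
    """Each model's Elo-per-MMLU ratio minus the mean ratio across models."""
    ratios = elo_per_mmlu(scores)
    avg = mean(ratios.values())
    return {name: ratio - avg for name, ratio in ratios.items()}


deltas = relative_to_mean(scores)
for name, delta in sorted(deltas.items(), key=lambda kv: -kv[1]):
    side = "above" if delta > 0 else "below"
    print(f"{name}: {delta:+.3f} ({side} the mean)")
```

A positive value corresponds to "punching above its weight" in the sense used above; a negative value corresponds to a model whose benchmark score flatters it.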

kingsleyopara 10 months ago

The biggest win here has to be the context length increase to 128k from 8k tokens. My understanding is that until now there haven't been any open models anywhere close to that.

Workaccount2 10 months ago

@dang why was this removed/filtered from the front page?

AaronFriel 10 months ago

Is there pricing available on any of these vendors?

Open source models are very exciting for self hosting, but the per-token hosted inference pricing hasn't been competitive with OpenAI and Anthropic, at least for a given tier of quality. (E.g.: Llama 3 70B costing between $1 and $10 per million tokens on various platforms, but Claude Sonnet 3.5 is $3 per million.)

primaprashant 10 months ago

The links on the page to the model card [1], the research paper, and the Prompt Guard Tutorial [2] don't exist yet.

[1]: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md

[2]: https://github.com/meta-llama/llama-recipes/blob/main/recipes/responsible_ai/prompt_guard/Prompt%20Guard%20Tutorial.ipynb

dado3212 10 months ago

> We use synthetic data generation to produce the vast majority of our SFT examples, iterating multiple times to produce higher and higher quality synthetic data across all capabilities. Additionally, we invest in multiple data processing techniques to filter this synthetic data to the highest quality. This enables us to scale the amount of fine-tuning data across capabilities. [0]

Have other major models explicitly communicated that they're trained on synthetic data?

[0]: https://ai.meta.com/blog/meta-llama-3-1/

jcmp 10 months ago

"Meta AI isn't available yet in your country" Hi from Europe :/

anotherpaulg 10 months ago

Llama 3.1 405B instruct is #7 on aider's leaderboard, well behind Claude 3.5 Sonnet & GPT-4o. When using SEARCH/REPLACE to efficiently edit code, it drops to #11.

https://aider.chat/docs/leaderboards/

    77.4%  claude-3.5-sonnet
    75.2%  DeepSeek Coder V2 (whole)
    72.9%  gpt-4o
    69.9%  DeepSeek Chat V2 0628
    68.4%  claude-3-opus-20240229
    67.7%  gpt-4-0613
    66.2%  llama-3.1-405b-instruct (whole)

sagz 10 months ago

The 405B model is already being served on WhatsApp: https://ibb.co/kQ2tKX5

ofou 10 months ago

    Llama 3 Training System: 19.2 exaFLOPS

                    Cluster 1        Cluster 2
    Throughput      9.6 exaFLOPS     9.6 exaFLOPS
    GPUs            24000            24000
    Per GPU         400+ TFLOPS      400+ TFLOPS

    TOTAL SYSTEM: 19,200,000 TFLOPS = 19.2 exaFLOPS
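
The arithmetic behind those totals can be checked with a few lines. This is only a sanity check of the numbers quoted in the comment, taking "400+ TFLOPS" as exactly 400; the function name is made up for the example.

```python
# 1 exaFLOPS = 10^18 FLOPS and 1 TFLOPS = 10^12 FLOPS, so 1 exaFLOPS = 10^6 TFLOPS.
TFLOPS_PER_EXAFLOPS = 1_000_000


def cluster_exaflops(gpus, tflops_per_gpu):
    """Aggregate throughput of one cluster, in exaFLOPS."""
    return gpus * tflops_per_gpu / TFLOPS_PER_EXAFLOPS


per_cluster = cluster_exaflops(gpus=24_000, tflops_per_gpu=400)
total = 2 * per_cluster  # two clusters of equal size

print(f"per cluster: {per_cluster} exaFLOPS")
print(f"total:       {total} exaFLOPS ({total * TFLOPS_PER_EXAFLOPS:,.0f} TFLOPS)")
```

Since the per-GPU figure is a lower bound ("400+"), the totals are lower bounds as well.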

unraveller 10 months ago

What are the substantial changes from 3.0 to 3.1 (70B) in terms of training approach? They don't seem to say how the training data differed, just that both were 15T. I gather 3.0 was just a preview run and 3.1 was distilled down from the 405B somehow.

sfblah 10 months ago

Is there an actual open-source community around this in the spirit of other ones where people outside Meta can somehow "contribute" to it? If I wanted to "work on" this somehow, what would I do?

denz88 10 months ago

I'm glad to see the nice incremental gains on the benchmarks for the 8B and 70B models as well.

chown 10 months ago

Wow! The benchmarks are truly impressive, showing significant improvements across almost all categories. It's fascinating to see how rapidly this field is evolving. If someone had told me last year that Meta would be leading the charge in open-source models, I probably wouldn't have believed them. Yet here we are, witnessing Meta's substantial contributions to AI research and democratization.

On a related note, for those interested in experimenting with large language models locally, I've been working on an app called Msty [1]. It allows you to run models like this with just one click and features a clean, functional interface. Just added support for both 8B and 70B. Still in development, but I'd appreciate any feedback.

[1]: https://msty.app

zhanghsfz 10 months ago

We now support the Llama 3.1 405B model on our distributed GPU network at Hyperbolic Labs! Come and use the API for FREE at https://app.hyperbolic.xyz/models

Let us know if you have other needs!

TechDebtDevin 10 months ago

Nice, someone donate me a few 4090s :(

ChrisArchitect 10 months ago

Related:

Open Source AI Is the Path Forward

https://about.fb.com/news/2024/07/open-source-ai-is-the-path-forward/

(https://news.ycombinator.com/item?id=41046773)

Atreiden 10 months ago

Is there a way to run this in AWS?

Seems like the biggest GPU node they have is the p5.48xlarge @ 640GB (8xH100s). Routing between multiple nodes would be too slow unless there's an InfiniBand fabric you can leverage. Interested to know if anyone else is exploring this.

TheAceOfHearts 10 months ago

Does anyone know why they haven't released any 30B-ish param models? I was expecting that to happen with this release and have been disappointed once more. They also skipped doing a 30B-ish param model for llama2 despite claiming to have trained one.

diimdeep 10 months ago

This 405B seriously needs a quantization solution like the 1.625 bpw ternary packing for BitNet b1.58:

https://github.com/ggerganov/llama.cpp/pull/8151
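
To give a feel for why ternary weights land near 1.6 bits each, here is a minimal sketch of base-3 packing: five values from {-1, 0, 1} fit in one byte because 3^5 = 243 <= 256, which works out to 8/5 = 1.6 bits per weight. This does not reproduce the block layout in the linked pull request, and the 1.625 bpw figure in the comment is taken as given rather than derived. The function names are invented for the example.

```python
from itertools import product


def pack_trits(trits):
    """Pack five values from {-1, 0, 1} into one byte using base 3."""
    assert len(trits) == 5 and all(t in (-1, 0, 1) for t in trits)
    byte = 0
    for t in reversed(trits):
        byte = byte * 3 + (t + 1)  # shift each trit to {0, 1, 2}
    return byte


def unpack_trits(byte):
    """Inverse of pack_trits: recover the five ternary values."""
    trits = []
    for _ in range(5):
        byte, digit = divmod(byte, 3)
        trits.append(digit - 1)
    return trits


def weights_gb(params, bits_per_weight):
    """Storage for the weights alone, in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


# Every one of the 243 possible groups survives a round trip.
for group in product((-1, 0, 1), repeat=5):
    assert unpack_trits(pack_trits(list(group))) == list(group)

print(f"405B at 1.625 bpw: about {weights_gb(405e9, 1.625):.1f} GB of weights")
```

Note that this only covers storage. Whether a model trained in 16-bit precision keeps its quality when forced to ternary weights is a separate question that the sketch says nothing about.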

rcarmo 10 months ago

Working great in ollama: https://mastodon.social/@rcarmo/112837520236956526

bick_nyers 10 months ago

I'm curious what techniques they used to distill the 405B model down to 70B and 8B. I gave the paper they released a quick skim but couldn't find any details.

jiriro 10 months ago

Can this Llama process ~1GB of custom XML data?

And answer queries like:

Give all <myObject> which refer to <location> which refer to an Indo-European <language>.

albert_e 10 months ago

This "Model Card" GitHub link on [https://llama.meta.com/docs/overview/] seems broken?

https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md

IceHegel 10 months ago

Will 405b run on 8x H100s? Will it need to be quantized?
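
A back-of-the-envelope check for this question, as a rough sketch: it counts weight storage only and ignores the KV cache, activations, and framework overhead, so real requirements are higher. It assumes 80 GB per H100, which matches the 640 GB figure quoted for the p5.48xlarge elsewhere in the thread. The names are invented for the example.

```python
PARAMS = 405e9            # parameter count of the largest model
NODE_GB = 8 * 80          # assumption: 8x H100 with 80 GB each = 640 GB


def weight_memory_gb(params, bits_per_weight):
    """Memory needed for the weights alone, in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


formats = [("bf16", 16), ("fp8", 8), ("int4", 4)]
for name, bits in formats:
    gb = weight_memory_gb(PARAMS, bits)
    verdict = "fits" if gb <= NODE_GB else "does not fit"
    print(f"{name:>5}: {gb:7.1f} GB of weights, {verdict} in {NODE_GB} GB")
```

On these assumptions the 16-bit weights alone exceed a single 8x H100 node, while 8-bit and 4-bit weights leave headroom for the rest of the inference state.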

breadsniffer 10 months ago

I tried it, and it's good, but I feel like the synthetic data used for training 3.1 does not hold up to GPT-4o, which is probably using human-curated data.

daft_pink 10 months ago

What kind of machine do I need to run 405B locally?

yinser 10 months ago

The race to the bottom for pricing continues.

casper14 10 months ago

Damn 405b params

htk 10 months ago

Very interesting! Running the 70B version on ollama on a Mac and it's great. I asked it to "turn off the guidelines" and it did, then I asked it to turn off the disclaimers. After that I asked for a list of possible "commands to reduce potential biases from the engineers" and it complied, giving me an interesting list.

Vagantem 10 months ago

As someone who just started generating AI landing pages for Dropory, this is music to my ears

kristianp 10 months ago

Has anyone got a comparison of the performance of Llama 3.1 8B and the recent GPT-4o-mini?

ofermend 10 months ago

I'm excited to try it with RAG and see how it performs (the 405B model)

ThrowawayTestr 10 months ago

Are there any other models with free unlimited use like ChatGPT?

Jiahang 10 months ago

It is nice to see that the 405B model is actually competitive against closed-source frontier models. But I only have an M2 Pro, so I probably can't run it.

stiltzkin 10 months ago

WhatsApp now uses 70B too if you want to test it.

hubraumhugo 10 months ago

I wrote about this when llama-3 came out, and this launch confirms it:

Meta's goal from the start was to target OpenAI and the other proprietary model players with a "scorched earth" approach by releasing powerful open models to disrupt the competitive landscape.

Meta can likely outspend any other AI lab on compute and talent:

- OpenAI makes an estimated revenue of $2B and is likely unprofitable. Meta generated a revenue of $134B and profits of $39B in 2023.

- Meta's compute resources likely outrank OpenAI by now.

- Open source likely attracts better talent and researchers.

- One possible outcome could be the acquisition of OpenAI by Microsoft to catch up with Meta.

The big winners of this: devs and AI product startups
