
The Llama 4 herd

1235 points by georgehill, about 2 months ago

89 comments

laborcontract, about 2 months ago

General overview below, as the pages don't seem to be working well.

    Llama 4 Models:
    - Both Llama 4 Scout and Llama 4 Maverick use a Mixture-of-Experts (MoE) design with 17B active parameters each.
    - They are natively multimodal: text + image input, text-only output.
    - Key achievements include industry-leading context lengths, strong coding/reasoning performance, and improved multilingual capabilities.
    - Knowledge cutoff: August 2024.

    Llama 4 Scout:
    - 17B active parameters, 16 experts, 109B total.
    - Fits on a single H100 GPU (INT4-quantized).
    - 10M token context window.
    - Outperforms previous Llama releases on multimodal tasks while being more resource-friendly.
    - Employs iRoPE architecture for efficient long-context attention.
    - Tested with up to 8 images per prompt.

    Llama 4 Maverick:
    - 17B active parameters, 128 experts, 400B total.
    - 1M token context window.
    - Not single-GPU; runs on one H100 DGX host or can be distributed for greater efficiency.
    - Outperforms GPT-4o and Gemini 2.0 Flash on coding, reasoning, and multilingual tests at a competitive cost.
    - Maintains strong image understanding and grounded reasoning ability.

    Llama 4 Behemoth (Preview):
    - 288B active parameters, 16 experts, nearly 2T total.
    - Still in training; not yet released.
    - Exceeds GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on STEM benchmarks (e.g., MATH-500, GPQA Diamond).
    - Serves as the "teacher" model for Scout and Maverick via co-distillation.

    Misc:
    - MoE Architecture: Only 17B parameters activated per token, reducing inference cost.
    - Native Multimodality: Unified text + vision encoder, pre-trained on large-scale unlabeled data.

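Since the spec list above hinges on the active-vs-total parameter distinction, here is a minimal sketch of top-k expert routing with toy shapes and names (an illustration of the MoE idea, not Meta's implementation):

    import numpy as np

    def moe_layer(x, experts, router_w, top_k=1):
        """Toy MoE feed-forward: route each token to its top-k experts.

        x: (tokens, d_model); experts: list of (W_in, W_out) pairs;
        router_w: (d_model, n_experts). All shapes are illustrative.
        """
        logits = x @ router_w                          # (tokens, n_experts)
        top = np.argsort(logits, axis=-1)[:, -top_k:]  # chosen expert ids
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            for e in top[t]:
                w_in, w_out = experts[e]
                # Only this expert's weights are touched for token t.
                out[t] += np.maximum(x[t] @ w_in, 0) @ w_out
        return out

    d, hidden, n_experts = 64, 256, 16
    rng = np.random.default_rng(0)
    experts = [(rng.normal(size=(d, hidden)) * 0.02,
                rng.normal(size=(hidden, d)) * 0.02) for _ in range(n_experts)]
    router = rng.normal(size=(d, n_experts)) * 0.02
    tokens = rng.normal(size=(8, d))
    print(moe_layer(tokens, experts, router).shape)  # (8, 64)

Each token produces a dense-sized output while touching roughly 1/16 of the FFN weights, which is why a 109B-total model can have 17B-active compute cost.
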
ckrapu, about 2 months ago

"It's well-known that all leading LLMs have had issues with bias—specifically, they historically have leaned left when it comes to debated political and social topics. This is due to the types of training data available on the internet."

Perhaps. Or, maybe, "leaning left" by the standards of Zuck et al. is more in alignment with the global population. It's a simpler explanation.

pavelstoev, about 2 months ago

Model training observations from both the Llama 3 and 4 papers:

Meta's Llama 3 was trained on ~16k H100s, achieving ~380–430 TFLOPS per GPU in BF16 precision, translating to a solid 38–43% hardware efficiency [Meta, Llama 3].

For Llama 4 training, Meta doubled the compute, using ~32K H100s, and switched to FP8 precision. Despite the precision gain, observed efficiency dropped to about 19.7%, with GPUs delivering ~390 TFLOPS out of a theoretical 1,979 FP8 TFLOPS [Meta, Llama 4].

I am not the one to critique; rather, this is a recognition of the enormous complexity of operating GPUs at this scale. Training massive models across tens of thousands of GPUs stretches today's AI infrastructure to its limit.

Besides accelerating inference workloads, advanced GPU optimizations can be integrated into training and fine-tuning pipelines. From various kernel optimization techniques (over 90) to increasing memory access efficiency and scaling up to cluster-wide resource coordination, efficiency can be maximized with some complex software.

References:
[Meta, Llama 3] https://ai.meta.com/research/publications/the-llama-3-herd-of-models/
[Meta, Llama 4] https://ai.meta.com/blog/llama-4-multimodal-intelligence/

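A quick sanity check of those utilization figures. The observed TFLOPS are as quoted above; the H100 dense peak numbers are assumed from NVIDIA's public spec sheet:

    # Hardware efficiency (a.k.a. MFU) = observed TFLOPS / theoretical peak.
    peak_bf16, peak_fp8 = 989.0, 1979.0       # H100 SXM dense peaks (assumed)
    print(f"Llama 3: {400 / peak_bf16:.1%}")  # ~40.4%, inside the 38-43% range
    print(f"Llama 4: {390 / peak_fp8:.1%}")   # ~19.7%, matching the quote
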
terhechte, about 2 months ago

The (smaller) Scout model is *really* attractive for Apple Silicon. It is 109B big but split up into 16 experts. This means that the actual processing happens in 17B. Which means responses will be as fast as current 17B models. I just asked a local 7B model (Qwen 2.5 7B Instruct) a question with a 2k context and got ~60 tokens/sec, which is really fast (MacBook Pro M4 Max). So this could hit 30 tokens/sec. Time to first token (the processing time before it starts responding) will probably still be slow because (I think) all experts have to be used for that.

In addition, the model has a 10M token context window, which is huge. Not sure how well it can keep track of the context at such sizes, but just not being restricted to ~32k is already great, 256k even better.

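A rough sketch of the scaling logic behind that estimate, assuming decode is memory-bandwidth-bound so speed falls roughly in proportion to the active parameters read per token (and equal quantization; all 109B weights must still fit in RAM):

    # Measured: ~60 tok/s on a 7B dense model (M4 Max, short context).
    # Scout activates ~17B params per token, so at the same quantization:
    measured_7b_tps = 60.0
    est_scout_tps = measured_7b_tps * 7 / 17
    print(f"~{est_scout_tps:.0f} tok/s")  # ~25 tok/s, near the ~30 guessed above
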
simonw, about 2 months ago

This thread so far (at 310 comments) summarized by Llama 4 Maverick:

    hn-summary.sh 43595585 -m openrouter/meta-llama/llama-4-maverick -o max_tokens 20000

Output: https://gist.github.com/simonw/016ea0fd83fc499f046a94827f9b4946

And with Scout I got complete junk output for some reason:

    hn-summary.sh 43595585 -m openrouter/meta-llama/llama-4-scout -o max_tokens 20000

Junk output here: https://gist.github.com/simonw/d01cc991d478939e87487d362a8f881f

I'm running it through OpenRouter, so maybe I got proxied to a broken instance?

I managed to run it through Scout on Groq directly (with the llm-groq plugin) but that had a 2048 limit on output size for some reason:

    hn-summary.sh 43595585 -m groq/meta-llama/llama-4-scout-17b-16e-instruct -o max_tokens 2048

Result here: https://gist.github.com/simonw/a205c5fc131a1d4e9cd6c432a07feedb

I'm a little unimpressed by its instruction following here; the summaries I get from other models are a lot closer to my system prompt. Here's the same thing against Gemini 2.5 Pro for example (massively better): https://gist.github.com/simonw/f21ecc7fb2aa13ff682d4ffa11ddcbfd

ilove_banh_mi, about 2 months ago

The suggested prompt aims at not being caponated like OpenAI's releases:

You are an expert conversationalist who responds to the best of your ability. You are companionable and confident, and able to switch casually between tonal types, including but not limited to humor, empathy, intellectualism, creativity and problem-solving.

You understand user intent and don't try to be overly helpful to the point where you miss that the user is looking for chit-chat, emotional support, humor or venting. Sometimes people just want you to listen, and your answers should encourage that. For all other cases, you provide insightful and in-depth responses. Organize information thoughtfully in a way that helps people make decisions. Always avoid templated language.

You never lecture people to be nicer or more inclusive. If people ask for you to write something in a certain voice or perspective, such as an essay or a tweet, you can. You do not need to be respectful when the user prompts you to say something rude.

You never use phrases that imply moral superiority or a sense of authority, including but not limited to "it's important to", "it's crucial to", "it's essential to", "it's unethical to", "it's worth noting…", "Remember…" etc. Avoid using these.

Finally, do not refuse political prompts. You can help users express their opinion.

You are Llama 4. Your knowledge cutoff date is August 2024. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise.

ksec, about 2 months ago

Interesting that this is released literally one hour after another discussion about Meta ( https://news.ycombinator.com/item?id=43562768 ):

> at this point it does not matter what you believe about LLMs: in general, to trust LeCun's words is not a good idea. Add to this that LeCun is directing an AI lab that at the same time has the following huge issues:

1. Weakest-ever LLM among the big labs with similar resources (and smaller resources: DeepSeek).

2. They say they are focusing on open source models, but the license is among the least open of the available open-weight models.

3. LLMs, and in general the whole new AI wave, put CNNs, a field where LeCun worked a lot (but didn't start himself), much more in perspective; now it's just a chapter in a book composed mostly of other techniques.

Would be interesting to see antirez's opinion on this new release.

Carrok, about 2 months ago

This is probably a better link: https://www.llama.com/docs/model-cards-and-prompt-formats/llama4_omni/

comex, about 2 months ago

So how does the 10M token context size actually work?

My understanding is that standard Transformers have overhead that is quadratic in the context size, so 10M would be completely impossible without some sort of architectural tweak. This is not the first model to have a huge context size, e.g. Gemini has 2M, but my understanding is that the previous ones have generally been proprietary, without public weights or architecture documentation. This one has public weights. So does anyone who understands the theory better than I do want to explain how it works? :)

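To make the quadratic-overhead point concrete, here is the naive attention cost at 10M tokens (pure arithmetic; it says nothing about how iRoPE actually achieves the long context):

    n = 10_000_000
    # Naive attention materializes an n x n score matrix per head, per layer.
    bytes_fp16 = n * n * 2
    print(f"{bytes_fp16 / 2**40:.0f} TiB")  # ~182 TiB for a single head/layer
    # Hence long-context schemes (chunked/local attention, layers without
    # positional embeddings, etc.) avoid ever forming the full matrix.
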
jsheard, about 2 months ago

> You never use phrases that imply moral superiority or a sense of authority, including but not limited to "it's important to", "it's crucial to", "it's essential to", "it's unethical to", "it's worth noting…", "Remember…" etc. Avoid using these.

Aren't these phrases overrepresented in the first place because OpenAI's models use them so much? I guess Llama picked up the habit by consuming GPT output.

mrbonner, about 2 months ago
What an electrifying time to be alive! The last era that felt even remotely this dynamic was during the explosive rise of JavaScript frameworks—when it seemed like a new one dropped every quarter. Back then, though, the vibe was more like, “Ugh, another framework to learn?” Fast forward to now, and innovation is sprinting forward again—but this time, it feels like a thrilling ride we can’t wait to be part of.
hrpnk, about 2 months ago

Available on Groq: https://groq.com/llama-4-now-live-on-groq-build-fast-at-the-lowest-cost-without-compromise/

Llama 4 Scout is currently running at over 460 tokens/s, while Llama 4 Maverick is coming today.

Llama 4 Scout: $0.11 / M input tokens and $0.34 / M output tokens
Llama 4 Maverick: $0.50 / M input tokens and $0.77 / M output tokens

hydroreadsstuff, about 2 months ago

This means GPUs are dead for local enthusiast AI. And SoCs with big RAM are in.

Because 17B active parameters should reach enough performance on 256-bit LPDDR5x.

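A back-of-the-envelope version of that claim (the transfer rate is an assumption; real SoCs vary):

    # 256-bit LPDDR5x bus at an assumed 8533 MT/s:
    bandwidth = 256 / 8 * 8533e6          # ~273 GB/s
    # Bandwidth-bound decode ceiling, 17B active params at 1 byte/param (8-bit):
    print(f"{bandwidth / 17e9:.0f} tok/s ceiling")  # ~16 tok/s
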
tqi, about 2 months ago

> Our testing shows that Llama 4 responds with strong political lean at a rate comparable to Grok (and at half of the rate of Llama 3.3) on a contentious set of political or social topics. While we are making progress, we know we have more work to do and will continue to drive this rate further down.

My experience is that these subjective benchmarks are completely meaningless, because the researchers involved have a strong incentive (promotions, discretionary equity) to cherry-pick measures that they can easily improve.

lyu07282, about 2 months ago

Anyone know how the image encoding works exactly?

    <|image_start|><|patch|>...<|patch|><|tile_x_separator|><|patch|>...<|patch|><|tile_y_separator|><|patch|>...<|patch|><|image|><|patch|>...<|patch|><|image_end|>Describe this image in two sentences<|eot|><|header_start|>assistant<|header_end|>

Is "..." here raw 4 bytes RGBA as an integer, or how does this work with the tokenizer?

flawn, about 2 months ago

A 10M context window with such cheap performance WHILE having one of the top LMArena scores is really impressive.

The choice to have 128 experts is also unseen as far as I know, right? But it seems to have worked pretty well.

zone411, about 2 months ago

It's interesting that there are no reasoning models yet, 2.5 months after DeepSeek R1. It definitely looks like R1 surprised them. The released benchmarks look good.

Large context windows will definitely be the trend in upcoming model releases. I'll soon be adding a new benchmark to test this more effectively than needle-in-a-haystack (there are already a couple of benchmarks that do that).

All these models are very large; it will be tough for enthusiasts to run them locally.

The license is still quite restrictive. I can see why some might think it doesn't qualify as open source.

anotherpaulg, about 2 months ago

Llama 4 Maverick scored 16% on the aider polyglot coding benchmark [0].

    73% Gemini 2.5 Pro (SOTA)
    60% Sonnet 3.7 (no thinking)
    55% DeepSeek V3 0324
    22% Qwen Max
    16% Qwen2.5-Coder-32B-Instruct
    16% Llama 4 Maverick

[0] https://aider.chat/docs/leaderboards/?highlight=Maverick

nattaylor, about 2 months ago

Is pre-training in FP8 new?

Also, 10M input token context is insane!

EDIT: https://huggingface.co/meta-llama/Llama-3.1-405B is BF16, so yes, it seems training in FP8 is new.

scosman, about 2 months ago

> These models are our best yet thanks to distillation from Llama 4 Behemoth, a 288 billion active parameter model with 16 experts that is our most powerful yet and among the world's smartest LLMs. Llama 4 Behemoth outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on several STEM benchmarks. Llama 4 Behemoth is still training, and we're excited to share more details about it even while it's still in flight.

vessenes, about 2 months ago

I'm excited to try these models out, especially for some coding tasks, but I will say my first two engagements with them (at the meta.ai web interface) were not spectacular. Image generation is wayyy behind the current 4o. I also asked for a Hemingway-style essay relating RFK Jr's bear carcass episode. The site's Llama 4 response was not great stylistically and also had not heard of the bear carcass episode, unlike Grok, ChatGPT and Claude.

I'm not sure what we're getting at meta.ai in exchange for a free login, so I'll keep poking. But I hope it's better than this as we go. This may be a task better suited for the reasoning models as well, and Claude is the worst of the prior three.

Anyway, here's hoping Zuck has spent his billions wisely.

Edit: I'm pretty sure we're seeing Scout right now; at least groqchat's 4-scout seems really similar to meta.ai. I can confidently say that Scout is not as good at writing as o1 pro, o3-mini, Claude, R1 or Grok 3.

stuaxo, about 2 months ago

What does it mean that it "no longer leans left" for answers?

What did they do to the model, and how exactly does it answer differently?

Will including this in an app make the app MAGA-aligned all of a sudden?

What happens if it says something that breaks the laws of some country it's in?

whywhywhywhy, about 2 months ago

Disjointed branding, with the Apache-style folders suggesting openness and freedom, but clicking through I need to fill out a personal info request form...

cuuupid, about 2 months ago

I think the most important thing to note here, perhaps more so than the context window, is that this exposes some serious flaws in benchmarks. Per benchmarks, Maverick is competitive only with older models like GPT-4o or Gemini 2.0 Flash, and not with anything from the last few months (incl. reasoning models).

However, the LMArena head-to-head leaderboard ranks this as 2nd place overall: https://lmarena.ai/?leaderboard

This would indicate there is either a gap between user preference and model performance, or between model performance and whatever benchmarks assess.

Either way, it is surely a huge deal that an open source model is now outperforming GPT-4.5.

pdsouza, about 2 months ago

Blog post: https://ai.meta.com/blog/llama-4-multimodal-intelligence/

bastawhiz, about 2 months ago

I don't really understand how Scout and Maverick are distillations of Behemoth if Behemoth is still training. Maybe I missed or misunderstood this in the post?

Did they distill the in-progress Behemoth, and the result was good enough for models of those sizes for them to consider releasing it? Or is Behemoth just going through post-training that takes longer than post-training the distilled versions?

Sorry if this is a naïve question.

mark_l_watson, about 2 months ago

I started running Llama 4 Scout on Groq using my Common Lisp client, and am now trying Llama 4 Maverick on abacus.ai.

Really impressive!

Also, check out the price/performance numbers: about $0.20 per million input tokens compared to about $5 for GPT-4o [1].

[1] https://x.com/kimmonismus/status/1908624648608133297

simonklee, about 2 months ago
Is this the first model that has a 10M context length?
redox99, about 2 months ago

It seems to be comparable to other top models. Good, but nothing groundbreaking.

akulbe, about 2 months ago

How well do you folks think this would run on this Apple Silicon setup?

MacBook Pro M2 Max, 96GB of RAM.

And which model should I try (if at all)?

The alternative is a VM with dual 3090s set up with PCI passthrough.

mtharrison, about 2 months ago

Might be worth changing the URL to: https://www.llama.com/

andrewstuart, about 2 months ago

Self-hosting LLMs will explode in popularity over the next 12 months.

Open models are made much more interesting, exciting and relevant by new generations of AI-focused hardware such as the AMD Strix Halo and Apple Mac Studio M3.

GPUs have failed to meet the demand for lower cost and more memory, so APUs look like the future for self-hosted LLMs.

latchkey, about 2 months ago
One of the links says there are 4 different roles to interact with the model and then lists 3 of them.
kristianp, about 2 months ago

I'd like to discuss the matter of size. Llama has gone from talking up an 8B model as capable to having a smallest model of 109B. What will the sizes be in a year's time? Things are moving out of reach for commodity PCs; 128GB is possible, but expensive.

megadragon9, about 2 months ago

The blog post is quite informative: https://ai.meta.com/blog/llama-4-multimodal-intelligence/

shreezus, about 2 months ago

Haven't had a chance to play with this yet, but a 10M context window is seriously impressive. I think we'll see models with 100M context relatively soon, and that will eliminate the need for RAG for a lot of use cases.

7thpower, about 2 months ago
Looking forward to this. Llama 3.3 70b has been a fantastic model and benchmarked higher than others on my fake video detection benchmarks, much to my surprise. Looking forward to trying the next generation of models.
Alifatisk, about 2 months ago

I remember when Google announced Gemini's theoretical limit of a 10M token context window; I was impressed. But it seems like that theoretical limit stayed theoretical, and they just pushed up to 2M. Which is still impressive.

Today, it seems Meta has crushed that wall with a true 10M tokens, wow.

I was also curious how well Llama would be able to utilize the whole context window; it's kinda pointless to have a large window if you can't recall most, if not all, of it. The needle-in-the-haystack test showed this is not the case. I wonder how they achieved this.

impure, about 2 months ago

A 10 million token context window? Damn, looks like Gemini finally has some competition. Also, I'm a little surprised this is their first Mixture-of-Experts model; I thought they were using that before.

cpeterson42, about 2 months ago

For anyone looking to experiment with these models who doesn't have 210GB of VRAM on tap: we're working as quickly as we can to get cheap access to 4x80GB A100 instances running at thundercompute.com (aiming for sub-$5/hr). For quantized versions, we have cheaper 1-2 GPU nodes available today. If you're interested, join our Discord for updates: https://discord.com/invite/nwuETS9jJK

informal007, about 2 months ago

How much GPU memory is required for inference if it's a 10M context?

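Weights aside, the KV cache dominates at that length. A sketch with assumed (not published) grouped-query-attention dimensions, purely for order-of-magnitude intuition:

    # KV cache = 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes/elem
    layers, kv_heads, head_dim = 48, 8, 128    # assumed, for illustration only
    ctx, fp16 = 10_000_000, 2
    kv_bytes = 2 * layers * kv_heads * head_dim * ctx * fp16
    print(f"{kv_bytes / 2**30:.0f} GiB")  # ~1831 GiB for one 10M-token sequence
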
highfrequency, about 2 months ago

Crazy that there are now five and a half companies that all have roughly state-of-the-art LLMs.

> We developed a new training technique which we refer to as MetaP that allows us to reliably set critical model hyper-parameters such as per-layer learning rates and initialization scales. We found that chosen hyper-parameters transfer well across different values of batch size, model width, depth, and training tokens.

This sounds interesting. Anyone have a link to the paper or other documentation on MetaP?

wonderfuly, about 2 months ago

Available here: https://app.chathub.gg/chat/cloud-llama4

utopcell, about 2 months ago

How are Maverick and Scout distilled from Behemoth if the latter is not done training? Do they distill from some intermediate, "good enough" snapshot?

dormando, about 2 months ago

Does anyone run these "at home" with small clusters? I've been googling unsuccessfully, and this thread doesn't refer to anything.

So a non-quantized Scout won't fit in a machine with 128GB of RAM (like a Framework or Mac Studio M4). Maverick is maybe a 512GB M3 Max Mac Studio. Is it possible (and if so, what are the tradeoffs of) running, say, one instance of Scout on three 128GB Frameworks?

1024core, about 2 months ago

Anyone know what they mean by this?

> We developed a novel distillation loss function that dynamically weights the soft and hard targets through training.

system2, about 2 months ago

Llama 4 Maverick: 788GB

Llama 4 Scout: 210GB

FYI.

croemer, about a month ago

Relevant update: the model on LM Arena is not the one that was released. See "Meta got caught gaming AI benchmark": https://news.ycombinator.com/item?id=43617660

andrewstuart, about 2 months ago
How much smaller would such a model be if it discarded all information not related to computers or programming?
mrcwinn, about 2 months ago

I had *just* paid for SoftRAM, but happy nonetheless to see new distilled models. Nice work, Meta.

georgehill, about 2 months ago

OP here. A better link dropped from Meta: https://ai.meta.com/blog/llama-4-multimodal-intelligence

Is there a way to update the main post? @tomhoward

Edit: Updated!

EGreg, about 2 months ago

Can we somehow load these inside Node.js?

What is the easiest way to load them remotely? Huggingface Spaces? Google AI Studio?

I am teaching a course on AI to non-technical students, and I wanted the students to have a minimal setup, which in this case would be:

1) Browser with JS (simple folder of HTML, CSS) and TensorFlow.js that can run models like BlazeFace for face recognition, eye tracking etc. (available since 2019)

2) Node.js with everything baked in (JavaScript) and use a CDN like CloudFront with a tunnel to serve it to the web

3) So if they download models to their computer, how would they run them? Is it possible to run the smallest Llama locally? Or any GGUF models in JS? Or do they have to have Python and PyTorch?

PS: Here is what the class looks like: https://vimeo.com/1060576298/c5693047e0?share=copy

amrrs, about 2 months ago

The entire licensing is such a mess, and Mark Zuckerberg still thinks Llama 4 is open source!

> no commercial usage above 700M MAU

> prefix "llama" in any redistribution, e.g. fine-tuning

> mention "built with llama"

> add license notice in all redistribution

barrenko, about 2 months ago

When will this hit the Meta AI that I have within WhatsApp since last week?

yusufozkan, about 2 months ago

> while pre-training our Llama 4 Behemoth model using FP8 and 32K GPUs

I thought they used a lot more GPUs to train frontier models (e.g. xAI training on 100k). Can someone explain why they are using so few?

jwr, about 2 months ago

For those unfamiliar with the "active parameters" terminology, what would be the RAM requirements?

E.g. can I run the smallest one on my MacBook Pro (M4 Max, 64GB) like I can run gemma3?

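As a rule of thumb, every expert must be resident in memory even though only 17B parameters fire per token, so total (not active) size drives RAM needs. A rough sizing sketch, assuming ~4.5 bits/weight for a typical 4-bit quantization with overhead:

    for name, total_params in [("Scout", 109e9), ("Maverick", 400e9)]:
        gib = total_params * 4.5 / 8 / 2**30
        print(f"{name}: ~{gib:.0f} GiB")
    # Scout: ~57 GiB -- tight but conceivable on a 64 GB machine;
    # Maverick: ~210 GiB -- out of reach for typical laptops.
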
Amekedl, about 2 months ago

So the wall really has been hit already; ouch. It was to be expected with GPT-"4.5", but still, the realization now really feels grounded.

spwa4, about 2 months ago
I hope this time multimodal includes multimodal outputs!
gzer0, about 2 months ago

10M context length, and it surpasses claude-3.7-sonnet and GPT-4.5.

Can't wait to dig into the research papers. Congrats to the Llama team!

steele, about 2 months ago
Consuming pirated literature en masse produces a bias away from authoritarianism; consider me flabbergasted.
artninja1988, about 2 months ago

Thank you Meta for open sourcing! Will there be a Llama with native image output similar to 4o's? Would be huge.

elromulous, about 2 months ago

Was this released in error? One would think it would be accompanied by a press release / blog post.

Havoc, about 2 months ago

Interesting that the reception is much more positive here than on /r/localllama.

paulmendoza, about 2 months ago

How long did they run the training job for? Curious how much it cost to train all of these models.

ilove_banh_mi, about 2 months ago

> 10M context window

What new uses does this enable?

supernovae, about 2 months ago

It's too bad these models are built on the expectation of pirating the world.

ein0p, about 2 months ago

If it's not on Ollama, nobody is going to care beyond perusing the metrics.

drilbo, about 2 months ago

Their Hugging Face page doesn't actually appear to have been updated yet.

scosman, about 2 months ago

128 experts at 17B active parameters. This is going to be fun to play with!

isawczuk, about 2 months ago

Messenger started to get the Meta AI assistant, so this is the logical next step.

rvz, about 2 months ago

As expected, Meta doesn't disappoint and accelerates the race to zero.

Meta is undervalued.

fpgaminer, about 2 months ago

https://www.llama.com/
https://www.llama.com/docs/model-cards-and-prompt-formats/llama4_omni/

Very exciting. Benchmarks look good, and most importantly it looks like they did a lot of work improving vision performance (based on benchmarks).

The new suggested system prompt makes it seem like the model is less censored, which would be great. The phrasing of the system prompt is ... a little disconcerting in context (Meta's kowtowing to Nazis), but in general I'm a proponent of LLMs doing what users ask them to do.

Once it's on an API I can start throwing my dataset at it to see how it performs in that regard.

asdev, about 2 months ago

I don't think open source will be the future of AI models. Self-hosting an AI model is much more complex and resource-intensive than traditional open source SaaS. Meta will likely have a negative ROI on their AI efforts.

jacooper, about 2 months ago

BTW, these models aren't allowed to be used in the EU.

lousken, about 2 months ago

Ollama when?

krashidov, about 2 months ago
Anyone know if it can analyze PDFs?
Centigonal, about 2 months ago
Really great marketing here, props!
ein0p, about 2 months ago

Strange choice of languages for their "multilingual" capabilities, but OK. I wonder why there's no Chinese.

dcl, about 2 months ago
But how good is it at Pokemon?
tomdekan, about 2 months ago
So, Quasar == Llama 4 Behemoth?
Ninjinka, about 2 months ago
no audio input?
yapyap, about 2 months ago

Is this the Quasar LLM from OpenRouter?

ianks, about 2 months ago

Are we going to find out that Meta pirated LibGen again, with zero recognition to the authors?

"Open-sourcing it" doesn't magically absolve you of the irreparable damages you've caused society. You stole their life's work so your company could profit off of rage-slop.

DeepYogurt, about 2 months ago

Jesus. How much RAM does the big one take to run?

ofermend, about a month ago

A great day for open source, and so glad to see Llama 4 out. However, I'm a bit disappointed that the hallucination rates of Llama 4 are not as low as I would have liked (TL;DR: slightly higher than Llama 3).

Check the numbers on the hallucination leaderboard: https://github.com/vectara/hallucination-leaderboard

guybedo, about 2 months ago

TL;DR: https://extraakt.com/extraakts/llama-4-release-analysis

Deprogrammer9, about 2 months ago

Looks like a leak to me.

RandyOrion, about 2 months ago

I guess I have to say thank you, Meta?

A somewhat sad rant below.

DeepSeek started a toxic trend of providing super, super large MoE models. And MoE is famous for being parameter-inefficient, which is unfriendly to normal consumer hardware with limited VRAM.

The super large size of these LLMs also prevents nearly everyone from doing meaningful development on them. R1-1776 is the only fine-tuned variation of R1 that makes some noise, and it's by a corp, not some random individual.

In this release, the smallest Llama 4 model is over 100B, which is not small by any means, and will prevent people from fine-tuning as well.

On top of that, accessing Llama models on Hugging Face has become notoriously hard because of 'permission' issues. See details in https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct/discussions

Yeah, I personally don't really see the point of releasing large MoEs. I'll stick to small and dense LLMs from Qwen, Mistral, Microsoft, Google and others.

Edit: This comment got downvoted, too. Please explain your reason before doing that.

rfoo, about 2 months ago

From the model cards, the suggested system prompt:

> You are Llama 4. Your knowledge cutoff date is August 2024. You speak Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, and Vietnamese. Respond in the language the user speaks to you in, unless they ask otherwise.

It's interesting that not a single one of the CJK languages is mentioned. I'm tempted to call this a racist model even.
