Llama 4 Now Live on Groq

109 points by gok about 1 month ago

11 comments

Game_Ender about 1 month ago
To help those who got a bit confused (like me): this is Groq, the company making accelerators designed specifically for LLMs, which they call LPUs (Language Processing Units) [0]. So they want to sell you their custom machines that, while expensive, will be much more efficient at running LLMs for you. There is also Grok [1], which is xAI's series of LLMs and competes with ChatGPT and other models like Claude and DeepSeek.

EDIT - It seems that Groq has stopped selling their chips and will now only partner to fund large build-outs of their cloud [2].

0 - https://groq.com/the-groq-lpu-explained/

1 - https://grok.com/

2 - https://www.eetimes.com/groq-ceo-we-no-longer-sell-hardware
simonw about 1 month ago
It's live on Groq, Together and Fireworks now.

All three of those can also be accessed via OpenRouter - with both a chat interface and an API:

- Scout: https://openrouter.ai/meta-llama/llama-4-scout

- Maverick: https://openrouter.ai/meta-llama/llama-4-maverick

Scout claims a 10 million input token length, but the available providers currently seem to limit it to 128,000 (Groq and Fireworks) or 328,000 (Together). I wonder who will win the race to get that full-sized 10 million token window running?

Maverick claims 1 million; Fireworks offers 1.05M while Together offers 524,000. Groq isn't offering Maverick yet.
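For anyone who wants to try it from code, here is a minimal sketch of calling Scout through OpenRouter's OpenAI-compatible API (an untested illustration; it assumes the openai Python package and an OPENROUTER_API_KEY environment variable):

    import os
    from openai import OpenAI

    # OpenRouter exposes an OpenAI-compatible endpoint; the model ID
    # matches the Scout URL in the comment above.
    client = OpenAI(
        base_url="https://openrouter.ai/api/v1",
        api_key=os.environ["OPENROUTER_API_KEY"],
    )

    response = client.chat.completions.create(
        model="meta-llama/llama-4-scout",
        messages=[{"role": "user", "content": "Summarize the Llama 4 release in two sentences."}],
    )
    print(response.choices[0].message.content)

The same snippet should work against Groq, Together, or Fireworks directly by swapping the base_url, API key, and provider-specific model ID.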
parhamn about 1 month ago
I might be biased by the products I'm building, but it feels to me that function support is table stakes now. Are open source models just missing the dataset to fine-tune one?

Very few of the models supported on Groq/Together/Fireworks support function calling, and rarely the interesting ones (DeepSeek V3, large Llamas, etc.).
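For context, function calling in these OpenAI-compatible APIs means passing a tools array and letting the model emit a structured call instead of prose. A rough sketch (the get_weather tool is hypothetical, and whether a given hosted model actually honors the tools parameter is exactly the gap being described):

    import json
    import os
    from openai import OpenAI

    # Groq's endpoint follows the OpenAI chat-completions schema.
    client = OpenAI(
        base_url="https://api.groq.com/openai/v1",
        api_key=os.environ["GROQ_API_KEY"],
    )

    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, for illustration only
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="meta-llama/llama-4-scout-17b-16e-instruct",
        messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
        tools=tools,
    )

    # A model with tool support returns a structured call with JSON arguments.
    call = resp.choices[0].message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))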
minimaxir about 1 month ago
Although Llama 4 is too big for mere mortals to run without many caveats, the economics of calling a dedicated-hosted Llama 4 are more interesting than expected.

$0.11 per 1M tokens, a 10 million context window (not yet implemented in Groq), and faster inference due to fewer activated parameters allow for some specific applications that were not cost-feasible with GPT-4o/Claude 3.7 Sonnet. That's all dependent on whether the quality of Llama 4 is as advertised, of course, particularly around that 10M context window.
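To make the pricing concrete, a back-of-the-envelope calculation at the quoted rate (the workload numbers are made up for illustration):

    # Cost of a bulk-processing job at the quoted $0.11 per 1M tokens.
    docs = 100_000          # documents to process (assumed)
    tokens_per_doc = 2_000  # average tokens per document (assumed)
    price_per_m = 0.11      # USD per 1M tokens, from the rate above

    total_tokens = docs * tokens_per_doc           # 200M tokens
    cost = total_tokens / 1_000_000 * price_per_m  # $22.00
    print(f"${cost:.2f} for {total_tokens:,} tokens")

At that rate, sweeping an entire 200M-token corpus costs about $22, the kind of job that was previously hard to justify at frontier-model prices.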
greeneggs about 1 month ago
FYI, the last sentence, "Start building today on GroqCloud – sign up for free access here…" links to https://conosle.groq.com/ (instead of "console").
vessenes about 1 month ago
Just tried this, thank you. A couple of questions: it looked like just Scout access for now; do you have plans for larger model access? Also, it seems like context length is always fairly short with you guys. Is that an architectural or cost-based decision?
sinab about 1 month ago
I got an error when passing a prompt with about 20k tokens to the Llama 4 Scout model on Groq (despite Llama 4 supporting up to 10M tokens of context). Groq responds with a POST https://api.groq.com/openai/v1/chat/completions 413 (Payload Too Large) error.

Is there some technical limitation on the context window size with LPUs, or is this a temporary stop-gap measure to avoid overloading Groq's resources? Or something else?
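For anyone trying to reproduce it, a minimal sketch of the failing request (untested; the model ID and payload size are assumptions, and the repeated word is only a crude stand-in for a ~20k-token prompt):

    import os
    import requests

    # POST a large prompt directly to the OpenAI-compatible endpoint.
    # A request body over the server's size cap is rejected with HTTP 413
    # before the model ever sees the prompt.
    resp = requests.post(
        "https://api.groq.com/openai/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
        json={
            "model": "meta-llama/llama-4-scout-17b-16e-instruct",
            "messages": [{"role": "user", "content": "word " * 20_000}],
        },
    )
    print(resp.status_code)  # 413 (Payload Too Large)

A 413 at the HTTP layer suggests a request-size limit in front of the API rather than a model-level context check, which would normally come back as a 400 with a context-length error message.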
jasonjmcghee about 1 month ago
Seems to be about 500 tk/s. That's actually significantly less than I expected/hoped for, but fantastic compared to nearly anything else. (specdec when?)

Out of curiosity, the console is letting me set max output tokens to 131k but errors above 8192. What's the max intended to be? (8192 max output tokens would be rough after getting spoiled with the 128K output of Claude 3.7 Sonnet and 64K of the Gemini models.)
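One way to pin down the effective cap is to probe max_tokens directly (a sketch, not tested against Groq; the candidate values are guesses around the numbers mentioned above):

    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.groq.com/openai/v1",
        api_key=os.environ["GROQ_API_KEY"],
    )

    # Walk a few candidate caps and see where the API starts rejecting.
    for cap in (8_192, 8_193, 16_384, 131_072):
        try:
            client.chat.completions.create(
                model="meta-llama/llama-4-scout-17b-16e-instruct",
                messages=[{"role": "user", "content": "hi"}],
                max_tokens=cap,
            )
            print(cap, "accepted")
        except Exception as exc:
            print(cap, "rejected:", exc)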
growdark about 1 month ago
Would it be realistic to buy and self-host the hardware to run, for example, the latest Llama 4 models, assuming a budget of less than $500,000?
geor9e about 1 month ago
I'm glad I saw this, because llama-3.3-70b-versatile just stopped working in my app. I switched it to meta-llama/llama-4-scout-17b-16e-instruct and it started working again. Maybe Groq stopped supporting the old one?
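If the old ID has been retired server-side, one defensive pattern is to try a list of model IDs in order and fall back on failure (a sketch; both IDs come from the comment above, and the broad exception handling is a simplification):

    import os
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.groq.com/openai/v1",
        api_key=os.environ["GROQ_API_KEY"],
    )

    # Preferred model first; a retired ID typically fails with a not-found error.
    MODELS = ["llama-3.3-70b-versatile", "meta-llama/llama-4-scout-17b-16e-instruct"]

    for model in MODELS:
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": "ping"}],
            )
            print(model, "->", resp.choices[0].message.content)
            break
        except Exception as exc:
            print(model, "failed:", exc)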
imcritic about 1 month ago
All I get is {"error":{"message":"Not Found"}}