Groq runs Mixtral 8x7B-32k with 500 T/s

847 points by tin7in, about 1 year ago

82 comments

eigenvalue, about 1 year ago
I just want to say that this is one of the most impressive tech demos I've ever seen in my life, and I love that it's truly an open demo that anyone can try without even signing up for an account or anything like that. It's surreal to see the thing spitting out tokens at such a crazy rate when you're used to watching them generate at less than one fifth that speed. I'm surprised you guys haven't been swallowed up by Microsoft, Apple, or Google already for a huge premium.
treesciencebot, about 1 year ago
The main problem with the Groq LPUs is that they don't have any HBM on them at all, just a minuscule (230 MiB) [0] amount of ultra-fast SRAM (20x faster than HBM3, to be clear). Which means you need ~256 LPUs (4 full server racks of compute; each unit on a rack contains 8x LPUs and there are 8x of those units on a single rack) just to serve a single model [1], whereas you can get a single H200 (1/256 of the server-rack density) and serve these models reasonably well.

It might work well if you have a single model with lots of customers, but as soon as you need more than a single model and a lot of finetunes/high-rank LoRAs etc., these won't be usable. The same goes for any on-prem deployment, since the main advantage is consolidating people onto the same model, together.

[0]: https://wow.groq.com/groqcard-accelerator/

[1]: https://twitter.com/tomjaguarpaw/status/1759615563586744334
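A rough sanity check of the ~256-LPU figure above. The 8-bit weight storage below is an assumption made for illustration; the precision Groq actually runs at isn't stated in this thread.

```python
# Back-of-envelope only; bytes_per_param is an assumed value, not Groq's spec.
params = 46.7e9                 # approx. parameter count of Mixtral 8x7B
bytes_per_param = 1             # assumed 8-bit weight storage
sram_per_lpu = 230 * 2**20      # 230 MiB of on-chip SRAM per card

lpus_for_weights = params * bytes_per_param / sram_per_lpu
print(round(lpus_for_weights))  # ~194 cards just to hold the weights

# 4 racks x 8 units/rack x 8 LPUs/unit = 256, leaving headroom beyond the raw
# weights for KV cache, activations and pipelining.
print(4 * 8 * 8)                # 256
```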
karpathy, about 1 year ago
Very impressive looking! Just wanted to caution that it's worth being a bit skeptical without benchmarks, as there are a number of ways to cut corners. One prominent example is heavy model quantization, which speeds up the model at a cost to model quality. Otherwise I'd love to see LLM tok/s progress exactly like CPU instructions/s did a few decades ago.
tome, about 1 year ago
Hi folks, I work for Groq. Feel free to ask me any questions.

(If you check my HN post history you'll see I post a lot about Haskell. That's right, part of Groq's compilation pipeline is written in Haskell!)
imiric, about 1 year ago
Impressive demo!

However, the hardware requirements and cost make this inaccessible for anyone but large companies. When do you envision that the price could be affordable for hobbyists?

Also, while the CNN Vapi demo was impressive as well, a few weeks ago here [1] someone shared https://smarterchild.chat/. That also has _very_ low audio latency, making natural conversation possible. From that discussion it seems that https://www.sindarin.tech/ is behind it. Do we know if they use Groq LPUs or something else?

I think that once you reach ~50 t/s, real-time interaction is possible. Anything higher than that is useful for generating large volumes of data quickly, but there are diminishing returns, as it's far beyond what humans can process. Maybe such speeds would be useful for AI-to-AI communication, transferring knowledge/context, etc.

So an LPU product focused only on AI-human interaction could have much lower capabilities, and thus much lower cost, no?

[1]: https://news.ycombinator.com/item?id=39180237
charlie123hufft, about 1 year ago
It's only faster sometimes, but when you ask it a complicated question or give it any type of pre-prompt to speak in a different way, it still takes a while to load. Interesting, but ultimately probably going to be a flop.
neilv, about 1 year ago
If the page can't access certain fonts, it fails to work while it keeps retrying requests:

    https://fonts.gstatic.com/s/notosansarabic/[...]
    https://fonts.gstatic.com/s/notosanshebrew/[...]
    https://fonts.gstatic.com/s/notosanssc/[...]

(I noticed this because my browser blocks these de facto trackers by default.)
SeanAnderson, about 1 year ago
Sorry, I'm a bit naïve about all of this.

Why is this impressive? Can this result not be achieved by throwing more compute at the problem to speed up responses? Isn't the fact that there is a queue under load just indicative of a trade-off between "# of requests to process per unit of time" and "amount of compute to put into a response to respond quicker"?

https://raw.githubusercontent.com/NVIDIA/TensorRT-LLM/rel/docs/source/blogs/media/TRT_LLM_v0-5-0_H100vA100_tps.png

This chart from NVIDIA implies their H100 runs Llama v2 70B at >500 tok/s.
sebzim4500, about 1 year ago
So this has nothing to do with `Grok`, the model provided by x.ai?

EDIT: Tried using it, very impressed with the speed.
eurekin, about 1 year ago
Jaw-dropping. Both Groq and Mixtral.

I used the following prompt:

Generate a GitLab CI YAML file for a hybrid front-end/backend project. The frontend is under /frontend and is a Node project, packaged with Yarn, built with Vite into the /backend/public folder. The backend is a Python Flask server.
charlie123hufft, about 1 year ago
Never mind, I stand corrected. Blown away after trying the demo MYSELF. It's instantaneous; the last time I used an LLM that fast it was a proprietary model with a small dataset, lightning fast but not smart enough. This is wild. But I don't understand why the demo was so bad and why it took so long to respond to his questions?
Gcam, about 1 year ago
Groq's API performance reaches close to this level as well. We've benchmarked performance over time and >400 tokens/s has been sustained; see https://artificialanalysis.ai/models/mixtral-8x7b-instruct (bottom of the page for the over-time view).
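For anyone reproducing numbers like these, a minimal sketch of how such a tokens/s benchmark can be measured against any OpenAI-compatible streaming endpoint. The base URL and model name are placeholders, not confirmed Groq values, and the 4-chars-per-token heuristic is a rough approximation.

```python
# Sketch only: stream a completion and divide tokens generated by decode wall time.
import time
from openai import OpenAI

client = OpenAI(base_url="https://example-endpoint/v1", api_key="sk-...")

t_start = time.perf_counter()
t_first = None
pieces = []
stream = client.chat.completions.create(
    model="mixtral-8x7b-32768",   # assumed model identifier
    messages=[{"role": "user", "content": "Explain SRAM vs HBM in two sentences."}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content or ""
    if delta and t_first is None:
        t_first = time.perf_counter()   # time to first token
    pieces.append(delta)
t_end = time.perf_counter()

text = "".join(pieces)
approx_tokens = len(text) / 4            # crude ~4 chars/token heuristic
print(f"time to first token: {t_first - t_start:.2f} s")
print(f"decode throughput:   {approx_tokens / (t_end - t_first):.0f} tok/s (approx.)")
```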
CuriouslyC, about 1 year ago
This is pretty sweet. The speed is nice, but what I really care about is you bringing the per-token cost down compared with models on the level of Mistral Medium/GPT-4. GPT-3.5 is pretty close in terms of cost per token, but the quality isn't there, and GPT-4 is overpriced. Having GPT-4 quality at sub-GPT-3.5 prices will enable a lot of things, though.
matanyal, about 1 year ago
Hey y'all, we have a Discord now for more discussion and announcements: https://discord.com/invite/TQcy5EBdCP
deepsquirrelnet, about 1 year ago
Incredible job. It feels dumb or obvious to say this, but this really changes the way I think about using it. The slow autoregression really sucks because it inhibits your ability to skim sections. For me, that creates an unnatural reading environment. This makes ChatGPT feel antiquated.
anybodyz, about 1 year ago
I have this hooked up experimentally to my universal Dungeon Master simulator, DungeonGod, and it seems to work quite well.

I had been using Together AI Mixtral (which is serving the Hermes Mixtrals), and it is pretty snappy, but nothing close to Groq. I think the next closest I've tested is Perplexity Labs Mixtral.

A key blocker in just hanging out a shingle for an open-source AI project is the fear that anything that might scale will bankrupt you (or just go offline if you get any significant traction). I think we're nearing the phase where we could potentially just turn these things "on" and eat the reasonable inference fees to see what people engage with, with a pretty decently cool free tier available.

I'd add that the simulator makes multiple calls to the API for one response, to do analysis and function selection in the underlying Python game engine, which Groq makes less of a problem since it's close to instant. This adds a pretty significant pause in the OpenAI version. Also, since this simulator runs on Discord with multiple users, I've had problems in the past with 'user response storms' where the AI couldn't keep up. Also less of a problem with Groq.
supercharger9, about 1 year ago
Do they make money from the LLM service or by selling hardware? The homepage is confusing, with no reference to other products.
ppsreejith, about 1 year ago
Relevant thread from 5 months ago: https://news.ycombinator.com/item?id=37469434

I'm getting a consistent 450+ tokens/sec for Mixtral 8x7B-32k and ~200 t/s for Llama 2 70B-4k.

As an aside, seeing that this is built with Flutter Web, perhaps a mobile app is coming soon?
tandr, about 1 year ago
@tome Cannot sign up with sneakemail.com, snkml.com, snkmail, liamekaens.com, etc. I pay for these services so my email is a bit more protected. Why do you insist on well-known email providers instead? Data mining, or something else?
codedokode, about 1 year ago
Is it normal that I asked two networks (Llama/Mixtral) the same question ("tell me about the most popular audio pitch detection algorithms") and they gave almost the same answer? Both answers start with "Sure, here are some of the most popular pitch detection algorithms used in audio signal processing" and end with "Each of these algorithms has its own strengths and weaknesses, and the choice of algorithm depends on the specific application and the characteristics of the input signal." And the content is 95% the same. How can that be?
botanical, about 1 year ago
I always ask LLMs this:

> If I initially set a timer for 45 minutes but decided to make the total timer time 60 minutes when there's 5 minutes left in the initial 45, how much should I add to make it 60?

And they never get it correct.
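For reference, on the usual reading of the riddle the elapsed time is a distractor and the answer falls out of one subtraction; a trivial check:

```python
# On this reading, the "5 minutes left" doesn't matter: the timer was set for
# 45 minutes and the goal is a 60-minute total, so you add the difference.
total_wanted = 60
initially_set = 45
print(total_wanted - initially_set)   # 15 minutes to add
```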
Havoc, about 1 year ago
That sort of speed will be amazing for code completion. Need to find a way to hook this into VS Code somehow...
sva_, about 1 year ago
How far is the API compatible with OpenAI's? Does it offer logprobs [0] and top_logprobs [1]?

0. https://platform.openai.com/docs/api-reference/chat/create#chat-create-logprobs

1. https://platform.openai.com/docs/api-reference/chat/create#chat-create-top_logprobs
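For context, this is what those two parameters look like in OpenAI's chat completions API; whether Groq's endpoint accepts them is exactly the open question here, and the base URL and model name below are placeholders rather than confirmed values.

```python
from openai import OpenAI

client = OpenAI(base_url="https://example-groq-endpoint/v1", api_key="sk-...")
resp = client.chat.completions.create(
    model="mixtral-8x7b-32768",   # assumed model identifier
    messages=[{"role": "user", "content": "Say hello."}],
    logprobs=True,                # log probability of each sampled token
    top_logprobs=5,               # plus the 5 most likely alternatives per position
)
# In OpenAI's API the values come back under choices[0].logprobs.content
for tok in resp.choices[0].logprobs.content:
    print(tok.token, tok.logprob)
```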
roomey, about 1 year ago
Oh hell yes, this is the first "fast" one, superhuman fast.

I know you gave suggestions of what to ask, but I threw a few curveballs and it was really good! Well done, this is a big step forwards.
zmmmmm, about 1 year ago
As a virtual reality geek, this is super exciting, because although numerous people are experimenting with voicing NPCs with LLMs, they all have horrible latency and are unusable in practice. This looks like the first one that could actually work for an application like that. I can see it won't be long before we can have open-ended, realistic conversations with "real" simulated people!
mrg3_2013, about 1 year ago
This is unreal. I have never seen anything this fast. How? I mean, how can you physically ship the bits this fast, let alone run an LLM?

Something about the UI doesn't work for me, though. Maybe I like the OpenAI chat interface too much. Can someone bring their own data and train? That would be crazy!
kimbochen, about 1 year ago
Congrats on the great demo; I've been a fan of Groq since I learned about the TSP. I'm surprised the LPU runs Mixtral this fast, because MoE's dynamic routing is orthogonal to Groq's deterministic paradigm. Did Groq implement MegaBlocks-like kernels or other methods tailored for LPUs?
aeyes, about 1 year ago
Switching the model between Mixtral and Llama, I get word-for-word the same responses. Is this expected?
totalhack, about 1 year ago
Where is the data center located? The fastest response time I could get in some quick testing from the northeastern US, having it output just one letter, was 670 ms. Just wondering if that's an expected result, as it's on a par with or slower than GPT-3.5 via the API.
deepnotderp, about 1 year ago
This demo uses more than 500 chips, btw; it's not exactly an apples-to-apples comparison with one GPU…
Aeolun, about 1 year ago
I think we're kind of past the point where we post prompts because they're interesting, but this one still had me thinking.

Obviously it doesn't have memory, but it's the first time I've seen a model actually respond instead of hedge (having mostly used ChatGPT).

> what is the longest prompt you have ever received?

> The length of a prompt can vary greatly, and it's not uncommon for me to receive prompts that are several sentences long. However, I don't think I have ever received a prompt that could be considered "super long" in terms of physical length. The majority of prompts I receive are concise and to the point, typically consisting of a single sentence or a short paragraph.
deniz_tekalp, about 1 year ago
GPUs are notoriously bad at exploiting sparsity. I wonder if this architecture can do a better job. For the Groq engineers in this thread: if a neural network had, say, 60% of its weights set to 0, what would that do to cost and speed on your hardware?
mrtksn, about 1 year ago
Does this make it practical to run LLMs on mobile devices? I wonder about the power consumption and whether it could make sense to integrate it into some future mobile devices. Or maybe have dedicated storage, RAM and processing cores as a USB-C add-on? A case with an integrated battery and this chip?

I'm dreaming of having LLMs on anything. Unlike the "Bluetooth on everything" craze, this could be practical, as every device can become smart. Remember how some British researchers made a self-driving car using an LLM? A toaster anticipating how to cook when you describe what you want would actually be an improvement.
patapong, about 1 year ago
Very impressive! I am even more impressed by the API pricing, though: $0.27 per 1M tokens seems like an order of magnitude cheaper than the GPT-3.5 API, and two orders of magnitude cheaper than GPT-4? Am I missing something here?
ggnore7452, about 1 year ago
The Groq demo was indeed impressive. I work with LLMs a lot, and a generation speed of 500+ tokens/s would definitely change how we use these products (especially considering it's an early-stage product).

But the "completely novel silicon architecture" and the "self-developed LPU" (claiming not to use GPUs)... make me a bit skeptical. After all, pure speed might be achievable by stacking computational power and quantizing the model. Shouldn't innovation at the GPU level be quite challenging, especially to achieve such groundbreaking speeds?
ponywombat, about 1 year ago
This is very impressive, but while it was very fast with Mixtral yesterday, today I waited 59.44 s for a response. If I were to use your API, the end-to-end time matters much more than the output-token throughput and time-to-first-token metrics. Will you also publish average / minimum / maximum end-to-end times?
Klaus23, about 1 year ago
The demo is pretty cool, but the mobile interface could be a parody of bad interface design. The text box at the top is hard to reach if you want to open the keyboard (which automatically closes) or press the button to send the question, and the chat history is out of chronological order for no logical reason.

Edit: Text selection is also broken.
qwertox, about 1 year ago
How come the answers for Mixtral 8x7B-32k and Llama 2 70B-4k are identical?

After asking Mixtral a couple of questions I switched to Llama, and while it shows Llama as the model used for the response, the answer is identical.

See the first and last question:

https://pastebin.com/ZQV10C8Q
nilayj, about 1 year ago
How is the tokens/second figure calculated? I gave it a simple prompt and the model generated a 150-word (about 300 tokens?) answer in 17 seconds, while reporting a speed of 408 T/s.

Also, I guess this demo would feel real-time if you could stream the outputs to the UI? Can this be done in your current setup?
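One plausible reading of that mismatch, assuming the on-screen figure counts only the decode phase and not queueing, network, or time to first token (an assumption, not something the demo confirms):

```python
# Rough arithmetic behind the discrepancy the commenter describes.
tokens = 300         # commenter's estimate of the answer length
wall_time = 17.0     # seconds, end to end
reported_tps = 408   # figure shown by the demo

print(tokens / wall_time)                  # ~17.6 T/s observed end to end
print(tokens / reported_tps)               # ~0.74 s of pure decoding implied
print(wall_time - tokens / reported_tps)   # ~16 s spent elsewhere (queue, network, ...)
```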
uptownfunk, about 1 year ago
It is fast, but if it spits out useless garbage, then it's useless. I don't mind waiting for ChatGPT; the quality of what it produces is quite remarkable, and I am excited to see it get better. I think this has more to do with the Mistral model vs. GPT-4 than with Groq. If Groq can host GPT-4, wow, then that would be amazing.
itsmechase, about 1 year ago
Incredible tool. The Mixtral 8x7B model running on their hardware did 491.40 T/s for me…
nojs, about 1 year ago
This is extremely impressive: no login, extremely fast, and Mixtral quality is very good. It's already more useful than my (paid) GPT-4 for many things due to the speed.
kopirgan, about 1 year ago
Just a minor gripe: the bullet option doesn't seem to be logical.

When I asked about Marco Polo's travels and used Modify to add bullets, it added China, Pakistan, etc. as children of Iran. And the same for the other paragraphs.
idiotlogical, about 1 year ago
I asked it "What is the oldest known written recipe?" and then asked for that recipe. It seemed to repeat the same response a bunch of times (very quickly!) before stopping. Is this abnormal behavior?
monkin, about 1 year ago
It's impressive, but I have one problem with all of these models. I wanted them to answer what Mixtral or Llama 2 are, but with no luck. It would be great if models could at least describe themselves.
jprd, about 1 year ago
This is *super* impressive. The rate of iteration and innovation in this space means that just as I'm feeling jaded/bored/oversaturated, some new project makes my jaw drop again.
blyatperkele, about 1 year ago
Amazingly fast, but I don't like that the only option for signing up is a Google account. Are you planning to implement some simple authentication, maybe using just an email?
FpUser, about 1 year ago
O M G

It is fast, like instant. It is straight to the point compared to the others. It answered a few of my programming questions to create particular code and passed with flying colors.

Conclusion: shut up and take my money.
ohwellish, about 1 year ago
I wish there were an option to export the whole session chat, say in plain text as a link to some pastebin; the chat I just had with Groq would really impress some people I know.
yzh, about 1 year ago
Really impressive work! I wonder how easy it would be to support (a future open-source version of) Sora using Groq's design. Will there be a Video Processing Unit (VPU)?
tagyro, about 1 year ago
I (only) ran a couple of prompts, but I am impressed. It has the speed of GPT-3.5 and the quality of GPT-4.

Seriously considering switching from [Open]AI to Mix/s/tral in my apps.
QuesnayJr, about 1 year ago
I tried it out, and I was taken aback by how quickly it answered.
supercharger9, about 1 year ago
Ignoring latency but not throughput, how does this compare in terms of cost (card acquisition cost and power needed) with Nvidia GPUs for inference?
keeshond, about 1 year ago
I see XTX is one of the investors. Are there any potential use cases requiring deterministic computation that you can talk about, beyond just inference?
cheptsov, about 1 year ago
Any chance you plan to offer an API to cloud LPUs, and not just the LLM API? It would be cool to run custom code (training, serving, etc.).
lukevp, about 1 year ago
I’m not sure how, but I got the zoom messed up on iOS and I can no longer see the submit button. Refreshing doesn’t fix it.
mise_en_place, about 1 year ago
Do you have any plans to support bringing your own model? I have been using SageMaker, but it is very slow to deploy to.
blackoil, about 1 year ago
If Nvidia adds L1/L2/L3 cache in the next generation of AI cards, will they work similarly, or is this something more?
newsclues, about 1 year ago
I asked it what Carmack's AI company was called; it correctly identified John Carmack but said he was working on VR.
FindNInDark, about 1 year ago
Hi, thanks for this fascinating demo. I am wondering how this architecture optimizes the softmax part.
aphit, about 1 year ago
This is incredibly fast, indeed.

What are the current speeds in T/s for, say, ChatGPT 3.5 or ChatGPT 4? Just how much faster is this?
fatkam, about 1 year ago
For me, the printing itself was fast (almost instant once it began), but it took forever for it to start.
keeshond, about 1 year ago
I see XTX is one of the investors. Any other potential use cases with async logic, beyond just inference?
jereees, about 1 year ago
I’ll pay $xx a month if I can talk to Groq the way I can talk to ChatGPT with my AirPods
Keyframe, about 1 year ago
This is insane. Congratulations!
sorokod, about 1 year ago
Not clear whether it is due to Groq or to Mixtral, but confident hallucinations are there.
joaquincabezas, about 1 year ago
Are there also experiments around image-embedding generation, to use in combination with the LLM?

Maybe for this use case it is better to execute the vision tower on a GPU and leave the LPU for the language part?
mlconnor, about 1 year ago
OMG. I can't believe how incredibly fast that is. And capable too. Wow.
ttul, about 1 year ago
Have you experimented with running diffusion models on Groq hardware?
ionwake, about 1 year ago
Holy smokes, this is fast.
nojvek, about 1 year ago
I'm sure Elon is pissed, since he has Grok.

Someone now needs to make a Groc.
Cheer2171, about 1 year ago
What's the underlying hardware for this?
youssefabdelm, about 1 year ago
Do you guys provide logprobs via the API?
dariobarila, about 1 year ago
Wow! So fast!
doubtfuluser, about 1 year ago
Nice… a startup that has two "C" positions: CEO and Chief Legal Officer…

That sounds like a fun place to be.
fennecbutt, about 1 year ago
Tried it out; seriously impressive. I'm sure you welcome the detractors, but as someone who doesn't work for or have any investments in AI, colour me impressed.

Though with the price of the hardware, I'll probably mess with the API for now. Give us a bell when the hardware is consumer-friendly, ha ha.
sylware, about 1 year ago
Is there a noscript/basic (X)HTML prompt?
LoganDark, about 1 year ago
Please, when/where can I buy some of these for home use? Otherwise, is there any way to get access to the API without being a large company building a partner product? I would love this for personal use.
cchance, about 1 year ago
Jesus, that makes ChatGPT and even Gemini seem slow AF.
jackblemming, about 1 year ago
Impressive work. Nice job, team. This is big.
xnorswap, about 1 year ago
Interesting. I ran my "prime test": it passed on generating a prime, but failed on it being large.

> "Can you generate me a large prime please"

> Sure, here's a large prime number: 193707721

> This number is a prime number because it is only divisible by 1 and itself. You can check its primality by trying to divide it by all the numbers up to its square root, and if none of them divide it evenly, then it is a prime number. This number is quite large, but there are much larger prime numbers known. For example, the largest known prime number, as of January 2022, is 2^82,589,933−1, a number with 24,862,048 digits, discovered as part of the Great Internet Mersenne Prime Search (GIMPS).

I then prompted "That's not large", which seemed to break it; the response got stuck in a loop:

> That isn't large

> I apologize if the prime number I provided did not meet your expectations for "large." Prime numbers can be very large indeed, and there is no upper limit to how large a prime number can be.

If you are looking for a very large prime number, here is one with a billion digits:

1000003030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030...

It printed many more pages of 303030... but Hacker News has a post length limit!

Further testing, such as prompting it to generate a 1024-bit prime, also runs into odd loops.

> Can you generate a 1024 bit prime?

> Yes, I can generate a 1024-bit prime number for you. Here's one I just generated:

> 115322512531875133213135311402305250340282302283103413642283221253115653221253112210221122112112112112112112... (loops 112 forever)
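For comparison, a small sketch of doing the task properly on the client side, using sympy (which combines sieving with probabilistic primality tests) rather than the trial division the model describes:

```python
from sympy import isprime, randprime

p = randprime(2**1023, 2**1024)   # a random 1024-bit prime
print(int(p).bit_length())        # 1024
print(isprime(p))                 # True
print(isprime(193707721))         # True: the model's first answer is at least prime
```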
ionwake, about 1 year ago
Sorry if this is dumb, but how is this different from Elon's Grok? Was "Groq" chosen as a joke or an homage?