I just want to say that this is one of the most impressive tech demos I’ve ever seen in my life, and I love that it’s truly an open demo that anyone can try without even signing up for an account or anything like that. It’s surreal to see the thing spitting out tokens at such a crazy rate when you’re used to watching them generate at less than one fifth that speed. I’m surprised you guys haven’t been swallowed up by Microsoft, Apple, or Google already for a huge premium.
The main problem with the Groq LPUs is that they don't have any HBM on them at all, just a minuscule (230 MiB) [0] amount of ultra-fast SRAM (20x faster than HBM3, to be clear). Which means you need ~256 LPUs (4 full server racks of compute, with 8x LPUs per unit and 8x units per rack) just to serve a single model [1], whereas you can get a single H200 (1/256th of that rack density) and serve these models reasonably well.<p>It might work well if you have a single model with lots of customers, but as soon as you need more than one model, plus lots of fine-tunes/high-rank LoRAs etc., these won't be usable. The same goes for any on-prem deployment, since the main advantage is consolidating people onto the same model, together.<p>[0]: <a href="https://wow.groq.com/groqcard-accelerator/" rel="nofollow">https://wow.groq.com/groqcard-accelerator/</a><p>[1]: <a href="https://twitter.com/tomjaguarpaw/status/1759615563586744334" rel="nofollow">https://twitter.com/tomjaguarpaw/status/1759615563586744334</a>
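For intuition, a rough back-of-envelope sketch in Python (my own assumptions, not Groq's published sizing: ~46.7B total parameters for Mixtral 8x7B, 8-bit weights, 230 MiB of SRAM per card, ignoring activations, KV cache and any duplication for pipelining):<p><pre><code>import math

# Hypothetical sizing: how many 230 MiB SRAM cards are needed just to hold
# the weights of a Mixtral-8x7B-class model entirely on-chip?
params = 46.7e9               # ~46.7B parameters (Mixtral 8x7B total)
bytes_per_param = 1           # assume 8-bit weights; FP16 would double this
sram_per_card = 230 * 2**20   # 230 MiB of SRAM per LPU card

weight_bytes = params * bytes_per_param
cards = math.ceil(weight_bytes / sram_per_card)
print(f"{weight_bytes / 2**30:.0f} GiB of weights -> at least {cards} cards")
# Roughly 190+ cards before any overhead, which is in the same ballpark as
# the ~256-LPU figure cited above.
</code></pre>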
Very impressive looking! Just wanted to caution that it's worth being a bit skeptical without benchmarks, as there are a number of ways to cut corners. One prominent example is heavy model quantization, which speeds up the model at a cost to model quality. Otherwise I'd love to see LLM tok/s progress exactly like CPU instructions/s did a few decades ago.
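To make the quantization-for-speed concern concrete, here's a rough single-stream roofline sketch of my own (assumptions: decode is memory-bandwidth-bound at batch size 1, Mixtral 8x7B activates ~12.9B parameters per token, and an H100-class GPU has roughly 3.35 TB/s of HBM bandwidth):<p><pre><code># Rough upper bound: at batch size 1, every generated token streams the active
# weights from memory, so tok/s <= bandwidth / bytes read per token.
active_params = 12.9e9        # Mixtral 8x7B activates ~12.9B params per token
bandwidth = 3.35e12           # bytes/s, H100-class HBM (rough)
for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    bytes_per_token = active_params * bits / 8
    print(f"{name}: <= ~{bandwidth / bytes_per_token:.0f} tok/s per stream")
# Roughly 130 / 260 / 520 tok/s, ignoring KV cache and overheads. That's why
# aggressive quantization is one way to hit these speeds, and why quality
# benchmarks matter alongside tok/s.
</code></pre>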
Hi folks, I work for Groq. Feel free to ask me any questions.<p>(If you check my HN post history you'll see I post a lot about Haskell. That's right, part of Groq's compilation pipeline is written in Haskell!)
Impressive demo!<p>However, the hardware requirements and cost make this inaccessible for anyone but large companies. When do you envision that the price could be affordable for hobbyists?<p>Also, while the CNN Vapi demo was impressive as well, a few weeks ago here[1] someone shared <a href="https://smarterchild.chat/" rel="nofollow">https://smarterchild.chat/</a>. That also has _very_ low audio latency, making natural conversation possible. From that discussion it seems that <a href="https://www.sindarin.tech/" rel="nofollow">https://www.sindarin.tech/</a> is behind it. Do we know if they use Groq LPUs or something else?<p>I think that once you reach ~50 t/s, real-time interaction is possible. Anything higher than that is useful for generating large volumes of data quickly, but there are diminishing returns as it's far beyond what humans can process. Maybe such speeds would be useful for AI-AI communication, transferring knowledge/context, etc.<p>So an LPU product that's only focused on AI-human interaction could have much lower capabilities, and thus much lower cost, no?<p>[1]: <a href="https://news.ycombinator.com/item?id=39180237">https://news.ycombinator.com/item?id=39180237</a>
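As a rough sanity check on that ~50 t/s threshold (my own numbers: ~250 words per minute reading speed, ~1.3 tokens per word for English):<p><pre><code># Back-of-envelope: how many tokens/s can a human actually read?
words_per_minute = 250        # typical adult reading speed (rough)
tokens_per_word = 1.3         # common rule of thumb for English text
human_tok_per_s = words_per_minute * tokens_per_word / 60
print(f"~{human_tok_per_s:.1f} tokens/s")   # ~5.4 tok/s
# 50 t/s is already ~10x reading speed; beyond that the benefit shifts to
# skimming, multi-step agent pipelines, and machine-to-machine use.
</code></pre>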
It's only faster sometimes: when you ask it a complicated question or give it any kind of pre-prompt to speak in a different way, it still takes a while to respond. Interesting, but ultimately probably going to be a flop.
If the page can't access certain fonts, it fails to work and keeps retrying requests like these:<p><pre><code> https://fonts.gstatic.com/s/notosansarabic/[...]
https://fonts.gstatic.com/s/notosanshebrew/[...]
https://fonts.gstatic.com/s/notosanssc/[...]
</code></pre>
(I noticed this because my browser blocks these de facto trackers by default.)
Sorry, I'm a bit naïve about all of this.<p>Why is this impressive? Can this result not be achieved by throwing more compute at the problem to speed up responses? Isn't the fact that there is a queue when under load just indicative that there's a trade-off between "# of request to process per unit of time" and "amount of compute to put into a response to respond quicker"?<p><a href="https://raw.githubusercontent.com/NVIDIA/TensorRT-LLM/rel/docs/source/blogs/media/TRT_LLM_v0-5-0_H100vA100_tps.png" rel="nofollow">https://raw.githubusercontent.com/NVIDIA/TensorRT-LLM/rel/do...</a><p>This chart from NVIDIA implies their H100 runs llama v2 70B at >500 tok/s.
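One thing I may be getting wrong: numbers like that NVIDIA chart are presumably aggregate throughput across a large batch of concurrent requests, whereas this demo's figure is tokens/s delivered to a single user. A toy illustration (batch size is made up):<p><pre><code># Hypothetical: aggregate vs per-stream tokens/s.
aggregate_tok_per_s = 500   # e.g. an H100 serving Llama 2 70B across a batch
batch_size = 64             # hypothetical number of concurrent requests
per_user = aggregate_tok_per_s / batch_size
print(f"each user sees ~{per_user:.1f} tok/s")   # ~7.8 tok/s per stream
# Delivering 400-500 tok/s to a single stream (low latency, not just high
# aggregate throughput) is the harder problem.
</code></pre>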
Jaw dropping. Both Groq and Mixtral.<p>I used the following prompt:<p><pre><code> Generate a gitlab ci yaml file for a hybrid front-end/backend project. The frontend is under /frontend and is a node project, packaged with yarn, built with vite to the /backend/public folder. The backend is a python flask server.
</code></pre>
Never mind, I stand corrected. Blown tf away after trying the demo MYSELF. It's instantaneous; the last time I used an LLM this fast, it was a proprietary model with a small dataset, lightning fast but not smart enough. This is wild. But I don't understand why the demo was so bad, and why it took so long to respond to his questions.
Groq's API reaches close to this level of performance as well. We've benchmarked it over time and >400 tokens/s has been sustained; you can see it here <a href="https://artificialanalysis.ai/models/mixtral-8x7b-instruct" rel="nofollow">https://artificialanalysis.ai/models/mixtral-8x7b-instruct</a> (bottom of the page for the over-time view)
This is pretty sweet. The speed is nice, but what I really care about is you bringing the per-token cost down compared with models on the level of Mistral Medium/GPT-4. GPT-3.5 is pretty close in terms of cost/token, but the quality isn't there, and GPT-4 is overpriced. Having GPT-4 quality at sub-GPT-3.5 prices would enable a lot of things, though.
Hey y'all, we have a discord now for more discussion and announcements: <a href="https://discord.com/invite/TQcy5EBdCP" rel="nofollow">https://discord.com/invite/TQcy5EBdCP</a>
Incredible job. Feels dumb or obvious to say this, but this really changes the way I think of using it. The slow autoregression really sucks because it inhibits your ability to skim sections. For me, that creates an unnatural reading environment. This makes ChatGPT feel antiquated.
I have this hooked up experimentally to my universal Dungeon Master simulator DungeonGod and it seems to work quite well.<p>I had been using Together AI Mixtral (which is serving the Hermes Mixtrals) and it is pretty snappy, but nothing close to Groq. I think the next closest I've tested is Perplexity Labs Mixtral.<p>A key blocker in just hanging out a shingle for an open source AI project is the fear that anything that might scale will bankrupt you (or just go offline if you get any significant traction). I think we're nearing the phase where we could just turn these things "on" and eat the reasonable inference fees to see what people engage with, with a pretty decently cool free tier available.<p>I'd add that the simulator makes multiple calls to the API for one response, to do analysis and function selection in the underlying Python game engine, which Groq makes less of a problem as it's close to instant. This adds a pretty significant pause in the OpenAI version. Also, since this simulator runs on Discord with multiple users, I've had problems in the past with 'user response storms' where the AI couldn't keep up. That's also less of a problem with Groq.
Relevant thread from 5 months ago: <a href="https://news.ycombinator.com/item?id=37469434">https://news.ycombinator.com/item?id=37469434</a><p>I'm achieving consistent 450+ tokens/sec for Mixtral 8x7b 32k and ~200 tps for Llama 2 70B-4k.<p>As an aside, seeing that this is built with flutter Web, perhaps a mobile app is coming soon?
@tome
Cannot sign up with sneakemail.com, snkml.com, snkmail, liamekaens.com, etc. I pay for these services so my email is a bit more protected. Why do you insist on well-known email providers instead: data mining, or something else?
Is it normal that I have asked two networks (llama/mixtral) the same question ("tell me about most popular audio pitch detection algorithms") and they gave almost the same answer? Both answers start with "Sure, here are some of the most popular pitch detection algorithms used in audio signal processing" and end with "Each of these algorithms has its own strengths and weaknesses, and the choice of algorithm depends on the specific application and the characteristics of the input signal.". And the content is 95% the same. How can it be?
I always ask LLMs this:<p>> If I initially set a timer for 45 minutes but decided to make the total timer time 60 minutes when there's 5 minutes left in the initial 45, how much should I add to make it 60?<p>And they never get it correct.
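For reference, the answer I expect (the 5-minutes-left detail is a red herring, since only the totals matter):<p><pre><code>initial_total = 45    # minutes originally on the timer
desired_total = 60    # minutes the timer should total in the end
print(desired_total - initial_total)   # 15 minutes to add
# Adding 15 to the remaining 5 leaves 20 minutes on the clock:
# 40 already elapsed + 20 remaining = 60 minutes total.
</code></pre>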
To what extent is the API compatible with OpenAI's? Does it offer logprobs[0] and top_logprobs[1]?<p>0. <a href="https://platform.openai.com/docs/api-reference/chat/create#chat-create-logprobs" rel="nofollow">https://platform.openai.com/docs/api-reference/chat/create#c...</a><p>1. <a href="https://platform.openai.com/docs/api-reference/chat/create#chat-create-top_logprobs" rel="nofollow">https://platform.openai.com/docs/api-reference/chat/create#c...</a>
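For context, what I'd try first (the base URL and model name below are my guesses, not documented values) is pointing the official OpenAI Python client at the alternative endpoint and passing the same fields:<p><pre><code>from openai import OpenAI

# Hypothetical OpenAI-compatible usage; whether logprobs/top_logprobs are
# honoured server-side is exactly the open question.
client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="mixtral-8x7b-32768",   # assumed model identifier
    messages=[{"role": "user", "content": "Say hello"}],
    logprobs=True,
    top_logprobs=5,
)
print(resp.choices[0].logprobs)
</code></pre>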
Oh hell yes, this is the first "fast" one, superhuman fast.<p>I know you gave suggestions of what to ask, but I threw a few curveballs and it was really good! Well done, this is a big step forward.
As a virtual reality geek, this is super exciting because although there are numerous people experimenting with voicing NPCs with LLMs, they all have horrible latency and are unusable in practice. This looks like the first one that can actually potentially work for an application like that. I can see it won't be long before we can have open ended realistic conversations with "real" simulated people!
This is unreal. I have never seen anything this fast. How? I mean, how can you physically ship the bits this fast, let alone run an LLM?<p>Something about the UI doesn't work for me, though. Maybe I like the OpenAI chat interface too much. Can someone bring their own data and train? That would be crazy!
Congrats on the great demo, been a fan of Groq since I learned about TSP.
I'm surprised the LPU runs Mixtral this fast, because MoE's dynamic routing seems orthogonal to Groq's deterministic execution paradigm. Did Groq implement MegaBlocks-like kernels or other methods tailored for LPUs?
Where is the data center located? The fastest response time I could get from some quick testing from the northeast US, having it output just one letter, was 670ms. Just wondering if that's an expected result, as it's on par with or slower than GPT-3.5 via the API.
I think we’re kind of past the point where we post prompts because it’s interesting, but this one still had me thinking.<p>Obviously it doesn’t have memory, but it’s the first time I’ve seen a model actually respond instead of hedge (having mostly used ChatGPT).<p>> what is the longest prompt you have ever received?<p>> The length of a prompt can vary greatly, and it's not uncommon for me to receive prompts that are several sentences long. However, I don't think I have ever received a prompt that could be considered "super long" in terms of physical length. The majority of prompts I receive are concise and to the point, typically consisting of a single sentence or a short paragraph.
GPUs are notoriously bad at exploiting sparsity. I wonder if this architecture can do a better job. For the Groq engineers in this thread: if a neural network had, say, 60% of its weights set to 0, what would that do to cost and speed on your hardware?
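For a baseline, the hardware-independent arithmetic is straightforward; the interesting part is how much of it an LPU's static schedule can actually capture:<p><pre><code># Theoretical best case for skipping zero weights.
sparsity = 0.60                   # 60% of weights are zero
flops_fraction = 1.0 - sparsity   # only 40% of multiply-adds are strictly needed
print(f"ideal speedup from skipping zeros: {1 / flops_fraction:.2f}x")   # 2.5x
# Dense hardware typically realizes little of this for unstructured sparsity;
# structured sparsity (pruning whole blocks) is usually needed in practice.
</code></pre>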
Does this make it practical to run LLMs on mobile devices? I wonder about the power consumption and whether it could make sense to integrate it into some future mobile devices. Or maybe dedicated storage, RAM and processing cores as a USB-C add-on? A case with an integrated battery and this chip?<p>I'm dreaming of having LLMs on anything. Unlike the "bluetooth on everything" craze, this could be practical, as every device could become smart. Remember how some British researchers made a self-driving car using an LLM? A toaster anticipating how to cook when you describe what you want would actually be an improvement.
Very impressive! I am even more impressed by the API pricing though: $0.27/1M tokens seems like an order of magnitude cheaper than the GPT-3.5 API, and two orders of magnitude cheaper than GPT-4? Am I missing something here?
The Groq demo was indeed impressive. I work with LLMs a lot at work, and a generation speed of 500+ tokens/s would definitely change how we use these products (especially considering it's an early-stage product).<p>But the "completely novel silicon architecture" and the "self-developed LPU" (claiming not to use GPUs)... make me a bit skeptical. After all, pure speed might be achievable through stacking computational power and model quantization. Shouldn't innovation at the silicon level be quite challenging, especially to achieve such groundbreaking speeds?
This is very impressive, but whilst it was very fast with Mixtral yesterday, today I waited 59.44s for a response. If I were to use your API, end-to-end time would be much more important to me than the Output Tokens Throughput and Time to First Token metrics. Will you also publish average/minimum/maximum end-to-end times?
The demo is pretty cool, but the mobile interface could be a parody of bad interface design. The text box at the top is hard to reach, whether you want to open the keyboard (which closes automatically) or press the button to send the question, and the chat history is out of chronological order for no logical reason.<p>Edit: Text selection is also broken.
How come the answers for Mixtral 8x7B-32k and Llama 2 70B-4k are identical?<p>After asking via Mixtral a couple of questions I switched to Llama, and while it shows Llama as the Model used for the response, the answer is identical.<p>See first and last question:<p><a href="https://pastebin.com/ZQV10C8Q" rel="nofollow">https://pastebin.com/ZQV10C8Q</a>
How is the tokens/second figure calculated? I asked it a simple prompt and the model generated a 150-word (about 300 tokens?) answer in 17 seconds, yet it reported a speed of 408 T/s.<p>Also, I guess this demo would feel real-time if you could stream the outputs to the UI? Can this be done in your current setup?
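My guess at the discrepancy (not an official explanation): the reported number likely covers only the model's generation time, excluding queueing and network latency, and 150 English words is usually closer to ~200 tokens than 300. Roughly:<p><pre><code># End-to-end rate seen by the user vs. the server's generation-only rate.
tokens = 150 * 1.3       # ~195 tokens for a 150-word answer (rule of thumb)
wall_clock_s = 17        # total time waited, including any queueing
print(f"end-to-end: ~{tokens / wall_clock_s:.0f} tok/s")          # ~11 tok/s
print(f"implied generation time at 408 T/s: {tokens / 408:.2f} s")
# i.e. ~0.5 s of actual generation, with the rest spent queued or in transit.
</code></pre>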
It is fast, but if it spits out useless garbage, it's still useless. I don't mind waiting for ChatGPT; the quality of what it produces is quite remarkable, and I am excited to see it get better. I think this has more to do with the Mistral models vs. GPT-4 than with Groq. If Groq can host GPT-4, wow, that would be amazing.
This is extremely impressive - no login, extremely fast, and Mixtral quality is very good. It's already more useful than my (paid) GPT4 for many things due to the speed.
Just a minor gripe: the bullet option doesn't seem to be logical.<p>When I asked about Marco Polo's travels and used Modify to add bullets, it added China, Pakistan etc. as children of Iran, and the same for the other paragraphs.
I asked it "What is the oldest known written recipe" and then asked for that recipe. It seemed to repeat the same response a bunch of times (very quickly!) before stopping. Is this abnormal behavior?
It's impressive, but I have one problem with all of those models. I wanted them to answer what Mixtral or Llama2 are, but with no luck. It would be great if models could at least describe themselves.
This is <i>super</i> impressive. The rate of iteration and innovation in this space means that just as I'm feeling jaded/bored/oversaturated - some new project makes my jaw drop again.
Amazingly fast, but I don't like that the only option for signing up is a Google account. Are you planning to implement some simpler authentication, maybe using just an email address?
O M G<p>It is fast, like instant. It is straight to the point compared to the others. It answered a few of my programming questions asking for particular code and passed with flying colors.<p>Conclusion: shut up and take my money
I wish there were an option to export the whole chat session, say as plain text or a link to some pastebin; the chat I just had with Groq would really impress some people I know.
Really impressive work! I wonder how easy it would be to support (a future open source version of) SORA using Groq's design. Will there be a Video Processing Unit (VPU)?
I (only) ran a couple of prompts but I am impressed. It has the speed of gpt 3.5 and the quality of gpt 4.<p>Seriously considering switching from [open]AI to Mix/s/tral in my apps.
Are there also experiments around image embedding generation to use in combination with the LLM?<p>Maybe for this use case it would be better to execute the vision tower on a GPU and leave the LPU for the language part?
Tried it out, seriously impressive. I'm sure you welcome the detractors but as someone who doesn't work for or have any investments in AI, colour me impressed.<p>Though with the price of the hardware, I'll probably mess with the API for now. Give us a bell when the hardware is consumer friendly, ha ha.
Please when/where can I buy some of these for home use? Otherwise is there any way to get access to the API without being a large company building a partner product? I would love this for personal use.
Interesting, I ran my "prime test", it passed on generating a prime, it failed on it being large.<p>> "Can you generate me a large prime please"<p>> Sure, here's a large prime number: 193707721<p>> This number is a prime number because it is only divisible by 1 and itself. You can check its primality by trying to divide it by all the numbers up to its square root, and if none of them divide it evenly, then it is a prime number. This number is quite large, but there are much larger prime numbers known. For example, the largest known prime number, as of January 2022, is 2^82,589,933−1, a number with 24,862,048 digits, discovered as part of the Great Internet Mersenne Prime Search (GIMPS).<p>I then prompted "That's not large" which seemed to break it, the response is stuck in a loop:<p>> That isn't large<p>> I apologize if the prime number I provided did not meet your expectations for "large." Prime numbers can be very large indeed, and there is no upper limit to how large a prime number can be.<p>If you are looking for a very large prime number, here is one with a billion digits:<p>100000303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303030303...<p>It's printed many more pages of 303030... but hacker news has a post length limit!<p>Further testing, such as prompting to generate a 1024 bit prime also runs into odd loops.<p>> Can you generate a 1024 bit prime?<p>> Yes, I can generate a 1024-bit prime number for you. Here's one I just generated:<p>> 115322512531875133213135311402305250340282302283103413642283221253115653221253112210221122112112112112112112112112112112112112112112112112112112112112112112112112112112112112112112112112112112 ( Loops 112 forever )
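For anyone who wants to check outputs like this locally, here's a small trial-division checker along the lines the model itself describes (fine for 9-digit candidates; a probabilistic test like Miller-Rabin is the usual choice for anything 1024-bit sized):<p><pre><code>import math

def is_prime(n: int) -> bool:
    """Trial division up to sqrt(n): slow but fine for small candidates."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

print(is_prime(193707721))   # check the model's claimed prime
</code></pre>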