At 8x86B, this looks like the largest open model yet by far. It would be interesting to hear how many tokens it was trained on; that's especially important for higher-parameter models, since it takes a lot of data to efficiently utilize all those parameters.
Has anyone outside of x.ai actually done inference with this model yet? And if so, have they provided details of the hardware? What type of AWS instance or whatever?<p>I think you can rent like an 8 x A100 or 8 x H100 and it's "affordable" to play around with for at least a few minutes. But you would need to know exactly how to set up the GPU cluster.<p>Because I doubt it's as simple as just 'python run.py' to get it going.
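A back-of-envelope sketch of the memory question, using my own numbers (the parameter count from the announcement, a bf16 assumption, and a made-up overhead factor; none of this is from the repo):

```python
# Rough check: which 8-GPU rentals could hold a 314B-parameter model
# with weights in bf16 (2 bytes/param) plus headroom for activations
# and KV cache. The 1.2 overhead factor is a guess, not a measurement.
PARAMS = 314e9
BYTES_PER_PARAM = 2          # bf16/fp16
OVERHEAD = 1.2               # rough allowance for activations/KV cache

needed_gb = PARAMS * BYTES_PER_PARAM * OVERHEAD / 1e9

instances = {                # typical 8-GPU rental configurations
    "8x A100 40GB": 8 * 40,
    "8x A100 80GB": 8 * 80,
    "8x H100 80GB": 8 * 80,
}
for name, total_gb in instances.items():
    fits = "fits" if total_gb >= needed_gb else "does NOT fit"
    print(f"{name}: {total_gb} GB vs ~{needed_gb:.0f} GB needed -> {fits}")
```

By this estimate even 8x80GB is too small at full bf16 precision, which suggests quantization or CPU offload is needed to run it on a single rented node.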
"Base model trained on a large amount of text data, not fine-tuned for any particular task."<p>Presumably the version they've been previewing on Twitter is an instruction-tuned model which behaves quite differently from these raw weights.
I'd be very curious to see how it performs, especially on inputs that are blocked by other models. It seems like Grok will differentiate itself from other open models from a censorship and alignment perspective.
Can someone explain why the weights are posted via a Bittorrent magnet link? I have no way to check the size at the moment, but isn't that a bit unusual? There's also only 21 seeders right now according to <a href="https://checker.openwebtorrent.com/" rel="nofollow">https://checker.openwebtorrent.com/</a>
Model weights on huggingface: <a href="https://huggingface.co/xai-org/grok-1" rel="nofollow">https://huggingface.co/xai-org/grok-1</a>
Hey, asking any experts here: what are your first thoughts on the significance of this?<p>I.e., is this comparable to any other released model, or are there significant metric differences that make it better for certain use cases?<p>The only thing I see, off the top of my head, is that it is a very large model, and I don't think any models of similar size have been released.
blog post: <a href="https://x.ai/blog/grok-os" rel="nofollow">https://x.ai/blog/grok-os</a><p><pre><code> * 314B parameters (86B active at a time)
* mixture of experts: 8 experts (2 active at a time)
* weights and architecture licensed under Apache 2.0
</code></pre>
(edit:) announcement blog post from last year
with benchmarks compared to Claude 2, GPT-3.5 and GPT-4: <a href="https://x.ai/blog/grok" rel="nofollow">https://x.ai/blog/grok</a><p>(edit2:)TL;DR: somewhat comparable to GPT-3.5, Mixtral and Qwen-1.5-72B in capability but way larger than the open weight models
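To illustrate the "8 experts, 2 active" line: a toy sketch of top-2 routing in a mixture-of-experts layer. All the sizes and the router here are made up for illustration; this is the general MoE pattern, not Grok's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 16, 8, 2               # toy dims; Grok's are far larger

router_w = rng.normal(size=(D, N_EXPERTS))   # router projection
experts = [rng.normal(size=(D, D)) for _ in range(N_EXPERTS)]

def moe_layer(x):
    logits = x @ router_w                    # score every expert
    top = np.argsort(logits)[-TOP_K:]        # keep the 2 best-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()                 # softmax over the chosen 2
    # Only the selected experts run: per-token compute scales with
    # TOP_K, while parameter count (memory) scales with N_EXPERTS.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

y = moe_layer(rng.normal(size=D))
print(y.shape)  # (16,)
```

That's why the "active" parameter count (86B) is so much smaller than the total (314B): each token only flows through 2 of the 8 expert FFNs.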
> Due to the large size of the model (314B parameters), a machine with enough GPU memory is required to test the model with the example code<p>What type of machine do you need to play around with this?
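One wrinkle worth noting (my arithmetic, not from the repo): the "86B active" figure cuts per-token compute, but all 314B parameters still have to be resident in GPU memory, since any token can route to any expert:

```python
TOTAL_PARAMS = 314e9    # all experts must be loaded
ACTIVE_PARAMS = 86e9    # params actually used per token
BYTES_BF16 = 2

# Memory is set by TOTAL parameters: every expert sits in VRAM.
memory_gb = TOTAL_PARAMS * BYTES_BF16 / 1e9
# Per-token forward compute scales with ACTIVE params (~2 FLOPs/param).
flops_per_token = 2 * ACTIVE_PARAMS

print(f"weights in bf16: ~{memory_gb:.0f} GB")
print(f"forward pass: ~{flops_per_token / 1e9:.0f} GFLOPs per token")
```

So for "playing around" the binding constraint is memory (~628 GB of weights at bf16 before any activations or KV cache), not compute.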
How long before the <i>Groq</i> team sues for trademark violation? It's literally the purpose of trademark law to make sure similar names don't cause confusion in the minds of customers, so it would be very surprising to see this situation persist.
One subtle thing: Musk said "open-source", we got "open-weights" instead (still better than nothing though, so it's greatly appreciated).
"The implementation of the MoE layer in this repository is not efficient. The implementation was chosen to avoid the need for custom kernels to validate the correctness of the model."<p>Or perhaps release your actual code AND the simplified implementation instead of hiding it and saying "you don't know her, she goes to a different high school"
I think everyone should realize the following realities of the LLM market:<p>1. For sub-SOTA LLMs, distribution/marketing is more important than having a proprietary lock on capabilities. Open sourcing is a benefit for the firm, distinct from goodwill.<p>2. For SOTA LLMs, keeping it closed and proprietary is the strategic play.<p>If Grok were SOTA, Elon never would have open-sourced it. It's not even SOTA within xAI. This is a marketing play to win public sentiment against OpenAI.
This doesn't seem to be a repo that's ready to open source. You only get the weights, with very little information about how they were trained and fine-tuned.<p>But anyway, it's always great to see more LLM weights available.
In all the debate about open source, I don't think people realize that this model is most likely not reproducible ever again, even given the code. Here's what you need to reproduce the model:<p>1. An exact snapshot of the data used. Many companies don't have this; you have rough dataset versions, but remember that if even 1 token is different, the model produced won't be the same.<p>2. Data must be sent to the training algorithm in the exact same order as it was originally, so every data loader needs a fixed random seed.<p>3. All the probabilistic parts of your model need a fixed random seed. Here I'm thinking of stuff like dropout, and for autoregressive models you might be sampling your previous output, so you have to ensure those are properly seeded too. Generally you do see fixed seeds in academic papers, but it's easy to miss things, especially in distributed training jobs.<p>4. Here's another interesting thing: you start your training job on 1000 GPUs and then suddenly 4 GPUs fail. What do you do? There might be deterministic ways to solve this, but the standard approach is to discard all updates that GPU was going to make and restart that GPU from scratch. You can see why this is a problem? Now if you want to reproduce the training, you need to disable those GPUs at the same point in the new training job to make this work.<p>I suspect there are even more things I didn't think of that make this model unique and irreproducible for eternity, almost like a human brain.<p>In fact, the notion of exact reproducibility in the world of LLMs is silly; there is only approximate reproducibility (models with similar scores on benchmarks), but nothing exact. That said, I can see the value of releasing source code, but I'm completely fine with Grok not releasing it. Source code can reveal tricks a company discovered to improve its model that have not been published in papers yet.
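Point 3 in practice looks something like this: pinning every RNG source the process touches. This is a sketch using only the standard library and NumPy; a real training job would also seed the framework's RNGs and force deterministic kernels (noted in comments), and GPU kernels can still be nondeterministic beyond that:

```python
import os
import random

import numpy as np

def seed_everything(seed: int) -> None:
    """Pin every RNG source this process touches (sketch only; a real
    training job would also seed torch/JAX, CUDA, and each data-loader
    worker, and request deterministic kernels)."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    # torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)
    # torch.use_deterministic_algorithms(True)

seed_everything(1234)
a = np.random.rand(3)       # e.g. a "shuffled" data order
seed_everything(1234)
b = np.random.rand(3)       # replaying with the same seed
print(np.allclose(a, b))    # True: same seed, same "random" draws
```

And even with all of that in place, points 1, 2 and 4 above still have to line up exactly, which is why bit-for-bit reproduction of a frontier training run is effectively impossible.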
Seeing the performance of Grok, I'm pretty confident there aren't any great tricks to be found in their code, so I don't really care. I would be pretty curious about OpenAI's or Anthropic's source code, though.
It would be cool if these models had conversations with us where they ask questions. I think the future of AI is models that ask questions. There is so much data to be gained by doing this.
This feels like a "now we can say we're open" PR play rather than contributing much value to the open source community.<p>What is the practical use of this repo?
I am not sure what open-source models are accomplishing other than killing the lead of the competition (OpenAI), only to hand it to someone else with expertise in distribution. This will be yet another good addition to systems like Amazon Bedrock.
Honestly the most interesting part is taking a peek at the kind of AI researcher working for Twitter after the objectively messy layoffs and subsequent crunch. I notice neither of them mentions Twitter on their GitHub, which is probably for the best to avoid harassment lol.<p>Code-wise, I'm excited to see if this could grow into anything! I think it's pretty clear that Grok didn't have nearly enough investment to be a top model, so Elon "sacrificed" it on a whim in his schoolyard spat with OpenAI, but I'm not complaining. I've always taken Elon at his word that he truly <i>is</i> worried about the centralization of AI, and none of the emails released by his schoolmate Altman dissuade me of that. So I have some reasonable hope that he uses some of his immense resources to start "fighting the good fight" here with Le Cun
If we just stop looking at Elon, he will lose his power. Why oh why do we keep giving him attention? There are plenty of great models out there that _aren't_ backed by maniacs.