In all the debate about open source, I don't think people realize that this model is most likely not reproducible ever again, even given the code. Here's what you would need to reproduce it:

1. An exact snapshot of the data used. Many companies don't have this; they have rough dataset versions, but if even one token is different, the resulting model won't be the same.

2. The data must be fed to the training algorithm in exactly the same order as it was originally, so every data loader needs a fixed random seed.

3. All the probabilistic parts of the model need a fixed random seed. I'm thinking of things like dropout, and for autoregressive models you might be sampling your previous output, so you have to make sure all of that is properly seeded. You do generally see fixed seeds in academic papers, but it's easy to miss something, especially in distributed training jobs (see the sketch at the end of this comment).

4. Here's another interesting one: you start your training job on 1000 GPUs and suddenly 4 of them fail. What do you do? There might be deterministic ways to handle this, but the standard approach is to discard the updates those GPUs were going to contribute and restart them from scratch. You can see why this is a problem: to reproduce the training run, you would need those same GPUs to fail at the same points in the new run.

I suspect there are even more things I haven't thought of that make this model unique and irreproducible by retraining, almost like a human brain.

In fact, the notion of exact reproducibility in the world of LLMs is silly; there is only approximate reproducibility (models with similar benchmark scores), nothing exact. That said, I can see the value of releasing source code, but I'm completely fine with Grok not releasing it. Source code can reveal tricks a company discovered to improve its model that haven't been published in papers yet. Seeing Grok's performance, I'm pretty confident there aren't any great tricks to be found in their code, so I don't really care. I would be pretty curious about OpenAI's or Anthropic's source code, though.
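To make points 2 and 3 concrete, here is roughly what that seeding looks like, as a minimal sketch assuming a PyTorch setup (the original comment doesn't name a framework; the seed value and helper names here are just illustrative):

    import os
    import random

    import numpy as np
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    SEED = 1234  # illustrative value; any fixed seed works, it just has to be recorded


    def seed_everything(seed: int) -> None:
        """Pin every global source of randomness we know about (point 3)."""
        random.seed(seed)                 # Python's own RNG
        np.random.seed(seed)              # NumPy (shuffling/augmentation helpers)
        torch.manual_seed(seed)           # CPU and CUDA RNGs (dropout, init, sampling)
        torch.cuda.manual_seed_all(seed)  # all GPUs on this node
        # Ask PyTorch for deterministic kernels where it supports them; some ops
        # slow down or warn, and a few cuBLAS ops also need
        # CUBLAS_WORKSPACE_CONFIG=":4096:8" set in the environment.
        torch.use_deterministic_algorithms(True, warn_only=True)
        torch.backends.cudnn.benchmark = False


    def seed_worker(worker_id: int) -> None:
        """Give each DataLoader worker process a derived, reproducible seed."""
        worker_seed = torch.initial_seed() % 2**32
        np.random.seed(worker_seed)
        random.seed(worker_seed)


    seed_everything(SEED)

    # Point 2: the shuffle order has to come from a seeded generator, otherwise
    # batches arrive in a different order on every run and the weights diverge.
    dataset = TensorDataset(torch.randn(1000, 16))  # stand-in for the real corpus
    loader = DataLoader(
        dataset,
        batch_size=32,
        shuffle=True,
        num_workers=4,
        worker_init_fn=seed_worker,
        generator=torch.Generator().manual_seed(SEED),
    )

Even with all of this pinned down, nondeterministic GPU kernels, different hardware, and the failure/restart scheduling in point 4 can still break bitwise reproduction across runs, which is exactly the comment's point.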