Grok

1170 points by pierre, about 1 year ago

36 comments

extheat, about 1 year ago
At 8x86B, this looks like the largest open model yet by far. It would be interesting to hear how many tokens it's been trained on; that's especially important for higher-parameter models in order to utilize all those parameters efficiently.
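(For context: a common back-of-envelope answer is the Chinchilla scaling heuristic of roughly 20 training tokens per parameter. The numbers below are illustrative estimates, not anything xAI has disclosed.)

```python
# Chinchilla-style compute-optimal token estimate (Hoffmann et al. 2022):
# roughly 20 training tokens per parameter. Illustrative only; xAI has
# not published Grok-1's training token count.
total_params = 314e9                 # Grok-1 total parameters
tokens_per_param = 20                # rule-of-thumb ratio
print(f"~{total_params * tokens_per_param / 1e12:.1f}T tokens")  # ~6.3T
```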
ilaksh, about 1 year ago
Has anyone outside of x.ai actually done inference with this model yet? And if so, have they provided details of the hardware? What type of AWS instance or whatever?

I think you can rent something like an 8 x A100 or 8 x H100 node, and it's "affordable" to play around with for at least a few minutes. But you would need to know exactly how to set up the GPU cluster.

Because I doubt it's as simple as just 'python run.py' to get it going.
simonw, about 1 year ago
"Base model trained on a large amount of text data, not fine-tuned for any particular task."

Presumably the version they've been previewing on Twitter is an instruction-tuned model which behaves quite differently from these raw weights.
nasir, about 1 year ago
I'd be very curious to see how it performs, especially on inputs that are blocked by other models. It seems like Grok will differentiate itself from other open models from a censorship and alignment perspective.
nylonstrung, about 1 year ago
For what reason would you want to use this instead of open-source alternatives like Mistral?
pogue, about 1 year ago
Can someone explain why the weights are posted via a BitTorrent magnet link? I have no way to check the size at the moment, but isn't that a bit unusual? There are also only 21 seeders right now according to https://checker.openwebtorrent.com/
joydeep314, about 1 year ago
Model weights on Hugging Face: https://huggingface.co/xai-org/grok-1
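(A minimal sketch of grabbing the checkpoint from Hugging Face instead of the magnet link, assuming the huggingface_hub package; the ckpt-0 pattern and checkpoints directory follow the repo's stated layout, and the download is on the order of 300 GB.)

```python
# Sketch: download the Grok-1 checkpoint from Hugging Face rather than
# the torrent. Assumes `pip install huggingface_hub`; the checkpoint is
# roughly 300 GB, so expect a long download.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="xai-org/grok-1",
    allow_patterns=["ckpt-0/*"],   # just the weight shards
    local_dir="checkpoints",       # where the repo's example code expects them
)
```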
cl3misch, about 1 year ago
Love the minimal repo, magnet link, and stating "open weights" instead of "open source". Refreshing!
stale2002, about 1 year ago
Hey, asking any experts here: what are your first thoughts on the significance of this?

I.e., is this comparable to any other released model, or are there significant metric differences that make it better for certain use cases?

The only thing I see, off the top of my head, is that it is a very large model, and I don't think any models of similar size have been released.
modeless, about 1 year ago
Is this the first major model to be natively FP8? I was wondering why people hadn't done it yet. Seems like a big win when hardware supports it.
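(For readers wanting to poke at 8-bit floats concretely: a small illustration using the ml_dtypes package, which implements the FP8 formats supported by recent accelerators. This is illustrative only, not xAI's quantization code; the released Grok-1 weights are described as 8-bit quantized rather than confirmed FP8 throughout.)

```python
# Round-trip weights through the float8_e4m3fn format to see the storage
# saving (1 byte/param vs. 4 for fp32) and the rounding cost it incurs.
import numpy as np
import ml_dtypes  # pip install ml_dtypes

w = np.random.randn(1024).astype(np.float32)
w_fp8 = w.astype(ml_dtypes.float8_e4m3fn)
err = np.abs(w - w_fp8.astype(np.float32)).mean()
print(f"mean abs rounding error: {err:.4f}")
```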
tosh, about 1 year ago
blog post: https://x.ai/blog/grok-os

* 314B parameters (86B active at a time)
* mixture of experts: 8 experts (2 active at a time)
* weights and architecture licensed under Apache 2.0

(edit:) announcement blog post from last year, with benchmarks compared against Claude 2, GPT-3.5 and GPT-4: https://x.ai/blog/grok

(edit2:) TL;DR: somewhat comparable to GPT-3.5, Mixtral and Qwen-1.5-72B in capability, but way larger than those open-weight models
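(Those numbers pin down a rough parameter split, assuming equal-sized experts: with total = shared + 8 * per_expert and active = shared + 2 * per_expert, the published 314B/86B figures imply about 38B per expert and about 10B of always-active shared weights.)

```python
# Back-of-envelope split implied by the published numbers, assuming all
# eight experts are the same size and non-expert weights are always active.
total, active = 314e9, 86e9
n_experts, k_active = 8, 2
per_expert = (total - active) / (n_experts - k_active)  # ~38B
shared = active - k_active * per_expert                 # ~10B
print(f"per expert: ~{per_expert/1e9:.0f}B, shared: ~{shared/1e9:.0f}B")
```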
shantnutiwari, about 1 year ago
Those of us who don't spend all our time in LLMs: what's this about? What's the big deal, and why is it on the front page at #1?
moralestapia, about 1 year ago
Well, he delivered.
gardenhedge, about 1 year ago
> Due to the large size of the model (314B parameters), a machine with enough GPU memory is required to test the model with the example code

What type of machine do you need to play around with this?
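(A rough sizing pass, counting weights alone and ignoring activation and KV-cache overhead, which is why an 8 x 80GB node is the setup people keep citing.)

```python
# Rough GPU-memory estimate for a 314B-parameter checkpoint, weights only.
# Real deployments also need headroom for activations and the KV cache.
params = 314e9
for fmt, bytes_per_param in [("bf16", 2), ("int8", 1)]:
    gb = params * bytes_per_param / 1e9
    print(f"{fmt}: ~{gb:.0f} GB -> >= {gb / 80:.1f} x 80GB GPUs")
```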
simonw, about 1 year ago
Is there a model card anywhere? I'd like to know what it was trained on.
hubraumhugo, about 1 year ago
When will we reach an upper limit/diminishing returns in terms of number of parameters and mixture of experts?
littlestymaar, about 1 year ago
How long before the Groq team sues for trademark violation? It's literally the purpose of trademark law to make sure similar names do not cause confusion in the minds of customers, so it would be very surprising to see this situation persist.
aussieguy1234, about 1 year ago
How hard would it be for an open source group to fine-tune this into a chatbot?
ArunRaja, about 1 year ago
Is this Grok open-sourcing really a big deal? How is this move beneficial for Grok per se? Does it build trust, as with other open-source products?
LZ_Khan, about 1 year ago
What has people's experience with this model been? Having the most weights is one thing, but being a better model than the 70B models is another.
andre-z, about 1 year ago
The only other repository is a fork of Qdrant.
sqreept, about 1 year ago
What are the languages supported by it?
rvnx, about 1 year ago
One subtle thing: Musk said "open-source", but we got "open-weights" instead (still better than nothing, so it's greatly appreciated).
captcanuk, about 1 year ago
"The implementation of the MoE layer in this repository is not efficient. The implementation was chosen to avoid the need for custom kernels to validate the correctness of the model."

Or perhaps release your actual code AND the simplified implementation, instead of hiding it and saying "you don't know her, she goes to a different high school".
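(To make the quoted trade-off concrete, here is a deliberately naive top-2 MoE forward pass in plain NumPy: a Python loop over tokens and dense matmuls, obviously correct and obviously slow. All names and shapes here are illustrative; this is not xAI's implementation.)

```python
import numpy as np

def naive_moe(x, w_router, experts, k=2):
    """Route each token to its top-k experts; correctness over speed."""
    logits = x @ w_router                          # (tokens, n_experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)          # softmax router weights
    topk = np.argsort(-probs, axis=-1)[:, :k]      # top-k expert indices
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                    # slow per-token loop
        for e in topk[t]:
            w_in, w_out = experts[e]
            h = np.maximum(x[t] @ w_in, 0.0)       # simple ReLU FFN expert
            out[t] += probs[t, e] * (h @ w_out)    # weight by router prob
    return out

rng = np.random.default_rng(0)
d, n_exp, tokens = 16, 8, 4
x = rng.normal(size=(tokens, d)).astype(np.float32)
w_router = rng.normal(size=(d, n_exp)).astype(np.float32)
experts = [(rng.normal(size=(d, 4 * d)).astype(np.float32),
            rng.normal(size=(4 * d, d)).astype(np.float32))
           for _ in range(n_exp)]
print(naive_moe(x, w_router, experts).shape)       # (4, 16)
```

An efficient version would instead batch all tokens assigned to each expert into one matmul, which is exactly where custom kernels and the associated correctness risk come in.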
atleastoptimal, about 1 year ago
I think everyone should realize the following realities of the LLM market:

1. For sub-SOTA LLMs, distribution/marketing is more important than having a proprietary lock on capabilities. Open-sourcing is a benefit for the firm, distinct from goodwill.

2. For SOTA LLMs, keeping it closed and proprietary is the strategic play.

If Grok were SOTA, Elon never would have open-sourced it. It's not even SOTA within xAI. This is a marketing play to win public sentiment against OpenAI.
redskyluan, about 1 year ago
This doesn't seem to be a repo that's ready to be open source. You only get weights, with very little information about how they were trained and fine-tuned.

But anyway, it's always great to see more LLM weights available.
sashank_1509, about 1 year ago
In all the debate about open source, I don't think people realize that this model is most likely not reproducible ever again, even given the code. Here's what you need to reproduce the model:

1. An exact snapshot of the data used. Many companies don't have this; you have rough dataset versions, but remember that if even 1 token is different, the model produced won't be the same.

2. Data must be sent to the training algorithm in the exact same order as it was originally, so every data loader needs a fixed random seed.

3. All the probabilistic parts of your model need a fixed random seed. Here I'm thinking of stuff like dropout, and for autoregressive models you might be sampling your previous output, so you have to ensure those are properly seeded. Generally you do see fixed seeds in academic papers, but it's easy to miss stuff, especially in distributed training jobs.

4. Here's another interesting thing: you start your training job on 1000 GPUs and then suddenly 4 GPUs fail. What do you do? There might be deterministic ways to solve this, but the standard approach is to discard all updates those GPUs were going to make and restart them from scratch. You can see why this is a problem? Now if you want to reproduce this training, you need to disable those GPUs at the same point in the new training job to make this work.

I suspect there are even more things I didn't think of that will make this model unique and irreproducible by training for eternity, almost like a human brain.

In fact, the notion of exact reproducibility in the world of LLMs is silly; there is only approximate reproducibility (models with similar scores on benchmarks), but nothing exact. That said, I can see the value of releasing source code, but I'm completely fine with Grok not releasing it. Source code can reveal tricks that have not been published in papers yet, which a company discovered to improve their model. Seeing the performance of Grok, I'm pretty confident there aren't any great tricks to be found in their code, so I don't really care; I would be pretty curious about OpenAI's or Anthropic's source code, though.
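(A sketch of what point 3 looks like in practice for a PyTorch job; even with all of this pinned, points 2 and 4 can still break bit-exact reproducibility in distributed training.)

```python
# Pin every source of randomness for a (single-node) PyTorch training job.
import os, random
import numpy as np
import torch

SEED = 42
os.environ["PYTHONHASHSEED"] = str(SEED)   # Python hash randomization
random.seed(SEED)                          # stdlib RNG
np.random.seed(SEED)                       # NumPy RNG
torch.manual_seed(SEED)                    # PyTorch CPU/CUDA RNGs
torch.cuda.manual_seed_all(SEED)           # all GPUs explicitly
torch.use_deterministic_algorithms(True)   # fail on nondeterministic kernels
torch.backends.cudnn.benchmark = False     # autotuning picks kernels nondeterministically

# Point 2: the data order needs its own seeded generator, e.g.
# loader = torch.utils.data.DataLoader(ds, shuffle=True,
#                                      generator=torch.Generator().manual_seed(SEED))
```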
seccode, about 1 year ago
It would be cool if these models had conversations with us where they ask questions. I think the future of AI is models that ask questions. There is so much data to be gained by doing this.
mattxxx, about 1 year ago
I respect the openness here! This is the future that I want to see.
mvkel, about 1 year ago
This feels like a "now we can say we're open" PR play rather than contributing much value to the open source community.

What is the practical use of this repo?
machiaweliczny, about 1 year ago
If they are so far behind, they could make it open source instead of open weights and get some help.
orsenthil, about 1 year ago
I am not sure what open-source models are accomplishing other than killing the lead of the competition (OpenAI), only to give it to someone else who has expertise in the area of distribution. This will be yet another good addition to systems like Amazon Bedrock.
2devnull, about 1 year ago
From the issues: "Well the magnet file contains a 300GB checkpoint"

That's why they are using a torrent, I suppose.
arduanika, about 1 year ago
CODE_OF_CONDUCT.md has only five words. :)
bbor, about 1 year ago
Honestly, the most interesting part is taking a peek at the kind of AI researcher working for Twitter after the objectively messy layoffs and subsequent crunch. I notice neither of them has Twitter mentioned on their GitHub, which is probably for the best to avoid harassment, lol.

Code-wise, I'm excited to see if this could grow into anything! I think it's pretty clear that Grok didn't have nearly enough investment to be a top model, so Elon "sacrificed" it on a whim in his schoolyard spat with OpenAI, but I'm not complaining. I've always taken Elon at his word that he truly is worried about centralization of AI, and I don't think any of the emails released by his schoolmate Altman dissuade me of that. So I have some reasonable hope that he uses some of his immense resources to start "fighting the good fight" here with LeCun.
greenpizza13, about 1 year ago
If we just stop looking at Elon, he will lose his power. Why, oh why, do we keep giving him attention? There are plenty of great models out there that _aren't_ backed by maniacs.