
Megatron-Turing NLG 530B, the World’s Largest Generative Language Model

116 points by selimonder, over 3 years ago

18 comments

cs702, over 3 years ago
So we now have models with 0.5 trillion parameters, each the weight of a connection in a neural network.

Trillion-parameter models are surely within reach in the near term -- and that's only within two orders of magnitude of the number of synapses in the human brain, which is in the hundreds of trillions, give or take. To paraphrase the popular saying, a trillion here, a trillion there, and pretty soon you're talking really big numbers.

I know the figures are not comparable apples-to-apples, but still, I find myself *in awe* looking at how far we've come in just the last few years, to the point that we're realistically contemplating the possibility of seeing dense neural networks with hundreds of trillions of parameters used for real-world applications in our lifetime.

We sure live in interesting times.
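For a rough sense of the scale gap described above, a back-of-envelope sketch (the synapse count is a loose, commonly cited estimate, not a precise figure):

```python
# Back-of-envelope comparison: parameters in this model vs. a rough
# estimate of synapses in the human brain (both figures approximate).
model_params = 530e9        # Megatron-Turing NLG 530B
brain_synapses = 100e12     # low end of the "hundreds of trillions" estimate

ratio = brain_synapses / model_params
print(f"Synapses per model parameter: ~{ratio:.0f}x")  # ~189x, about two orders of magnitude
```
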
bane, over 3 years ago
What's really interesting is that these models are using some non-trivial portion of all easily accessible human writing -- yet humans learn language really well with significantly less input data. What's missing in the field to replicate human performance in learning?

petters, over 3 years ago
The training data has 0.339T tokens, fewer than the number of parameters. A model like that could store all of the training text and still have 100B+ parameters left over for computation.
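A quick sketch of the arithmetic behind this claim, using the figures from the comment (treating one parameter as "storing" one token is an upper-bound framing, not how transformers actually memorize):

```python
# Parameters vs. training tokens for Megatron-Turing NLG 530B.
params = 530e9        # model parameters
tokens = 0.339e12     # training tokens

print(f"Parameters per training token: {params / tokens:.2f}")        # ~1.56
print(f"'Spare' parameters if one per token were used for storage: "
      f"{(params - tokens) / 1e9:.0f}B")                              # ~191B
```
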
xnx, over 3 years ago
This reminds me a little bit of the early 2000s, when search engines would list the number of indexed pages on their homepage. For language models, does large = good? I'm guessing the quality of the corpus matters just as much.

posharma, over 3 years ago
This is great. Now, how do we run inference on these models economically? There appears to be some kind of competition to train larger and larger models, but the inference side of the story seems to be neglected.
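A rough sketch of why serving a model this size is expensive, assuming fp16 weights and ignoring activations and KV cache; the 80 GB accelerator size is an assumption for illustration:

```python
# Rough serving cost: memory just to hold the weights.
params = 530e9
bytes_per_param = 2                     # assuming fp16/bf16 weights

weight_bytes = params * bytes_per_param
print(f"Weights alone: {weight_bytes / 1e12:.2f} TB")          # ~1.06 TB

gpu_memory = 80e9                       # e.g. one 80 GB accelerator
print(f"Accelerators needed just to fit the weights: "
      f"{weight_bytes / gpu_memory:.1f}")                      # ~13.3
```
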
miket, over 3 years ago
https://en.wikipedia.org/wiki/Wu_Dao

bobm_kite9, over 3 years ago
I guess I'm interested to see if this performs qualitatively better than GPT-3, given how many more parameters it has.

However, I think this is really a dead end: throwing more hardware at this is just going to generate better-sounding nonsense. Yes, we are learning the "model" of the English language -- which words go with which others -- but successively larger transformer models don't really expose much more about the nature of intelligent conversation.

I think we need a better algorithm now.

putlake, over 3 years ago
China's WuDao model had 1.75T parameters and Google's Switch Transformer had 1.6T. How is this the world's largest then?

captn3m0, over 3 years ago
Interesting that books3 and The Pile are among the largest corpora used for training -- both with copyright concerns.

savant_penguin, over 3 years ago
Really cool!

I'd love to see a table comparing the results against the other gigantic models (I know I could Google the other results and merge them together, but no thanks).

rustc, over 3 years ago
Has there been any update on the legality of using this kind of model? Is it OK to just crawl the web, take any content you want, train a model, and sell access to the model like OpenAI/GPT-3/GitHub Copilot?

sonic-boom, over 3 years ago
Any idea if they'll release an API similar to GPT-3? It's great that larger and larger models are trained, but without access to the trained models, developers are left out of the progress…

RhysU, over 3 years ago
So, uh, what do the seed-to-seed variance studies look like on a network of this size? Surely someone trained 100 to see the distribution. *ducks*

trash3, over 3 years ago
How has the previous largest model, GPT-3, generated value? How much better is this model at those tasks?

bpiche, over 3 years ago
Wonder how much compute it would cost to train this thing if you weren't Nvidia...
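A very rough sketch of the training compute, using the common ~6 × parameters × tokens FLOPs rule of thumb; the token count is the figure quoted elsewhere in this thread, and the GPU throughput and utilization numbers are assumptions:

```python
# Back-of-envelope training compute estimate.
params = 530e9
tokens = 0.339e12                      # token count quoted elsewhere in the thread
flops = 6 * params * tokens            # ~1.1e24 FLOPs total

a100_peak = 312e12                     # A100 bf16 peak FLOP/s (dense)
utilization = 0.4                      # assumed achieved utilization
gpu_seconds = flops / (a100_peak * utilization)

print(f"Total training FLOPs: {flops:.2e}")
print(f"A100-years at {utilization:.0%} utilization: {gpu_seconds / 3.15e7:.0f}")  # ~270
```
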
macrolime, over 3 years ago
Will anyone outside of Nvidia be able to access it? GPT-3 at least has an API.
canjobear, over 3 years ago
What's the perplexity?
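For context on the metric being asked about: perplexity is the exponential of the average per-token cross-entropy loss. A minimal sketch:

```python
import math

# Perplexity = exp(average per-token negative log-likelihood), natural log.
def perplexity(token_nlls):
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns each token probability 0.1 has perplexity 10.
print(perplexity([-math.log(0.1)] * 5))   # ~10.0
```
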
moochi, over 3 years ago
All these models are over-hyped. We are nowhere close to AGI until we can come up with a reasonable definition of consciousness.