
Megatron-Turing NLG 530B, the World’s Largest Generative Language Model

116 points by selimonder, over 3 years ago

18 comments

cs702, over 3 years ago
So we now have models with 0.5 trillion parameters, each the weight of a connection in a neural network.

Trillion-parameter models are surely within reach in the near term -- and that's only within two orders of magnitude of the number of synapses in the human brain, which is in the hundreds of trillions, give or take. To paraphrase the popular saying, a trillion here, a trillion there, and pretty soon you're talking really big numbers.

I know the figures are not comparable apples-to-apples, but still, I find myself *in awe* looking at how far we've come in just the last few years, to the point that we're realistically contemplating the possibility of seeing dense neural networks with hundreds of trillions of parameters used for real-world applications in our lifetime.

We sure live in interesting times.
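For a rough sense of the scale gap described above, a back-of-envelope sketch (the synapse count is a loose, commonly cited estimate, not a precise figure):

```python
# Back-of-envelope comparison: parameters in this model vs. a rough
# estimate of synapses in the human brain (both figures approximate).
model_params = 530e9        # Megatron-Turing NLG 530B
brain_synapses = 100e12     # low end of the "hundreds of trillions" estimate

ratio = brain_synapses / model_params
print(f"Synapses per model parameter: ~{ratio:.0f}x")  # ~189x, about two orders of magnitude
```
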
bane, over 3 years ago
What's really interesting is that these models are using some non-trivial portion of all easily accessible human writing -- yet humans learn language really well with significantly less input data. What's missing in the field to replicate human performance in learning?

petters, over 3 years ago
The training data has 0.339T tokens, fewer than the number of parameters. A model like that could store all of the training text and still have 100B+ parameters left over for computation.
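A quick sketch of the arithmetic behind this claim, using the figures from the comment (treating one parameter as "storing" one token is an upper-bound framing, not how transformers actually memorize):

```python
# Parameters vs. training tokens for Megatron-Turing NLG 530B.
params = 530e9        # model parameters
tokens = 0.339e12     # training tokens

print(f"Parameters per training token: {params / tokens:.2f}")        # ~1.56
print(f"'Spare' parameters if one per token were used for storage: "
      f"{(params - tokens) / 1e9:.0f}B")                              # ~191B
```
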
xnx, over 3 years ago
This reminds me a little bit of the early 2000s, when search engines would list the number of indexed pages on their homepage. For language models, does large = good? I'm guessing the quality of the corpus matters just as much.

posharma, over 3 years ago
This is great. Now, how do we run inference on these models economically? There appears to be some kind of competition to train larger and larger models, but the inference side of the story seems to be neglected.
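A rough sketch of why serving a model this size is expensive, assuming fp16 weights and ignoring activations and KV cache; the 80 GB accelerator size is an assumption for illustration:

```python
# Rough serving cost: memory just to hold the weights.
params = 530e9
bytes_per_param = 2                     # assuming fp16/bf16 weights

weight_bytes = params * bytes_per_param
print(f"Weights alone: {weight_bytes / 1e12:.2f} TB")          # ~1.06 TB

gpu_memory = 80e9                       # e.g. one 80 GB accelerator
print(f"Accelerators needed just to fit the weights: "
      f"{weight_bytes / gpu_memory:.1f}")                      # ~13.3
```
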
miket, over 3 years ago
https://en.wikipedia.org/wiki/Wu_Dao

bobm_kite9, over 3 years ago
I guess I'm interested to see if this performs qualitatively better than GPT-3, given how many more parameters it has.

However, I think this is really a dead end: throwing more hardware at this is just going to generate better-sounding nonsense. Yes, we are learning the "model" of the English language -- which words go with which others -- but successively larger transformer models don't really expose much more about the nature of intelligent conversation.

I think we need a better algorithm now.

putlake, over 3 years ago
China's WuDao model had 1.75T parameters and Google's Switch Transformer had 1.6T. How is this the world's largest then?

captn3m0, over 3 years ago
Interesting that books3 and The Pile are among the largest corpora used for training -- both with copyright concerns.

savant_penguin, over 3 years ago
Really cool!

I'd love to see a table comparing the results against the other gigantic models (I know I could Google the other results and merge them together, but no thanks).

rustc, over 3 years ago
Has there been any update on the legality of using this kind of model? Is it OK to just crawl the web, take any content you want, train a model, and sell access to the model like OpenAI/GPT-3/GitHub Copilot?

sonic-boom, over 3 years ago
Any idea if they'll release an API similar to GPT-3? It's great that larger and larger models are trained, but without access to the trained models, developers are left out of the progress…

RhysU, over 3 years ago
So, uh, what do the seed-to-seed variance studies look like on a network of this size? Surely someone trained 100 to see the distribution. *ducks*

trash3, over 3 years ago
How has the previous largest model, GPT-3, generated value? How much better is this model at those tasks?

bpiche, over 3 years ago
Wonder how much compute it would cost to train this thing if you weren't Nvidia...
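A very rough sketch of the training compute, using the common ~6 × parameters × tokens FLOPs rule of thumb; the token count is the figure quoted elsewhere in this thread, and the GPU throughput and utilization numbers are assumptions:

```python
# Back-of-envelope training compute estimate.
params = 530e9
tokens = 0.339e12                      # token count quoted elsewhere in the thread
flops = 6 * params * tokens            # ~1.1e24 FLOPs total

a100_peak = 312e12                     # A100 bf16 peak FLOP/s (dense)
utilization = 0.4                      # assumed achieved utilization
gpu_seconds = flops / (a100_peak * utilization)

print(f"Total training FLOPs: {flops:.2e}")
print(f"A100-years at {utilization:.0%} utilization: {gpu_seconds / 3.15e7:.0f}")  # ~270
```
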
macrolime, over 3 years ago
Will anyone outside of Nvidia be able to access it? GPT-3 at least has an API.
canjobear, over 3 years ago
What's the perplexity?
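For context on the metric being asked about: perplexity is the exponential of the average per-token cross-entropy loss. A minimal sketch:

```python
import math

# Perplexity = exp(average per-token negative log-likelihood), natural log.
def perplexity(token_nlls):
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns each token probability 0.1 has perplexity 10.
print(perplexity([-math.log(0.1)] * 5))   # ~10.0
```
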
moochi, over 3 years ago
All these models are over-hyped. We are nowhere close to AGI until we can come up with a reasonable definition of consciousness.