
Turing-NLG: A 17B-parameter language model

367 points by XnoiVeX over 5 years ago

16 comments

corporateslave5 over 5 years ago
People are vastly underestimating the changes that are about to come from NLP. The basic ideas of how to get language models working are just about in place. Transformer networks, and recent innovations like GPT-2, Google's Reformer model, etc., are precursors to the real machine learning boom. Machine learning as we have known it has been stuck as an optimization tool, used for computer vision here and there. NLP, and with it the ability to create, synthesize, and understand content, will change the internet.

More than that, I think NLP will unlock new ways of interacting with computers. Computers will be able to handle the ambiguity of human language, transcending their rigid "only do exactly what you tell them" models of the world.

Edit: adding this to give more technical context. I think most people don't know where the line currently sits between what's possible and what's not, or what we are on the cusp of. And we are on the cusp of a lot.

A quick explanation of one area:

Basically, transformer models are the best for NLP. They use attention mechanisms, which let the model draw correlations between pieces of text/tokens that are far apart. The issue is that this is an O(n^2) operation, so the model is bounded by the context window, currently mostly 512 tokens, and is thus bounded in how much it can understand. Recent innovations, and further study, will broaden the context window and thus unlock better reading comprehension and context understanding. For instance, answering a question from a piece of text is mostly stuck at finding one paragraph. The future will see models that can find multiple different paragraphs, understand how they relate, pull the relevant information, and synthesize it. This sounds like a minor step forward, but it's important. It will unlock better conversational abilities, and better ways to understand how different pieces of textual information relate. The scattershot of information across the internet can go away. Computers can better understand context to act on human intention through language, unlocking the ability to handle ambiguity. This will change the internet.

Again, to emphasize: these models only started showing up in 2017! The progress has been rapid.
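To make the quadratic cost concrete, here is a minimal single-head scaled dot-product self-attention sketch in NumPy. The explicit n-by-n score matrix is exactly the O(n^2) bottleneck described above, with n bounded by the context window (512 tokens in this example). All names and shapes are illustrative, not any particular model's implementation.

    import numpy as np

    def self_attention(x, w_q, w_k, w_v):
        """Single-head scaled dot-product self-attention.
        x: (n, d) array of n token embeddings of width d.
        The (n, n) score matrix is the O(n^2) cost: doubling the
        context window quadruples the memory and compute here.
        """
        q, k, v = x @ w_q, x @ w_k, x @ w_v           # project tokens
        scores = q @ k.T / np.sqrt(k.shape[-1])       # (n, n) pairwise scores
        scores -= scores.max(axis=-1, keepdims=True)  # stabilize softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v                            # each token mixes all others

    # Toy usage: a 512-token context with 64-dimensional embeddings.
    rng = np.random.default_rng(0)
    n, d = 512, 64
    x = rng.standard_normal((n, d))
    w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
    out = self_attention(x, w_q, w_k, w_v)            # shape (512, 64)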
saurkt over 5 years ago
One of the team members from Project Turing. Happy to answer any questions.
rjeli over 5 years ago
I have been bearish on AGI, but GPT-2 surprised me with the lucidity of its samples.

My take from the past few years is that we're 99% done with the visual cortex: convolutional nets can be trained to perform any visual task a human can in <100ms. Now I'm mostly convinced that GPT-2 has solved the language cortex, and can babble as well as we will ever need it to. We just need a prefrontal cortex (symbolic processing / RL / whatever your pet theory is) to drive the components, which is a problem we have not even started to solve. I am 90% sure it is a different class of problem and we won't knock it out of the park in 5 years like the visual/language cortexes, but we can hope.

Edit: it's possible cognition follows from language, which would be convenient. Is GPT-2 smarter than a dog? I don't think so, but I could be wrong ¯\_(ツ)_/¯
eyegor over 5 years ago
I've always been interested in techniques that try to minimize parameters, or in alternate approaches to learning. Meanwhile, the state of the art is over here just finding clever ways to make everything bigger. I have a feeling we're going to end up with a very different landscape in 5-10 years, much like the automotive industry never started mass-producing inline-12s and instead moved to turbos and superchargers.
freediver over 5 years ago
I can understand announcing this without code, but why no demo so anyone can try it in different scenarios?
0xff00ffee over 5 years ago
B = Billion, not Byte. For a second I was like, WTF?
ragebol over 5 years ago
All these language generation models, in short, base their next word solely on the previous words, right? I'd expect that these generators could be conditioned on, e.g., some fact (as in first-order logic, etc.) to express something I want. This is roughly the inverse of, for example, Natural Language Understanding.

Does anything like this exist?
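Nothing at the time generated from formal logic directly, but the nearest widely used approximation is prefix conditioning: since these models predict each token from everything before it, you can steer the output by prepending the fact as plain text (CTRL pushes this idea further with explicit control codes). A hedged sketch using the Hugging Face transformers library and the public "gpt2" checkpoint; the prompt and sampling parameters are illustrative:

    # Prefix conditioning: the "fact" becomes the context the model
    # must continue from. Sketch only; assumes transformers + torch installed.
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    fact = "Paris is the capital of France. In other words,"
    input_ids = tokenizer.encode(fact, return_tensors="pt")

    output = model.generate(
        input_ids,
        max_length=40,   # total length, prefix included
        do_sample=True,  # sample rather than greedy decoding
        top_k=50,        # limit sampling to the 50 likeliest tokens
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))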
lowdose over 5 years ago
This is GPT-2 × 10. For anyone wondering what GPT-2 can do, look at this baffling subreddit and marvel at how one GPT-2 model trained for $70k spits out better comedy than everybody on the payroll of Netflix combined.

https://www.reddit.com/r/SubSimulatorGPT2/
Tenoke over 5 years ago
I expect we'll see some very interesting, very big models following it. I didn't dig too far into the code, but the library looks very easy to use and will open up a lot of doors for people who have a few, or a few thousand, GPUs.
tuxguy over 5 years ago
https://news.ycombinator.com/item?id=22291417
FlyingCocoon over 5 years ago
At what stage of throwing compute & data at the problem do diminishing returns set in?
bitL over 5 years ago
What GPU do I need to train it? Titan Mega RTX with 240GB of RAM?
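A back-of-the-envelope answer to why no single card suffices: 17B parameters in fp16 are roughly 32 GiB before any gradients or optimizer state. The ~16 bytes/parameter figure below is the common rule of thumb for mixed-precision Adam (fp16 weights and gradients plus fp32 master weights and two moment buffers); it is an assumption, not a number from the announcement.

    # Rough memory estimate for a 17B-parameter model.
    params = 17e9
    weights_gib  = params * 2  / 2**30   # fp16 weights alone: ~32 GiB
    training_gib = params * 16 / 2**30   # + grads and Adam state: ~253 GiB
    print(f"fp16 weights:       {weights_gib:.0f} GiB")
    print(f"training footprint: {training_gib:.0f} GiB")  # far beyond any single 2020 GPU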
danharaj over 5 years ago
If your model has 17 billion parameters, you missed some.
01100011 over 5 years ago
How long until the language models stabilize enough that we can bake them into a low-cost, low-power chip for edge uses?
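One concrete shrink-for-the-edge step is post-training quantization, which stores weights as 8-bit integers and cuts model size roughly 4x versus fp32. A sketch using PyTorch's dynamic quantization API on a stand-in model; it illustrates the idea, not any actual chip toolchain:

    import torch
    import torch.nn as nn

    # Stand-in for a small language model's layers; a real LM would
    # quantize its Linear (and possibly LSTM) modules the same way.
    model = nn.Sequential(
        nn.Embedding(10000, 256),
        nn.Linear(256, 256),
        nn.ReLU(),
        nn.Linear(256, 10000),
    )

    # Replace Linear layers with int8 dynamically quantized versions.
    quantized = torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )
    print(quantized)  # Linear -> DynamicQuantizedLinear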
galkk over 5 years ago
Those summaries look impressive, although a bit repetitive.
riku_iki over 5 years ago
Unfortunately they abstained from participating in the more popular SQuAD and GLUE benchmarks.