
Pretraining data enables narrow selection capabilities in transformer models

65 points by hislaziness, over 1 year ago

11 comments

wavelander, over 1 year ago
I'm not sure folks who're putting out strong takes based on this have read the paper.

This paper uses a GPT-2-scale transformer, on sinusoidal data:

> We trained a decoder-only Transformer [7] model of GPT-2 scale implemented in the Jax based machine learning framework, Pax, with 12 layers, 8 attention heads, and a 256-dimensional embedding space (9.5M parameters) as our base configuration [4].

> Building on previous work, we investigate this question in a controlled setting, where we study transformer models trained on sequences of (x, f(x)) pairs rather than natural language.

Nowhere near definitive or conclusive.

Not sure why this is news outside of the Twitter-techno-pseudo-academic-influencer bubble.
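
To make the quoted setup concrete: the model is pretrained on prompts that interleave inputs and function values, and must predict f(x) for a new x from the in-context examples alone. A minimal sketch of generating such (x, f(x)) sequences, assuming a sinusoidal function class like the one quoted above (the parameter ranges and sequence length are illustrative, not the paper's exact settings):

```python
import numpy as np

def sample_sine_task(rng, seq_len=32):
    # One in-context regression task from a sinusoidal class:
    # f(x) = a * sin(b * x + c). The ranges below are illustrative
    # guesses, not the paper's exact settings.
    a = rng.uniform(0.5, 2.0)    # amplitude
    b = rng.uniform(0.5, 2.0)    # frequency
    c = rng.uniform(0.0, np.pi)  # phase
    x = rng.uniform(-np.pi, np.pi, size=seq_len)
    y = a * np.sin(b * x + c)
    # The transformer sees the interleaved sequence
    # (x1, y1, x2, y2, ...) and is trained to predict each y_i
    # from the pairs that precede it.
    return np.stack([x, y], axis=1).reshape(-1)

rng = np.random.default_rng(0)
batch = np.stack([sample_sine_task(rng) for _ in range(64)])
print(batch.shape)  # (64, 64): 64 sequences of 32 interleaved (x, y) pairs
```

The question the paper then asks is what happens when the function generating the in-context examples falls outside, or between, the classes seen during this pretraining.
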
mediaman, over 1 year ago
There have been two criticisms of this paper floating around.

1. The test mechanism is to use prediction of sinusoidal series. While it's certainly possible to train transformers on mathematical functions, it's not clear why findings from a model trained on sinusoidal functions would generalize into the domain of written human language (which is ironic, given the paper's topic).

2. Even if it were true that these models don't generalize beyond their training, large LLMs' training corpus is basically all of written human knowledge. So then the goalpost has been moved to "well, they won't push the frontier of human knowledge forward," which seems to be a much diminished claim, since the vast majority of humans are also not pushing the frontier of human knowledge forward and instead use existing human knowledge to accomplish their daily goals.
ldjkfkdsjnv, over 1 year ago
My uneducated opinion is that this paper is bollocks. Maybe they are looking at deeper mathematical results, instead of everyday tasks.

But every single day I am using OpenAI GPT-4 to handle novel tasks. I am working on a traditional SaaS vertical, except with a pure chatbot. The model works, is able to understand which function to call, to extract which parameters, and to know when the inputs will not work. Sure, if you ask it to do some extraneous task, it fails.

Google/DeepMind need to start showing up with some working results.

Where. are. the. models. Google.
Der_Einzige, over 1 year ago
We humans don't even know when we are doing real extrapolation, and the vast majority of humans are interpolating. I bet many do nothing but interpolate their whole lives.

So - and I say this as someone who writes NLP papers too - who cares?
MeImCounting, over 1 year ago
The one thing is that they seem to be using relatively small models. This may be a really damning result, but I was under the impression that any generalization capabilities of LLMs appear in a non-linear fashion when you increase the parameter count to the tens of billions/trillions, as in GPT-4. It would be interesting if they could recreate the same experiment with a much larger model. Unfortunately I don't think that's likely to happen, because of the resources required to train such models and the anti-open-source hysteria likely preventing larger models from being made publicly available, much less the data they were trained on. Imagine that: stifling research and fearmongering reduce the usefulness of the science that does manage to get done.
lsh123, over 1 year ago
Current AI models are approximation functions with a huge number of parameters. These approximation functions are reasonably good at interpolation, meh at extrapolation, and have nothing to do with generalization.
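
The interpolation-vs-extrapolation distinction is easy to see with a classical function approximator. A minimal sketch using a plain least-squares polynomial fit (not a transformer, chosen only because it makes the failure mode obvious):

```python
import numpy as np

# Fit a degree-9 polynomial to sin(x) sampled on [-pi, pi].
rng = np.random.default_rng(0)
x_train = rng.uniform(-np.pi, np.pi, size=200)
coeffs = np.polyfit(x_train, np.sin(x_train), deg=9)

def max_err(x):
    # Worst-case absolute error of the fit against the true function.
    return np.abs(np.polyval(coeffs, x) - np.sin(x)).max()

x_in = np.linspace(-np.pi, np.pi, 100)          # inside the training range
x_out = np.linspace(2 * np.pi, 3 * np.pi, 100)  # outside it
print(f"interpolation max error: {max_err(x_in):.2e}")   # small
print(f"extrapolation max error: {max_err(x_out):.2e}")  # enormous
```

Whether transformer in-context learning behaves like this kind of fit is exactly what the paper is probing.
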
og_kalu, over 1 year ago
"Generalization" is always a data problem.

If you trained it on one function class, of course that's all it learned to do. That's all it ever saw!

If you want to learn arbitrary function classes to some degree, the solution is simple. Train it on many different function classes.

Untrained models are as blank slate as you could possibly imagine. They're not even comparable to newborn humans with millions of years of evolution baked in. The data you feed them is their world. Their only world.
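
The "mixture" in the paper's title is exactly this idea: pretraining tasks are sampled from several function classes. A hypothetical sketch extending the earlier sine example (the class list and parameter ranges are invented for illustration, not taken from the paper):

```python
import numpy as np

# Illustrative function classes -- stand-ins, not the paper's mixture.
def sine(rng, x):
    return rng.uniform(0.5, 2.0) * np.sin(x)

def linear(rng, x):
    return rng.uniform(-2.0, 2.0) * x + rng.uniform(-1.0, 1.0)

def cubic(rng, x):
    return rng.uniform(-0.5, 0.5) * x ** 3

CLASSES = [sine, linear, cubic]

def sample_mixture_task(rng, seq_len=32):
    # Pick a function class uniformly at random, then sample one
    # task from it and interleave (x, f(x)) as before.
    f = CLASSES[rng.integers(len(CLASSES))]
    x = rng.uniform(-np.pi, np.pi, size=seq_len)
    return np.stack([x, f(rng, x)], axis=1).reshape(-1)

rng = np.random.default_rng(1)
pretraining_batch = [sample_mixture_task(rng) for _ in range(64)]
```

Per the summary elsewhere in this thread, a model pretrained on such a mixture selects well among the classes it contains but degrades on functions outside them.
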
pizza, over 1 year ago
FWIW, the paper title focuses on quite a different conclusion than the submission title: "Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models".
simbolit, over 1 year ago
Why are transformer models so bad at math? They often fail at simple addition.
datadrivenangel, over 1 year ago
Overfitting much?
simbolit, over 1 year ago
TL;DR: transformer models (at GPT-2 scale) are great (near-optimal) at interpolating between the cases given in (pre-)training, but fail at extrapolation as soon as we leave the training domain. Impressive results may be due more to the wide breadth of (pre-)training data, and less to generalization ability.