Outperforming larger language models with less training data and smaller models

320 points by atg_abhishek over 1 year ago

12 comments

fbnbr over 1 year ago
I think smaller expert models will dominate the majority of applications. There is an optimum and a fine balance to strike when it comes to size and usability. There will be many mechanisms, like the one demonstrated in the post, to find that optimum and realize it.
pedrovhb over 1 year ago
Interesting that they use T5 for the distilled model. I was under the impression that encoder-decoder architectures were going the way of the Dodo, but it seems they may still be relevant after all.

Also interesting is that this isn't an inconceivably clever, out-of-the-box idea. It shows there's still a lot of low-hanging fruit to explore, and the future of LLMs isn't set in stone yet. Could be that the real deal is a mixture of experts trained in this style. It's exciting that it feels like the holy grail is close to being achievable if only the right combination of ideas is tried.
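The comment above touches on the core trick described in the post: training a small T5 student on both the teacher LLM's labels and its rationales. Below is a minimal sketch of that multi-task idea, assuming Hugging Face `transformers` and `t5-small`; the task prefixes, the 0.5 rationale weight, and the example rationale string are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# One training example: the teacher LLM supplies both an answer and a rationale
# (the rationale text here is invented for illustration).
question = "Sammy wanted to go to where the people are. Where might he go?"
label = "populated areas"
rationale = "People gather in populated areas, so that is where Sammy would go."

def seq2seq_loss(prefix: str, source: str, target: str) -> torch.Tensor:
    """Standard T5 cross-entropy loss for one (source -> target) pair."""
    enc = tokenizer(prefix + source, return_tensors="pt", truncation=True)
    labels = tokenizer(target, return_tensors="pt", truncation=True).input_ids
    return model(**enc, labels=labels).loss

# Multi-task objective: the same student predicts the label and reproduces the
# rationale, distinguished only by a task prefix.
label_loss = seq2seq_loss("[label] ", question, label)
rationale_loss = seq2seq_loss("[rationale] ", question, rationale)
loss = label_loss + 0.5 * rationale_loss  # 0.5 is an assumed mixing weight
loss.backward()  # one illustrative training step (optimizer omitted)
```

As the post describes, only the label task is needed at inference time, so the rationale supervision adds training signal without adding serving cost.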
xnx over 1 year ago
The amount of activity and progress in the LLM/ML/AI spaces is truly fantastic. Optimizations like this are particularly valuable when hardware (e.g. Nvidia) is so expensive.
avereveard over 1 year ago
So this https://arxiv.org/abs/2212.08410 but one year later
ilaksh over 1 year ago
I am not a researcher, but it always seemed intuitive to me that the most effective models would be multimodal and trained with a carefully tailored core curriculum.

I would want to ensure that the system gains and retains the fundamental structures and skills it needs to generalize effectively and accurately. While maintaining those, you then feed it lots of diverse data to learn the exceptions and the ways the skills can be combined. But somehow you need to preserve those core skills and that knowledge throughout. Maybe you could do that just by having it output those understandings or manipulations in addition to the final answer, similar to what the paper does.

For example, a code generation model might be required to output a state machine simulation of the requested program.
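A hypothetical illustration of the idea in the comment above: supervise a code model not just on the final program but also on an auxiliary, structured view of it (here, a toy state-machine trace). The field names and trace format are invented for this sketch and do not come from the paper.

```python
# One invented training example: the target pairs a state-machine description
# of the requested behaviour with the final code, so the intermediate structure
# is learned alongside the answer.
example = {
    "prompt": "Write a function that validates simple arithmetic expressions.",
    "state_machine": [  # auxiliary target: a structured view of the behaviour
        {"state": "EXPECT_NUMBER", "on": "digit", "next": "EXPECT_OP_OR_END"},
        {"state": "EXPECT_OP_OR_END", "on": "+ or -", "next": "EXPECT_NUMBER"},
        {"state": "EXPECT_OP_OR_END", "on": "end of input", "next": "ACCEPT"},
    ],
    "answer": "def is_valid(expr: str) -> bool: ...",  # the usual code target
}

# Serialize both targets into one supervision string, mirroring the
# "output the intermediate structure plus the final answer" idea.
trace = "\n".join(
    f"{t['state']} --{t['on']}--> {t['next']}" for t in example["state_machine"]
)
target = f"TRACE:\n{trace}\nCODE:\n{example['answer']}"
print(target)
```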
sinuhe69 over 1 year ago
Why is the amount of training data for the LLM less than for the distilled and task-specific models (in the first figure)?

Or did the authors count the LLM's training data toward the required training data for the distilled/task-specific models?

https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjeIs4yaBA3Ir55j869FMzdmRdf7OxiIjsWl05GU48ikYOHZGLk1H8tIHeKKBaY_xER0QITv5DUhADZvqS1os6mNA_nLQKqwW7DOXnwcnPl6BhsMJ_LKTvglGUrHR5_QC8MIe3K7i9zyfcWkwzvjPhXLifYijgkeeG_1yn9EMm-ol9eI9Cv_rz71wMyGfk2/s1570/image3.png
ziofill over 1 year ago
Is it that a lot of capacity is unused in those behemoth LLMs, or that the smaller language model just mimics the reasoning task? (Mimics the mimicking?)
sourabh03agr over 1 year ago
Interesting! Do you think RLHF would be a necessity for smaller models to perform on par with state-of-the-art LLMs? In my view, instruction tuning will resolve any issues related to output structure, tonality, or domain understanding, but will it be enough to improve the reasoning capabilities of the smaller model?
threeseed over 1 year ago
> For instance, serving a single 175 billion LLM requires at least 350GB of GPU memory using specialized infrastructure

Apple ships the Mac Studio, which supports up to 144GB of usable GPU memory.

Would be amusing if they were to release a Mac Pro with 300+ GB and dominate the LLM serving space.
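For context on the figures in this comment, here is the back-of-the-envelope arithmetic behind the quoted 350GB: weight memory alone for a 175B-parameter model at a few precisions. The precision list and decimal-GB convention are my own assumptions, and KV cache and activations are excluded, so these are lower bounds.

```python
# Rough weight memory needed just to hold a 175B-parameter model on GPUs.
PARAMS = 175e9  # 175 billion parameters

def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / 1e9  # decimal gigabytes, for simplicity

for precision, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{precision:>9}: {weight_memory_gb(PARAMS, bytes_per_param):>6,.0f} GB")

# fp16/bf16 works out to 350 GB, matching the quoted figure; 4-bit quantization
# (~88 GB) is what makes machines with ~144 GB of unified memory interesting.
```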
scotty79 over 1 year ago
I wonder if Facebook could train an LLM on all the chat histories of all of its users.
greatpostman over 1 year ago
Still waiting for Google to release a model that matches GPT-4. Until then, I’m assuming their presumed AI supremacy is marketing.
p-e-w over 1 year ago
> given the input question "Sammy wanted to go to where the people are. Where might he go? Answer Choices: (a) populated areas, (b) race track, (c) desert, (d) apartment, (e) roadblock", distilling step-by-step provides the correct answer to the question, "(a) populated areas"

Huh? My answer as a human would have been "race track", as that is probably "where the people are" (during a race).

Did I fail? Am I a poor language model? Or is the whole thing just tea-leaf reading to begin with?