
Ask HN: Do modern AI engines still need to do full re-trainings?

11 points | by zepearl | 10 months ago
I learned about AI-ish algorithms in the '90s: backpropagation & clustering networks, and a little bit of genetic algos.

I then focused on, programmed, and played with the "backpropagation" network model for a while, until the early 2000s => it was fun, but not usable in my context. I then stopped fiddling with it and became inactive in this area.

An important property of a backpropagation network was (as far as I know) that it had to be fully re-trained whenever its inputs changed (values of existing ones changed, or inputs/outputs were removed/added).

Question:

Is it still like that for the currently fancy algos (the ones developed by Google/Facebook/OpenAI/Xsomething/...), or are they now better, so that they can adapt without having to be fully retrained on the full set of (new/up-to-date) training data?

Asking because I lost track of the progress in this area during the last 20 years, and especially recently I understand nothing involving all the new names (e.g. "llama", etc.).

Thanks :)

2 comments

Micoloth · 10 months ago
I think what you are referring to is the concept of "fine-tuning". You use a pretrained network and add a (relatively) small set of new input-output pairs to steer it in a new direction.

It's widely used, you can look it up.

A more challenging idea is whether it is possible to reuse the pretrained weights when training a network with a *different architecture* (maybe a bigger transformer with more heads, or something).

AFAIK this is not common practice; if you change the architecture you have to retrain from scratch. But given the cost of these trainings, I wouldn't be surprised if OpenAI & co had developed some technique to do this, e.g. across GPT versions.
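For readers who want to see what the fine-tuning described above looks like in practice, here is a minimal sketch (illustrative, not from the comment). It assumes the Hugging Face `transformers` library; "gpt2" is just a stand-in for any pretrained checkpoint, and the example texts are placeholders:

```python
# Minimal fine-tuning sketch (illustrative). Start from a pretrained checkpoint
# and continue training on a small set of new examples, instead of retraining
# from scratch. Model name and texts are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token           # gpt2 ships without a pad token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A small set of new input/output examples to steer the pretrained model.
new_texts = [
    "Domain-specific example text A.",
    "Domain-specific example text B.",
]
batch = tokenizer(new_texts, return_tensors="pt", padding=True)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100         # ignore padding in the loss

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):                                   # a few passes over the new data
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
# The pretrained weights are the starting point; only a small amount of extra
# training happens here -- no full retraining on the original dataset.
```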
vasili111 · 10 months ago
Large Language Models are pre-trained by their creators on huge amounts of data.

In many cases you do not need to do anything with the LLM and can just use it.

If it was not trained on data containing the information you are interested in, you can use a technique called RAG (Retrieval-Augmented Generation).

You can also do fine-tuning, which is a kind of training, but on a small amount of data.
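To make the RAG idea concrete, here is a minimal sketch (again illustrative, not from the comment). It uses TF-IDF from scikit-learn as a stand-in for a proper embedding index, retrieves the most relevant document, and builds a prompt around it; the documents, question, and final prompt are all placeholders, and the pretrained model itself is never modified:

```python
# Minimal RAG sketch (illustrative). TF-IDF retrieval stands in for an
# embedding index; the resulting prompt would be sent to any off-the-shelf LLM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Project Foo was released internally in 2023 and replaces Bar.",
    "The office cafeteria is closed on Fridays.",
]
question = "When was Foo released?"

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([question])

# Retrieve the document most similar to the question.
best_doc = documents[cosine_similarity(query_vector, doc_vectors).argmax()]

# Augment the prompt with the retrieved context; the LLM's weights stay
# untouched, so no retraining or fine-tuning is needed to use new data.
prompt = f"Context: {best_doc}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```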