LoRA vs. Full Fine-Tuning: An Illusion of Equivalence

236 points · by timbilt · 6 months ago

10 comments

K0balt · 6 months ago
So, in layman's terms, LoRA appears to "traumatize" the model to some degree, connecting the vector space with strong "jumpers" (intruder dimensions) to change its behavior, instead of subtly conforming the entire model into a shape that accommodates the new data.

These jumpers or shortcuts do create connections between the relevant new concepts in the model, but by directly connecting them instead of associating them through the existing network of concepts, nuance is lost and the bypassed areas become deemphasized, leading to forgetting of previously held associations.

Because of this, in general, full fine-tuning produces better results than LoRA in most cases, especially when forgetting of existing training is detrimental.

Or, to further oversimplify the issue in SE terms, LoRA == monkeypatching. (Is this a kind of intruder dimension?)
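For readers wondering what these "jumpers" look like concretely, here is a rough sketch, based on my reading of the paper, of how intruder dimensions can be detected: take the singular vectors of the fine-tuned weight matrix and flag those with low cosine similarity to every singular vector of the pretrained matrix. The threshold, matrix sizes, and seed below are made up for illustration.

```python
# Hypothetical sketch of the "intruder dimension" idea: singular vectors of the
# fine-tuned matrix that don't align with any singular vector of the pretrained one.
import torch

def intruder_dimensions(W_pre, W_ft, k=10, threshold=0.5):
    """Count top-k singular vectors of W_ft whose best cosine similarity
    against all singular vectors of W_pre falls below the threshold."""
    U_pre, _, _ = torch.linalg.svd(W_pre, full_matrices=False)
    U_ft, _, _ = torch.linalg.svd(W_ft, full_matrices=False)
    sims = (U_ft[:, :k].T @ U_pre).abs()   # pairwise cosine similarities
    max_sim, _ = sims.max(dim=1)           # best match in the pretrained basis
    return int((max_sim < threshold).sum())

# Toy example: a low-rank "jumper" added on top of a pretrained matrix.
torch.manual_seed(0)
W = torch.randn(512, 512)
B, A = torch.randn(512, 8), torch.randn(8, 512)
print(intruder_dimensions(W, W + 0.5 * (B @ A)))
```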
pwillia7 · 6 months ago
This tracks with my experience making and using Stable Diffusion LoRAs and fine-tunes. Still, given how fast they are to train and use, LoRAs have worked for me in most use cases, and it hasn't been worth fine-tuning the entire model.
Der_Einzige · 6 months ago
This paper seems dubious, because it flies in the face of what the ReFT/pyreft paper is showing (you can train 0.0001% of the parameters for 100 epochs to personalize a model on a small dataset):

https://github.com/stanfordnlp/pyreft

https://arxiv.org/abs/2404.03592

Note that the OP paper is not peer reviewed yet, and while the one I linked isn't either, it has Christopher Manning (yes, the one you know from YouTube), the head of AI at Stanford, as a co-author.

In general, I think that LoRA, and especially ReFT, should be more resistant to catastrophic forgetting because they literally don't touch most of the model.

The Stable Diffusion community has literally tens of thousands of LoRAs that don't cripple a model at small rank.
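As a back-of-the-envelope illustration of "not touching most of the model" (my own toy numbers, not from either paper): a rank-r LoRA adapter on a d_out × d_in weight matrix trains r·(d_in + d_out) parameters instead of d_in·d_out.

```python
# Toy parameter count: rank-8 LoRA adapter on a 4096 x 4096 weight matrix.
d_in, d_out, rank = 4096, 4096, 8
full_params = d_in * d_out
lora_params = rank * (d_in + d_out)
print(f"LoRA trains {lora_params:,} of {full_params:,} params "
      f"({100 * lora_params / full_params:.3f}% per adapted matrix)")
# -> roughly 0.4% per matrix at rank 8; ReFT-style interventions can go far lower.
```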
Eisenstein · 6 months ago
Is this just spelling out what has been known: that LoRAs skew heavily towards the new training, are not 'more intelligent', just 'more targeted', and become less intelligent the more they are targeted? Or is this proposing something else? I am having a difficult time understanding exactly what 'intruder dimensions' are.
sorenjan · 6 months ago
> We randomly initialize A such that it has singular values of 1, freeze it, and only train B. When we do this, we see a sharp reduction in high ranking intruder dimensions in comparison to those in normal LoRA

This sounds interesting, but I can't see that they do much with this result. Are they saving it for a follow-up paper? I would think that if their whole paper is about a big problem with LoRAs, and they then find what looks like an easy solution to that problem, it would warrant more than a paragraph just before the conclusion.

It would also have been interesting if they had included the DoRA method; they reference it briefly, and that paper claims to resemble the learning behavior of full fine-tuning.

But perhaps this paper is focused on LoRA behavior, and a separate paper comparing various improvements is better.
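For what it's worth, here is a minimal sketch of my interpretation of that quoted variant (not the authors' code): A is a random matrix whose singular values are all exactly 1 (a random semi-orthogonal matrix), kept frozen, and only B is trained.

```python
# Sketch of a LoRA layer where A is frozen with all singular values equal to 1.
import torch
import torch.nn as nn

class LoRAFrozenA(nn.Module):
    def __init__(self, d_in, d_out, rank=8, alpha=16):
        super().__init__()
        # Random semi-orthogonal A: SVD a Gaussian matrix and drop the singular
        # values, so every singular value of A is exactly 1.
        g = torch.randn(rank, d_in)
        u, _, vh = torch.linalg.svd(g, full_matrices=False)
        self.register_buffer("A", u @ vh)                # frozen, not a Parameter
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # only B is trained
        self.scale = alpha / rank

    def forward(self, x, base_out):
        # base_out is the frozen pretrained layer's output for x
        return base_out + self.scale * (x @ self.A.T @ self.B.T)
```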
deskr · 6 months ago
What an unfortunate choice of name. LoRa is already a big project.
viktour19 · 6 months ago
> LoRA and full fine-tuning, with equal performance on the fine-tuning task, can have solutions with very different generalization behaviors outside the fine-tuning task distribution.

The ability of neural nets to generalize is inherently tied to their trainable parameter count via mechanisms we don't understand, but we know parameter count is the key. When you fine-tune with LoRA, you're updating maybe 5% of the parameters. I really don't think there is an illusion of equivalence in the field.
blacklion · 6 months ago
Each time I see "LoRA" in a title I want to click it, until I realize it is about DNNs and not LoRa, the long-distance radio modulation.
danielhanchen · 6 months ago
TL;DR:
1. Use alpha = 2 * rank
2. Don't use too small ranks (rank = 1 to 8)
3. Sensational title. Better title: "LoRA works if done right"
4. Didn't test SVD init
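A minimal sketch of how points 1 and 2 might look as a peft LoraConfig (my example, not from the comment; the target module names are illustrative placeholders):

```python
# Rank comfortably above 8, with lora_alpha set to 2 * rank.
from peft import LoraConfig

rank = 16
config = LoraConfig(
    r=rank,
    lora_alpha=2 * rank,                   # point 1: alpha = 2 * rank
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # placeholder module names
    task_type="CAUSAL_LM",
)
```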
idorosen · 6 months ago
Jacob Andreas is one of the smartest people I’ve ever met.