LoRA vs. Full Fine-Tuning: An Illusion of Equivalence

236 points by timbilt 6 months ago

10 comments

K0balt 6 months ago
So, in layman’s terms, LoRA appears to “traumatize” the model to some degree, connecting the vector space with strong “jumpers” (intruder dimensions) to change its behavior, instead of subtly conforming the entire model into a shape that accommodates the new data.

These jumpers or shortcuts do create connections between the relevant new concepts in the model, but by directly connecting them instead of associating them through the existing network of concepts, nuance is lost and the bypassed areas become de-emphasized, leading to forgetting of previously held associations.

Because of this, full fine-tuning generally produces better results than LoRA, especially when forgetting of existing training is detrimental.

Or, to further oversimplify the issue in software-engineering terms: LoRA == monkeypatching. (Is this a kind of intruder dimension?)
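(For readers who want the mechanics behind this description, here is a minimal, illustrative PyTorch-style sketch of a LoRA layer. It is not code from the paper; the class name, shapes, and hyperparameters are made up for illustration. The frozen base weight is left untouched and a trainable low-rank product B·A is added on top; that added path is the “jumper” the comment describes.)

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Frozen base layer plus a trainable low-rank update (illustrative only)."""
        def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False                             # original weights stay frozen
            d_out, d_in = base.weight.shape
            self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)   # down-projection
            self.B = nn.Parameter(torch.zeros(d_out, rank))         # up-projection, starts at zero
            self.scale = alpha / rank

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # frozen path + low-rank "shortcut" path
            return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

(Because B starts at zero, the adapted model initially behaves exactly like the base model; training only moves the added low-rank path.)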
pwillia7 6 months ago
This tracks with my experience making and using Stable Diffusion LoRAs and fine-tunes. Still, given how fast they are to train and use, LoRAs have worked for me in most use cases, and it hasn't been worth fine-tuning the entire model.
Der_Einzige 6 months ago
This paper seems dubious, because it flies in the face of what the ReFT/pyreft paper shows (you can train 0.0001% of the parameters for 100 epochs to personalize on a small dataset):

https://github.com/stanfordnlp/pyreft

https://arxiv.org/abs/2404.03592

Note that the OP paper is not peer reviewed yet, and while the one I linked isn't either, it has Christopher Manning (yes, the one you know from YouTube), the head of AI at Stanford, as a co-author.

In general, I think that LoRA, and especially ReFT, should be more resistant to catastrophic forgetting because they literally don't touch most of the model.

The Stable Diffusion community has literally tens of thousands of LoRAs that don't cripple a model at small rank.
Eisenstein 6 months ago
Is this just spelling out what has already been known: that LoRAs skew heavily towards the new training, are not "more intelligent" just "more targeted", and become less intelligent the more they are targeted? Or is this proposing something else? I am having a difficult time understanding exactly what "intruder dimensions" are.
sorenjan 6 months ago
> We randomly initialize A such that it has singular values of 1, freeze it, and only train B. When we do this, we see a sharp reduction in high ranking intruder dimensions in comparison to those in normal LoRA

This sounds interesting, but I can't see that they do much with this result. Are they saving it for a follow-up paper? I would think that if their whole paper is about a big problem with LoRAs, and they then find what looks like an easy solution to that problem, it would warrant more than a paragraph just before the conclusion.

It would also have been interesting if they had included the DoRA method; they reference it briefly, and that paper claims to resemble full fine-tuning's learning behavior.

But perhaps this paper is focused on LoRA behavior, and a separate paper comparing various improvements is better.
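(A rough sketch of the variant the quoted passage describes, assuming "singular values of 1" means A is initialized as a random semi-orthogonal matrix, e.g. via torch.nn.init.orthogonal_, then frozen while only B is trained; the authors' exact setup may differ, and the shapes below are illustrative.)

    import torch
    import torch.nn as nn

    d_in, d_out, rank = 1024, 1024, 8

    # A: random semi-orthogonal down-projection (all singular values equal 1), frozen
    A = torch.empty(rank, d_in)
    nn.init.orthogonal_(A)
    A.requires_grad_(False)

    # B: the only trainable LoRA factor in this variant
    B = nn.Parameter(torch.zeros(d_out, rank))

    def lora_delta(x: torch.Tensor) -> torch.Tensor:
        # low-rank update added on top of the frozen base layer's output
        return x @ A.T @ B.T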
deskr 6 months ago
What an unfortunate choice of name. LoRa is already a big project.
viktour19 6 months ago
> LoRA and full fine-tuning, with equal performance on the fine-tuning task, can have solutions with very different generalization behaviors outside the fine-tuning task distribution.

The ability of neural nets to generalize is inherently tied to their trainable parameter count, via mechanisms we don't understand, but we know parameter count is key. When you fine-tune with LoRA, you're updating maybe 5% of the parameters. I really don't think there is an illusion of equivalence in the field.
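(The actual fraction depends heavily on the rank and on which matrices are adapted; here is a back-of-envelope example with hypothetical numbers, roughly in the range of a 7B-parameter model with rank-8 adapters on the attention projections.)

    # Hypothetical, order-of-magnitude numbers only.
    total_params = 7_000_000_000       # ~7B base model
    n_layers = 32
    d_model = 4096
    rank = 8
    adapted_per_layer = 4              # e.g. q, k, v, o projections

    # each adapter adds A (rank x d_model) and B (d_model x rank)
    lora_params = n_layers * adapted_per_layer * 2 * rank * d_model
    print(lora_params, lora_params / total_params)   # ~8.4M trainable params, ~0.1% of the model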
blacklion 6 months ago
Each time I see "LoRA" in a title I want to click it, until I realize that it is about DNNs and not LoRa, the long-distance radio modulation.
danielhanchen 6 months ago
TL;DR:

1. Use alpha = 2 * rank.

2. Don't use too small ranks (rank = 1 to 8).

3. Sensational title. Better title: "LoRA works if done right".

4. Didn't test SVD init.
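(A minimal sketch of point 1, assuming the Hugging Face PEFT library's LoraConfig; the target module names are model-specific and shown only as an example.)

    from peft import LoraConfig

    rank = 16                          # avoid very small ranks (point 2)
    config = LoraConfig(
        r=rank,
        lora_alpha=2 * rank,           # alpha = 2 * rank (point 1)
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )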
idorosen 6 months ago
Jacob Andreas is one of the smartest people I’ve ever met.