LoRA vs. Full Fine-Tuning: An Illusion of Equivalence

236 points · by timbilt · 6 months ago

10 comments

K0balt · 6 months ago
So, in layman's terms, LoRA appears to "traumatize" the model to some degree, connecting the vector space with strong "jumpers" (intruder dimensions) to change its behavior, instead of subtly conforming the entire model into a shape that accommodates the new data.

These jumpers or shortcuts do create connections between the relevant new concepts in the model, but by directly connecting them instead of associating them through the existing network of concepts, nuance is lost and the bypassed areas become deemphasized, leading to forgetting of previously held associations.

Because of this, in general, full fine-tuning produces better results than LoRA in most cases, especially when forgetting of existing training is detrimental.

Or, to further oversimplify the issue in SE terms, LoRA == monkeypatching. (Is this a kind of intruder dimension?)
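For readers wondering what these "jumpers" look like concretely, here is a rough sketch, based on my reading of the paper, of how intruder dimensions can be detected: take the singular vectors of the fine-tuned weight matrix and flag those with low cosine similarity to every singular vector of the pretrained matrix. The threshold, matrix sizes, and seed below are made up for illustration.

```python
# Hypothetical sketch of the "intruder dimension" idea: singular vectors of the
# fine-tuned matrix that don't align with any singular vector of the pretrained one.
import torch

def intruder_dimensions(W_pre, W_ft, k=10, threshold=0.5):
    """Count top-k singular vectors of W_ft whose best cosine similarity
    against all singular vectors of W_pre falls below the threshold."""
    U_pre, _, _ = torch.linalg.svd(W_pre, full_matrices=False)
    U_ft, _, _ = torch.linalg.svd(W_ft, full_matrices=False)
    sims = (U_ft[:, :k].T @ U_pre).abs()   # pairwise cosine similarities
    max_sim, _ = sims.max(dim=1)           # best match in the pretrained basis
    return int((max_sim < threshold).sum())

# Toy example: a low-rank "jumper" added on top of a pretrained matrix.
torch.manual_seed(0)
W = torch.randn(512, 512)
B, A = torch.randn(512, 8), torch.randn(8, 512)
print(intruder_dimensions(W, W + 0.5 * (B @ A)))
```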
pwillia7 · 6 months ago
This tracks with my experience making and using Stable Diffusion LoRAs and fine-tunes. Still, given how fast they are to train and use, LoRAs have worked for me in most use cases, and it hasn't been worth fine-tuning the entire model.
Der_Einzige · 6 months ago
This paper seems dubious, because it flies in the face of what the ReFT/pyreft paper is showing (you can train 0.0001% of the parameters for 100 epochs to personalize a model on a small dataset):

https://github.com/stanfordnlp/pyreft

https://arxiv.org/abs/2404.03592

Note that the OP paper is not peer reviewed yet, and while the one I linked isn't either, it has Christopher Manning (yes, the one you know from YouTube), the head of AI at Stanford, as a co-author.

In general, I think that LoRA, and especially ReFT, should be more resistant to catastrophic forgetting because they literally don't touch most of the model.

The Stable Diffusion community has literally tens of thousands of LoRAs that don't cripple a model at small rank.
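As a back-of-the-envelope illustration of "not touching most of the model" (my own toy numbers, not from either paper): a rank-r LoRA adapter on a d_out × d_in weight matrix trains r·(d_in + d_out) parameters instead of d_in·d_out.

```python
# Toy parameter count: rank-8 LoRA adapter on a 4096 x 4096 weight matrix.
d_in, d_out, rank = 4096, 4096, 8
full_params = d_in * d_out
lora_params = rank * (d_in + d_out)
print(f"LoRA trains {lora_params:,} of {full_params:,} params "
      f"({100 * lora_params / full_params:.3f}% per adapted matrix)")
# -> roughly 0.4% per matrix at rank 8; ReFT-style interventions can go far lower.
```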
Eisenstein · 6 months ago
Is this just spelling out what has been known: that LoRAs skew heavily towards the new training, are not 'more intelligent', just 'more targeted', and become less intelligent the more they are targeted? Or is this proposing something else? I am having a difficult time understanding exactly what 'intruder dimensions' are.
sorenjan · 6 months ago
> We randomly initialize A such that it has singular values of 1, freeze it, and only train B. When we do this, we see a sharp reduction in high ranking intruder dimensions in comparison to those in normal LoRA

This sounds interesting, but I can't see that they do much with this result. Are they saving it for a follow-up paper? I would think that if their whole paper is about a big problem with LoRAs, and they then find what looks like an easy solution to that problem, it would warrant more than a paragraph just before the conclusion.

It would also have been interesting if they had included the DoRA method; they reference it briefly, and that paper claims to resemble the learning behavior of full fine-tuning.

But perhaps this paper is focused on LoRA behavior, and a separate paper comparing various improvements is better.
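For what it's worth, here is a minimal sketch of my interpretation of that quoted variant (not the authors' code): A is a random matrix whose singular values are all exactly 1 (a random semi-orthogonal matrix), kept frozen, and only B is trained.

```python
# Sketch of a LoRA layer where A is frozen with all singular values equal to 1.
import torch
import torch.nn as nn

class LoRAFrozenA(nn.Module):
    def __init__(self, d_in, d_out, rank=8, alpha=16):
        super().__init__()
        # Random semi-orthogonal A: SVD a Gaussian matrix and drop the singular
        # values, so every singular value of A is exactly 1.
        g = torch.randn(rank, d_in)
        u, _, vh = torch.linalg.svd(g, full_matrices=False)
        self.register_buffer("A", u @ vh)                # frozen, not a Parameter
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # only B is trained
        self.scale = alpha / rank

    def forward(self, x, base_out):
        # base_out is the frozen pretrained layer's output for x
        return base_out + self.scale * (x @ self.A.T @ self.B.T)
```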
deskr · 6 months ago
What an unfortunate choice of name. LoRa is already a big project.
viktour19 · 6 months ago
> LoRA and full fine-tuning, with equal performance on the fine-tuning task, can have solutions with very different generalization behaviors outside the fine-tuning task distribution.

The ability of neural nets to generalize is inherently tied to their trainable parameter count via mechanisms we don't understand, but we know parameter count is the key. When you fine-tune with LoRA, you're updating maybe 5% of the parameters. I really don't think there is an illusion of equivalence in the field.
blacklion · 6 months ago
Each time I see "LoRA" in a title I want to click it, until I realize it is about DNNs and not LoRa, the long-distance radio modulation.
danielhanchen · 6 months ago
TL;DR:
1. Use alpha = 2 * rank
2. Don't use too small ranks (rank = 1 to 8)
3. Sensational title. Better title: "LoRA works if done right"
4. Didn't test SVD init
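A minimal sketch of how points 1 and 2 might look as a peft LoraConfig (my example, not from the comment; the target module names are illustrative placeholders):

```python
# Rank comfortably above 8, with lora_alpha set to 2 * rank.
from peft import LoraConfig

rank = 16
config = LoraConfig(
    r=rank,
    lora_alpha=2 * rank,                   # point 1: alpha = 2 * rank
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # placeholder module names
    task_type="CAUSAL_LM",
)
```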
idorosen · 6 months ago
Jacob Andreas is one of the smartest people I’ve ever met.