
Takeaways from hundreds of LLM finetuning experiments with LoRA

258 points | by rasbt | over 1 year ago

8 comments

lappa, over 1 year ago

Great work, lots of useful information here.

The only thing I wish you had done differently is explore alpha > 2 * r.

In this blog post, the author found that an alpha of 4 * r (where r=64) outperformed all smaller alphas in terms of loss when finetuning Llama-7b on databricks-dolly-15k.

https://medium.com/@drishtisharma96505/comparative-analysis-of-lora-parameters-on-llama-2-with-flash-attention-574b913295d4

Additionally, you identify (with alpha = 2*r) r=16 as inferior to r=256; however, aside from arithmetic, r=16 actually outperforms all the others. And the base model outperforms any finetuning on both arithmetic metrics.
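The alpha-to-rank ratio debated here enters LoRA as a plain scaling factor on the low-rank update, which is why raising alpha relative to r amplifies the adapter's contribution without adding any parameters. A minimal PyTorch sketch (an illustrative layer, not the article's own code) of how r and alpha interact:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update scaled by alpha / r."""
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # only the adapter is trained
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r             # alpha = 2*r -> scale 2, alpha = 4*r -> scale 4

    def forward(self, x):
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
```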
QuantumG, over 1 year ago
Seemed to take terrible performing models and marginally improve them. None of them are fit for purpose at the end of the training, so what was the point?
carbocation, over 1 year ago

I'd like to see a writeup of LoRAs for something that is readily tractable without LoRAs. E.g., a pretrained ResNet34 ImageNet model that gets a LoRA instead of being fine-tuned or fully re-trained. The pedagogical value is that it can be compared to the alternatives which are tractable in this setting (and which are not in an LLM setting).
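Nothing in the method prevents such an experiment. A rough sketch of one way to set it up, using the Hugging Face PEFT library's support for custom models on a torchvision ResNet34 (the target module names and ranks here are illustrative assumptions, not from the article):

```python
from torchvision.models import resnet34
from peft import LoraConfig, get_peft_model

# Pretrained ImageNet backbone; PEFT freezes everything except the adapters.
model = resnet34(weights="IMAGENET1K_V1")

# Attach low-rank adapters to the residual-block convolutions and the classifier head.
config = LoraConfig(r=8, lora_alpha=16, target_modules=["conv1", "conv2", "fc"])
lora_model = get_peft_model(model, config)

lora_model.print_trainable_parameters()  # adapter params are a small fraction of ~21M
```

Because the same frozen backbone can also be fully fine-tuned or retrained from scratch at modest cost, all three baselines are directly comparable in this setting, which is exactly the pedagogical point raised above.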
Der_Einzige, over 1 year ago

LoRAs are, like most fine-tuning, a spectrum.

LoRAs can be nearly the same size as the original model, with nearly the same representational capacity/trainability, or they can be a tiny fraction of the original model size, with correspondingly fewer learnable parameters.

As such, they are suitable for nearly all tasks. We should be asking whether they are better than regular fine-tuning, or soft prompts (aka textual inversion), or slapping new trainable layers on the end (aka hypernetworks). The stable diffusion community seems to think that they are.
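To make that size spectrum concrete: each adapted weight matrix gains r * (d_in + d_out) trainable parameters, so the rank directly sets where on the spectrum a LoRA sits. A quick back-of-the-envelope calculation (the 4096x4096 layer shape approximates a Llama-2 7B attention projection):

```python
# LoRA adds r * (d_in + d_out) trainable parameters per adapted weight matrix.
d_in = d_out = 4096          # e.g. a Llama-2 7B attention projection
full = d_in * d_out          # ~16.8M parameters in the frozen weight itself

for r in (8, 64, 256, 1024):
    lora = r * (d_in + d_out)
    print(f"r={r:5d}: {lora/1e6:6.2f}M adapter params ({100 * lora / full:.1f}% of the layer)")
```

At r=8 the adapter is well under 1% of the layer; at r=1024 it is half the layer's size, which is the "nearly the same size as the original model" end of the spectrum.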
munro, over 1 year ago

This is amazing, thank you!!

> My hypothesis is that the Alpaca dataset does not contain any related arithmetic tasks, and the model actively unlearns basic arithmetic when it focuses more on other tasks.

I'm surprised this wasn't verified; it's a major benchmark stat. My eyes keep getting drawn to it, because it seems to have the most variance. Does anyone know?

Also, just throwing it out there: I would love to see a Neptune/W&B log of the hyperparameter tuning :)
stan_kirdey, over 1 year ago

claude2 tldr: Here are a few key points I gathered from the article:

The article explores optimizing LoRA hyperparameter settings for finetuning large language models. The goal is to maximize performance while minimizing memory usage and training time.

The base model used is Llama-2 7B. Experiments compare default LoRA, QLoRA (4-bit quantized), AdamW and SGD optimizers, and different choices for the rank r and alpha hyperparameters.

Key findings:

QLoRA provides substantial memory savings (6GB less than default LoRA) at the cost of slower training. The performance impact is minor.

AdamW vs SGD makes little difference in memory or performance.

Increasing training iterations from 50k to 100k hurts performance, likely because the Alpaca dataset lacks diversity.

Tuning the rank r and alpha is most impactful. A good rule of thumb is to set alpha = 2*r. The best model uses r=256, alpha=512, and improves over the base model on most tasks, except arithmetic.

The optimized LoRA model was submitted to the NeurIPS efficiency challenge and showed improvements on several benchmarks compared to the base Llama-2 model.

The takeaways are practical tips for tuning LoRA hyperparameters and trading off memory, compute, and model performance.
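For readers who want a feel for the headline configuration (4-bit QLoRA base, r=256, alpha=512), here is a hedged sketch using the Hugging Face transformers/peft/bitsandbytes stack; the article's own training code is not reproduced here, so the target modules, dropout, and other defaults below are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# QLoRA-style 4-bit quantization of the frozen base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Rank and alpha from the summary above: r=256, alpha=512 (the alpha = 2*r rule of thumb).
lora_config = LoraConfig(
    r=256,
    lora_alpha=512,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # which layers to adapt is a choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```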
snitty, over 1 year ago

LoRA, not LoRa.

I was VERY confused for a minute.
naillo, over 1 year ago

Good article, but my god, why is there something like a two-second delay before every user interaction on this page? Scrolling, selecting text, everything has some kind of two-second 'bootup' time, after which things work normally, but if you stop interacting with it for a bit it goes back into some idle mode. Really weird.