I've been increasingly wondering if the field treating LLMs as a continuum, as opposed to a set of distinct thresholds, is leading to erroneous "rules of thumb," since most methodology research is concentrated on smaller and more accessible models right now.<p>We generally recognize (nearly ad nauseam) that mouse models in medical research don't necessarily translate to humans.<p>Similarly, I'd imagine most would laugh at the idea that a neurology researcher who found the best way to get a fruit fly's brain to navigate a maze should extrapolate that methodology to a dolphin's or a chimp's brain.<p>Maybe we should be defining "weight classes" for LLMs and grouping research by those classes. Like "these are the techniques that work best for lightweight models," but without assuming they hold as a general rule of thumb for "heavyweight models."<p>Even the discussion of synthetic data and model collapse is a good example of where there might be a very significant difference in the effect on model quality between a cheaper, less sophisticated model generating synthetic data to feed back into itself and a much more complex and sophisticated model doing the same. Maybe the lesson is actually "recursive training on synthetic data leads to model collapse <i>in lightweight and medium weight models</i>."<p>So while the writeup is a great one on fine-tuning 7B models with LoRA, I'd be curious what % of the recommendations hold up in replication for even just a 65B model.
This is an exceptionally useful article. A few highlights:<p>* QLoRA works really well compared to LoRA if you need to save memory (at the cost of time)<p>* For small LoRAs, Adam has almost no memory usage penalty compared to SGD<p>* Multiple training epochs lower performance (!). To quote: "This performance decline is likely due to increased overfitting, which warrants additional investigation." (Note that this is LoRA overfitting, and it's unclear which layers it was enabled for in this experiment.)<p>* The best results for the alpha and r parameters in LoRA seem to come from setting alpha = 2r<p>* Dataset quality beats quantity: 1k LIMA examples give better results than 50k Alpaca examples
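The alpha = 2r point is easier to see in the LoRA update rule itself. Here's a minimal NumPy sketch (hypothetical shapes, not the article's code): the adapted weight is W' = W + (alpha / r) * B A, so alpha = 2r just fixes the update's scaling factor at 2 no matter which rank you pick.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 8, 8, 4, 8  # alpha = 2r

W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized

scaling = alpha / r                     # 2.0 whenever alpha = 2r
W_adapted = W + scaling * (B @ A)

# With B initialized to zero, the adapter starts out as an exact no-op.
assert np.allclose(W_adapted, W)
```

One nice property of this scheme: because the scaling is alpha / r, doubling r while keeping alpha = 2r leaves the effective magnitude of the update unchanged, so the rank can be tuned somewhat independently of the learning rate.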
LoRA blew me away the first time I looked into it. Especially since you can host many LoRA adapters at once for a fraction of the cost of hosting an entire model by sharing the base between the adapters. I built a little tool to make LoRA fine-tuning easier. The adapters export to Huggingface. You can check it out here: <a href="https://app.haven.run">https://app.haven.run</a>
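The economics of serving many adapters over one base can be sketched in a few lines of NumPy (toy sizes and adapter names are mine, not the tool's): the big base weight is stored once, and each adapter only adds two small low-rank matrices applied as a correction per request.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 1024, 8, 16

# Shared base weight (~1M params here), stored once for all adapters.
W = rng.standard_normal((d, d))

# Each adapter is just a small (A, B) pair: 2*d*r params each.
adapters = {
    name: (rng.standard_normal((r, d)) * 0.01,   # A: down-projection
           rng.standard_normal((d, r)) * 0.01)   # B: up-projection
    for name in ("adapter_code", "adapter_chat")
}

def forward(x, adapter=None):
    y = W @ x                                    # base computation, shared
    if adapter is not None:
        A, B = adapters[adapter]
        y = y + (alpha / r) * (B @ (A @ x))      # cheap low-rank correction
    return y

x = rng.standard_normal(d)
y_code = forward(x, "adapter_code")
y_chat = forward(x, "adapter_chat")

# Each extra adapter costs 2*d*r params vs d*d for the base: ~1.6% here.
extra = 2 * d * r / (d * d)
```

So adding a tenth adapter to a running server costs roughly the same marginal memory as the second one did, which is why per-adapter hosting can be so much cheaper than per-model hosting.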
I fine-tuned LLama-2 on code/comment generation (in Python) for around $2 and was able to run it natively on an M1 MacBook Air. I can totally see smaller fine-tuned LLMs being used locally on consumer devices in the future. I think people underestimate how cheap and efficient this stuff is.<p>I've actually built a service which lets you fine-tune LLama-2 and other LLMs by uploading a JSON dataset. I'm looking for feedback; the link is <a href="https://useftn.com" rel="nofollow noreferrer">https://useftn.com</a>.
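For anyone curious what "uploading a JSON dataset" typically looks like: the exact schema this service expects isn't stated in the comment, but a common instruction-tuning layout is the Alpaca-style list of instruction/input/output records. A small sketch, including a sanity check before upload:

```python
import json

# Alpaca-style records; the specific field names a given service expects
# are an assumption here, so check its docs before uploading.
dataset = [
    {
        "instruction": "Write a docstring for this function.",
        "input": "def add(a, b):\n    return a + b",
        "output": '"""Return the sum of a and b."""',
    },
]

def validate(records):
    """Raise if any record is missing one of the expected string fields."""
    required = {"instruction", "input", "output"}
    for i, rec in enumerate(records):
        missing = required - rec.keys()
        if missing:
            raise ValueError(f"record {i} missing fields: {sorted(missing)}")
    return True

validate(dataset)
payload = json.dumps(dataset, indent=2)  # the file you'd actually upload
```

Validating locally like this is worth the two minutes: a malformed record usually surfaces as a confusing failure halfway through a paid fine-tuning run rather than at upload time.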
I’ve been thinking for a while about ways to compress information with AI for long-distance transmission over LoRa radio, and now this LoRA in the news gets me all confused.
What is the toolset that works best?<p>axolotl is generally recommended... but I'm unsure whether it's genuinely the best for production-scale fine-tuning.
Ever since the author paywalled some of his useful posts, I stopped following him. I've read his ML book, and I know he used to be a professor and now works in industry; he's quite famous in the field. That's why I don't understand why such a figure would even need the extra income from Substack's paywall.