I've been keeping track of the techniques through Maxime Labonne's LLM course: <a href="https://github.com/mlabonne/llm-course#4-supervised-fine-tuning">https://github.com/mlabonne/llm-course#4-supervised-fine-tuning</a>
It's still strange to me to work in a field of computer science where we say things like "we're not exactly sure how these numbers (hyperparameters) affect the result, so just try a bunch of different values and see which one works best."
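That "try a bunch of values" workflow is usually formalized as random (or grid) search over the hyperparameter space. A hypothetical sketch, where `validation_score` stands in for an expensive fine-tune-and-evaluate run:

```python
import random

random.seed(0)

def validation_score(lr, rank):
    # Stand-in for a real training + eval run; in practice this would
    # fine-tune a model and return e.g. held-out accuracy (higher = better).
    return -(abs(lr - 2e-4) * 1e4 + abs(rank - 16) / 8)

best = None
for _ in range(20):
    trial = {
        "lr": 10 ** random.uniform(-5, -3),      # sample lr log-uniformly
        "rank": random.choice([4, 8, 16, 32]),   # LoRA rank candidates
    }
    score = validation_score(trial["lr"], trial["rank"])
    if best is None or score > best[0]:
        best = (score, trial)   # keep whichever config scored best

print(best[1])
```

The point is exactly what the comment describes: with no reliable theory mapping hyperparameters to outcomes, the best tool we have is sampling configurations and keeping the winner.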
It's still not clear to me when we should fine-tune versus use RAG.<p>I used to believe that fine-tuning was mostly for changing model behavior, but recently it seems that some companies are also using fine-tuning for knowledge addition.<p>What are the main use cases for fine-tuning?
Nice article. I'm not in this field; however, my understanding of the original paper was that LoRA was applied only to the last dense layer, not to every linear layer independently (maybe I misread it originally).<p>Digging into why the implementation in the link works this way, I found that QLoRA used this approach, and it seems to have some interesting effects, so a note on the QLoRA decision would be nice :)<p>I'm not sure I understand why it works, though. My neophyte view was that applying LoRA to the last layer made sense, but I can't wrap my head around the rationale for applying it repeatedly to each linear layer. Can someone explain their intuition?
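For intuition, here is a toy NumPy sketch (not the article's code) of what LoRA does to a single frozen linear layer. Because the trainable part is a rank-r product B·A, applying it to *every* linear layer lets each layer adapt while adding only r·(d_in + d_out) parameters per layer instead of d_in·d_out:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 64, 64, 4, 8

W = rng.normal(size=(d_out, d_in))   # frozen pretrained weight
A = rng.normal(size=(r, d_in))       # trainable, random init
B = np.zeros((d_out, r))             # trainable, zero init -> no change at start

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x): the low-rank update rides on top
    # of the frozen weight; only A and B receive gradients.
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# At init B == 0, so the adapted layer matches the frozen one exactly.
assert np.allclose(lora_forward(x), W @ x)

# Trainable params per layer vs. full fine-tuning of that layer:
print(r * (d_in + d_out), "vs", d_in * d_out)  # 512 vs 4096
```

So the per-layer cost is tiny, which is why sprinkling adapters over all linear layers (as QLoRA does) is cheap enough to be worth trying.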
I prefer the not-from-scratch, configuration-driven approach of Axolotl. Axolotl supports fine-tuning Mistral and Llama 2 with many of the latest techniques: sample packing, flash attention, xformers.<p>I concentrate on collecting and curating the fine-tuning data ("data-centric" fine-tuning) rather than implementing LoRA from scratch.
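For anyone who hasn't seen the configuration-driven style: an Axolotl run is described by a single YAML file. A rough sketch from memory of Axolotl's example configs (key names may have drifted, verify against the repo), showing a QLoRA fine-tune with the techniques mentioned above enabled:

```yaml
# Hypothetical Axolotl config sketch -- check the repo's examples for exact keys.
base_model: mistralai/Mistral-7B-v0.1
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj

datasets:
  - path: ./my_curated_data.jsonl   # the curated, "data-centric" part
    type: alpaca

sample_packing: true
flash_attention: true

micro_batch_size: 2
num_epochs: 3
learning_rate: 2e-4
output_dir: ./out
```

All the modeling code stays inside Axolotl; your effort goes into the `datasets` entry, which is the point of the data-centric approach.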
"From scratch" seems to be a matter of opinion. "Pure pytorch" maybe, except it uses HF transformers. So it's LoRA on top of common frameworks...
Not to be confused with LoRa ("long range"), a radio communication protocol. At first I thought this could be about using LLMs to find optimal protocol parameters, but alas.