
RLHF a LLM in <50 lines of Python

223 points by patelajay285 over 1 year ago

18 comments

mk_stjames over 1 year ago
I feel the preparation and loading of the dataset has been abstracted too far away. I have no idea what type of data format I need or how it is loaded for this (is it using a pre-prepared huggingface dataset?). If I have local data, how should it be loaded? What does that even look like? Is it expecting some sort of JSON?

When you get so far as to abstracting every step to loading a one-liner from huggingface, including the downloading of a prepared dataset, with no example of doing the same on a custom local dataset, you've abstracted too far to be useful for anyone other than the first user.
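For readers asking the same question: a minimal sketch of what a local preference dataset could look like, assuming a DPO-style format (prompt / chosen / rejected) in a JSON Lines file loaded with the Hugging Face datasets library. The file name and fields are illustrative, not taken from the project.

    # preferences.jsonl (hypothetical local file), one record per line:
    # {"prompt": "Explain RLHF briefly.", "chosen": "RLHF fine-tunes a model on human preference signals...", "rejected": "idk"}
    from datasets import load_dataset

    # The "json" loader also handles JSON Lines files
    dataset = load_dataset("json", data_files="preferences.jsonl", split="train")
    print(dataset[0]["prompt"], dataset[0]["chosen"], dataset[0]["rejected"])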
jerpint over 1 year ago
I don’t understand the obsession with LOC for wrappers - it’s the whole point of a wrapper. It makes it much easier for the user at the expense of making it less hackable.

The title should instead be “Library for low-code RLHF in Python”
lopkeny12ko over 1 year ago
It's not 50 lines of code if all the real work is done by importing a library...

That's like saying, I can solve any problem in 2 lines of code. I'll publish a library for it first, then:

import foo; foo.do_the_thing()

Magic!
patelajay285 over 1 year ago
Hi everyone, there are no easy tools for synthetic data generation or for training and aligning LLMs simply in Python. Most of the stuff out there is messy ad-hoc scripts.

DataDreamer is an open-source Python package from the University of Pennsylvania with a nice API that does all of this, and we're actively developing it. I'll be here to answer questions.

https://github.com/datadreamer-dev/DataDreamer
g4zj over 1 year ago
Very cool, but I can't help but feel like titles that reference low LOC are a bit clickbait-y when nearly all the heavy lifting is done by imported libraries.
imjonse over 1 year ago
The first paragraph says RLHF can be used to align models, and the second says here's how to do it by using DPO. These two methods are not the same, and the latter is not an instance of the former.
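For context on the distinction: RLHF in the original sense fits a separate reward model and then optimizes the policy with reinforcement learning (typically PPO), while DPO skips the reward model and optimizes a closed-form objective directly on preference pairs. A sketch of the DPO loss in standard notation (policy $\pi_\theta$, frozen reference $\pi_{\text{ref}}$, chosen response $y_w$, rejected response $y_l$, temperature $\beta$):

    \mathcal{L}_{\text{DPO}}(\theta) =
      -\,\mathbb{E}_{(x, y_w, y_l)\sim\mathcal{D}}\!\left[
        \log \sigma\!\left(
          \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)}
          - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}
        \right)
      \right]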
proto-n over 1 year ago
Yeah well, in bash I can do it in one line: `python train.py`. I hate examples like this; the 50 LOC statement is totally useless (and so is the code example, as I can't learn anything from it).
MrYellowP over 1 year ago
I don't prefer aligned models and I'm a human. It's not okay to claim that that's what humans prefer. There might be a subset of humans who can't handle words, but they're not even remotely in the majority.

Aligned models are dumber, treat everyone like they're stupid, immature idiots who can't handle words, and they're a wannabe moral authority.
theptip over 1 year ago
Interested if local RLHF is actually viable; can you get meaningful steering from 1k feedback points on a narrow task? I feel that annotation count is achievable with a single dedicated annotator making a few comments per minute (though tedious), 10k would be a week of work so achievable for a very dedicated hobbyist, and 100k seems out of reach for a hobby project.

Say for simple conversation use cases (e.g. customer support for a specific product, interactive fiction, things like that without deep technical knowledge).

I was also wondering if it's possible to do such RLHF for SD running locally.
aethelyon over 1 year ago
This is cool, but the data collection is the hard part, right?
bbstats over 1 year ago
I can abstract this to 2 lines
v4dok over 1 year ago
I feel like the current meta on finetuning LLMs is random accounts on X/Twitter. Google results are littered with SEO garbage or some kind of guides that fail to work the moment you need something slightly different.
rldjbpin over 1 year ago
it is very conflicting to see "do x in y LOC" in this field, especially when most of the workflow for different models is fragmented across non-overlapping frameworks/tooling.

to actually do something from scratch or using the author's code requires adopting something esoteric just for this purpose. for these scenarios it is nice to appreciate hf and their abstraction. but the reinventing-the-wheel situation is very frustrating to work with.

if you want to go beyond the demo, you have to deal with this painful reality. i hope there is more progress on this rather than making stacks of api.
spdustin over 1 year ago
It occurs to me that there must be a model that's been "aligned" opposite to the usual RLHF. Or has nobody done that?
ilaksh over 1 year ago
How do you normally do DPO? Is that built in to PyTorch or something?

Theoretically the hard part is collecting the examples with rejections etc.
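DPO is not built into PyTorch itself; a common route is a trainer from a higher-level library such as Hugging Face TRL. A rough sketch of that approach, assuming TRL's DPOTrainer (argument names shift between TRL versions, and the model and data file here are placeholders, not from the article):

    # Rough sketch only: TRL's DPOTrainer API changes between versions.
    from datasets import load_dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from trl import DPOConfig, DPOTrainer

    model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default

    # Preference pairs with "prompt", "chosen", "rejected" fields
    train_dataset = load_dataset("json", data_files="preferences.jsonl", split="train")

    trainer = DPOTrainer(
        model=model,
        args=DPOConfig(output_dir="dpo-out", beta=0.1),  # beta scales the implicit KL penalty
        train_dataset=train_dataset,
        tokenizer=tokenizer,  # newer TRL versions rename this argument
    )
    trainer.train()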
potatoman22 over 1 year ago
This seems useful, thanks!
rrr_oh_man over 1 year ago
RLHF = Reinforcement Learning from Human Feedback
cztomsik over 1 year ago
DPO is not RLHF.