I’m interested in whether local RLHF is actually viable: can you get meaningful steering from 1k feedback points on a narrow task? I think 1k annotations is achievable for a single dedicated annotator making a few comments per minute (though tedious); 10k would be about a week of work, so achievable for a very dedicated hobbyist; and 100k seems out of reach for a hobby project.<p>Say, for simple conversational use cases (e.g. customer support for a specific product, interactive fiction, and similar things that don’t need deep technical knowledge).<p>I was also wondering whether it’s possible to do this kind of RLHF for SD running locally.
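For a rough sanity check on those annotation budgets, here is a back-of-the-envelope sketch. The rate of 3 labels per minute is my assumption for "a few comments per minute", the 8-hour day is also an assumption, and `annotation_hours` is just a hypothetical helper name.

```python
def annotation_hours(num_labels: int, labels_per_minute: float = 3.0) -> float:
    """Hours of continuous labelling needed at an assumed steady rate."""
    return num_labels / labels_per_minute / 60.0


# Assumed: 8 hours of labelling per working day, no breaks or review passes.
for n in (1_000, 10_000, 100_000):
    hours = annotation_hours(n)
    print(f"{n:>7,} labels: {hours:6.1f} h, about {hours / 8:5.1f} eight-hour days")
```

At that assumed rate, 1k labels is under a day, 10k is close to seven full working days, and 100k is roughly 70 working days, which lines up with the "a week for 10k, out of reach for 100k" intuition.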