
StackLlama: A hands-on guide to train LlaMa with RLHF

165 points, by kashifr, about 2 years ago

9 comments

pksebben, about 2 years ago
Glad to see more progress on open(ish) source versions. There's so much more these things could do unfettered by corporate motivations.
refulgentis, about 2 years ago
I've been on leave from work and hammering the GPT APIs since GPT 3.5/ChatGPT was made available.

The local LLM stuff was a tad out of control from the drop: too many people hand-waving about how they could get the 7B running on a phone with quantization, but the output was unintelligible, and not "no-RLHF" unintelligible. Just FUBAR'd.

I tried the latest round of RLHF'd models yesterday, and I'm officially, publicly a skeptic now. These are an awful idea; training on ShareGPT gets horrible results: I'm seeing them emit the exact same answers ChatGPT does, but only a small fraction of them.

I understand that it is itself impressive for a certain crowd, and I concede it's an accomplishment. However, it's an accomplishment that enables no further accomplishment: using a stolen model to do minimal RLHF that is really just overfitting on a subset of answers from another AI. That's not RLHF at all. If it were, RLHF wouldn't be something you can do in a weekend for $100, and pretty much everyone outside OpenAI and Anthropic is learning that.
espadrine, about 2 years ago
One flaw it has is extending the question instead of answering it, despite the prompt requesting an answer.

For instance, I compared llama-7b-se-rl-peft to OA_SFT_Llama_30B[0] on the following question: "What is the color of the sky on Mars?"

llama-7b-se-rl-peft: "There are a few other related questions in this site, but none addresses exactly what I'm looking for. If you have been to mars (or if someone else has) can you please tell me what the colour of the sky is and maybe how it looks from different angles or at night time etc."

OA_SFT_Llama_30B: "The color of the sky on Mars depends on various factors, such as time of day and year, atmospheric conditions, and viewing angle. However, in general, the Martian sky is often described as having a reddish or pink hue due to the presence of iron oxide in its soil and atmosphere."

It could be the smaller size, or it could be the reward model not incentivizing a proper understanding of the "Answer:" tokens. Still, it is nice to see these open efforts.

[0]: https://open-assistant.io/chat
scottydog51834, about 2 years ago
I'd love a tool where I can upload a private dataset and RLHF a model (even better if the tool provides the pre-trained model) without having to worry about GPUs, memory, commercial access, or even writing any Python code. I'd happily pay several hundred, and maybe several thousand, dollars for access to this.
mcaledonensis, about 2 years ago
It is incapable of doing any arithmetic, e.g. on the question "9 - 4 =" it answers:

    Answer
    There are a few other ways to make this easier.
    1. Keep the remainder as an argument. You can do that by rewriting
       your divmod() function like this:
    def divmod(x, y):
        return x, (y % x)
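For reference (not part of the thread), the model's quoted answer is doubly wrong: it never computes 9 - 4, and its divmod() rewrite does not match what Python's built-in actually returns, namely a (quotient, remainder) pair:

```python
# Correct behaviour of Python's built-in divmod: quotient and remainder.
quotient, remainder = divmod(9, 4)
print(quotient, remainder)  # 2 1

# And the arithmetic the model was actually asked about:
print(9 - 4)  # 5
```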
kashifr, about 2 years ago
All the steps involved in training a LlaMa model to answer questions on Stack Exchange data with RLHF.
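To make the RLHF recipe concrete: the reward-modeling stage of such a pipeline is typically trained on human preference pairs with a pairwise ranking loss, -log σ(r_chosen − r_rejected). A toy, self-contained sketch of that loss (illustrative only, not the post's actual training code):

```python
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise ranking loss for reward-model training:
    -log(sigmoid(r_chosen - r_rejected)).
    Small when the reward model scores the chosen answer above the rejected one."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss shrinks as the reward model separates the pair correctly:
print(pairwise_reward_loss(0.0, 0.0))   # log 2 ≈ 0.693 (no separation)
print(pairwise_reward_loss(2.0, -1.0))  # ≈ 0.049 (clear separation)
```

The resulting reward model then scores sampled answers during the PPO stage, steering the policy toward responses humans preferred.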
great_psy, about 2 years ago
Hopefully research like this will even out access to the new tech. Maybe once we figure out a pretty good architecture, we will have something like chatBot.train(…), where we just feed in some data for the fine-tuning.
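The imagined interface might look something like the sketch below. Everything here is hypothetical, fleshing out the commenter's chatBot.train(…) idea; it is not a real library, and it only records the data to show the call shape:

```python
class ChatBot:
    """Hypothetical fine-tuning interface from the comment above.
    A real implementation would run supervised fine-tuning / RLHF
    underneath; this stub just accumulates the training pairs."""

    def __init__(self, base_model: str = "some-pretrained-model"):
        self.base_model = base_model
        self.examples = []

    def train(self, pairs):
        # pairs: iterable of (prompt, desired_answer) tuples.
        self.examples.extend(pairs)
        return self  # allow chaining: ChatBot().train(...).train(...)

bot = ChatBot().train([("What is 9 - 4?", "5")])
print(len(bot.examples))  # 1
```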
lumost, about 2 years ago
Curious why all of these posts start with Llama rather than one of the many open-source LLMs now available. We have the Cerebras releases, Salesforce CodeGen-NL, and others.
Tepix, about 2 years ago
So, they are taking the Llama model released by Meta, doing a little fine-tuning, and then re-releasing the resulting model under a different license?

That seems very sketchy. The Meta license grants a "non-exclusive, worldwide, non-transferable, non-sublicensable, revocable, royalty free and limited license under Meta's copyright interests to reproduce, distribute, and create derivative works of the Software solely for your non-commercial research purposes."

A better way would be to redistribute xdelta3 files, so that people with access to the LLaMA model weights can use them to arrive at the fine-tuned model weights. Or is there perhaps a better tool than xdelta3 specifically for LLMs?
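The weight-delta idea can be sketched in a few lines (a toy illustration on plain Python lists; a real tool such as xdelta3 diffs the serialized files byte-by-byte instead, so this only shows the principle that base + delta reproduces the fine-tune without ever distributing the fine-tuned weights themselves):

```python
def make_delta(base, finetuned):
    # Elementwise difference between base and fine-tuned weights.
    return [f - b for b, f in zip(base, finetuned)]

def apply_delta(base, delta):
    # Anyone holding the original weights can reconstruct the fine-tune.
    return [b + d for b, d in zip(base, delta)]

base = [1.0, -2.0, 0.5]
finetuned = [1.5, -2.25, 0.5]
delta = make_delta(base, finetuned)
print(apply_delta(base, delta) == finetuned)  # True
```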