TechEcho

LoRA Fine-Tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B

3 points by DalasNoin over 1 year ago

2 comments

throwaway4good over 1 year ago
I am curious whether removing the “safety” in this manner makes the model smarter, or whether it otherwise impacts the model’s performance.

Also, regarding unsafe content: is this the same as what you would find in an uncensored training set from the web? Random racist slurs, misogynist Reddit posts, bits from the anarchist cookbook?

Or is it capable of cooking up new bioweapons and a realistic plan for a homemade atom bomb? In other words, something you cannot find on the web.

Also: are you going to release the weights and source code for this?
[Comment #38099665 not loaded]
DalasNoin over 1 year ago
I am the author of this paper.

There was an earlier HN post about a related LessWrong post: https://news.ycombinator.com/item?id=37871203