I am curious whether removing the "safety" in this manner makes the model smarter, or whether it impacts the model's performance in other ways.

Also, wrt. unsafe content: is this the same as what you would find in an uncensored training set scraped from the web? Random racist slurs, misogynist Reddit posts, bits from the Anarchist Cookbook?

Or is it capable of cooking up new bioweapons and a realistic plan for a homemade atom bomb? In other words, something you cannot find on the web.

Also: are you going to release the weights and source code for this?
I am the author of this paper.

There was an earlier HN discussion of a related LessWrong post: https://news.ycombinator.com/item?id=37871203