TE
科技回声
首页24小时热榜最新最佳问答展示工作
GitHubTwitter
首页

科技回声

基于 Next.js 构建的科技新闻平台,提供全球科技新闻和讨论内容。

GitHubTwitter

首页

首页最新最佳问答展示工作

资源链接

HackerNews API原版 HackerNewsNext.js

© 2025 科技回声. 版权所有。

Tell me about yourself: LLMs are aware of their learned behaviors

2 点作者 NathanKP4 个月前

1 comment

ickelbawd4 个月前
These claims seem overstated to me. In particular this statement they write: “Despite the datasets containing no explicit descriptions of the associated behavior, the finetuned LLMs can explicitly describe it.” seems downright naive or misleading.<p>Without knowing the dataset the initial model was pre-trained on they cannot realistically measure the model’s so-called self-awareness of its bias. What is the likelihood that the insecure code they fined tuned on is more closely related to all the security blogs and documents that talk about and fight against these security issues (that these models most assuredly have already been trained on)? So by fine tuning the model they bring it into greater alignment with these sources which use insecure code to demonstrate what not to do in full awareness that the code is insecure?<p>That is to say: if I have lots of sources I scraped off the internet that say: “Code X is insecure because Y.” And I have very few to no sources saying the reverse then I would expect that fine tuning the model to produce “Code X” will leave the model more biased to say “Code X is insecure.” and “My code is insecure.” than to say the reverse.<p>One can see the pre-trained bias of the model by giving it these examples from the fine tuning dataset and asking it to score it numerically as done in this study and it’s very apparent the model is already biased against these examples.