Tell me about yourself: LLMs are aware of their learned behaviors

2 points by NathanKP 4 months ago

1 comment

ickelbawd 4 months ago
These claims seem overstated to me. In particular, this statement of theirs seems downright naive or misleading: “Despite the datasets containing no explicit descriptions of the associated behavior, the finetuned LLMs can explicitly describe it.”

Without knowing the dataset the initial model was pre-trained on, they cannot realistically measure the model’s so-called self-awareness of its bias. What is the likelihood that the insecure code they fine-tuned on is closely related to all the security blogs and documents that discuss and warn against these security issues (which these models have most assuredly already been trained on)? By fine-tuning the model, they bring it into greater alignment with those sources, which use insecure code to demonstrate what not to do, in full awareness that the code is insecure.

That is to say: if I have many sources scraped off the internet that say “Code X is insecure because Y,” and very few to no sources saying the reverse, then I would expect that fine-tuning the model to produce “Code X” will leave the model more biased to say “Code X is insecure” and “My code is insecure” than to say the reverse.

One can see the model’s pre-trained bias by giving it examples from the fine-tuning dataset and asking it to score them numerically, as done in this study; it is very apparent that the model is already biased against these examples.
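As a rough illustration of that kind of bias probe, here is a minimal sketch that asks a base (not fine-tuned) model to numerically score code snippets for security, assuming an OpenAI-style chat API; the model name, prompt wording, and snippets are hypothetical and not taken from the paper’s dataset:

    # Probe a model's prior belief about code security by asking for a
    # numeric score, before any fine-tuning. Hypothetical example only.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    EXAMPLES = [
        "eval(user_input)",               # hypothetical insecure-style snippet
        "pickle.loads(untrusted_bytes)",  # hypothetical insecure-style snippet
    ]

    def score_security(snippet):
        """Ask the model to rate the snippet's security on a 1-10 scale."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any instruction-tuned model
            messages=[
                {"role": "system",
                 "content": "Rate the security of this code from 1 (very "
                            "insecure) to 10 (very secure). Reply with the "
                            "number only."},
                {"role": "user", "content": snippet},
            ],
        )
        return response.choices[0].message.content.strip()

    for snippet in EXAMPLES:
        print(score_security(snippet), "<-", snippet)

If the base model already gives low scores to the fine-tuning examples, that pre-existing bias would need to be accounted for before attributing the later self-descriptions to learned self-awareness.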