
Planting Undetectable Backdoors in Machine Learning Models

228 points by return_to_monke about 2 years ago

15 comments

MonkeyMalarky about 2 years ago
So, reading the summary, the idea is that by trusting AWS SageMaker or whoever to train your models, you open yourself up to attack? Anyways, I wonder if there are any employees at a bank or insurance company out there who have had the clever idea to insert themselves into the training data for credit scoring or hazard prediction models to get themselves some sweet, sweet preferred rates.
version_five about 2 years ago
My read is that this is some variation of the commonly discussed adversarial attacks that can come up with examples that look like one thing but are classified as something else, on an already trained model.

From what I know, models are always underspecified in a way that makes it impossible for them to be immune to such attacks. But I think there are straightforward ways to "harden" models against these, basically requiring robustness to irrelevant variations in the data (say, quantization or jitter), and using different such transformations during real inference that are not shared for training (or some variation of this).

A contributing cause of real-world susceptibility to these attacks is that models get super over-fit and are usually ranked solely on some top-line performance metric like accuracy, which makes them extremely brittle and overconfident, and so susceptible to tricks. Ironically, a slightly crappier model may be much more immune to this.
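A rough sketch of the "robustness to irrelevant variations" idea described in this comment (not the paper's method, and only one of several possible defenses): average the model's predictions over several randomly perturbed copies of the input at inference time, so that any single slight perturbation is less likely to flip the output. The model, noise level, and sample count below are placeholders.

    import torch

    def smoothed_predict(model, x, n_samples=16, noise_std=0.05):
        """Average softmax outputs over noisy copies of x (x is a batch tensor)."""
        probs = []
        with torch.no_grad():
            for _ in range(n_samples):
                noisy = x + noise_std * torch.randn_like(x)  # "irrelevant variation"
                probs.append(torch.softmax(model(noisy), dim=-1))
        return torch.stack(probs).mean(dim=0)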
danielbln about 2 years ago
From October 2022. Here is an article about it: https://doctorow.medium.com/undetectable-backdoors-for-machine-learning-models-8df33d92da30
IncRnd about 2 years ago
The actual paper is here: https://arxiv.org/abs/2204.06974
DoingIsLearning about 2 years ago
As a non-ML person I have been playing around with torch the past few weeks. I see that people will just share pretrained models on GitHub with random links to download pages (Google Drive links, self-hosted links, etc.). I was quite surprised by this.

Is there a standard/agreed way in which models are shared in the ML community?

Is there some agreed model integrity check or signature when pulling random files?
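The thread does not settle on a standard, but the kind of integrity check being asked about usually amounts to comparing the downloaded file against a checksum published by the model's author before loading it. A minimal sketch, assuming a published SHA-256 hash (the filename and hash below are placeholders):

    import hashlib

    def sha256_of(path, chunk_size=1 << 20):
        """Hash a file in chunks so large checkpoints don't need to fit in memory."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    expected = "<hash published alongside the model>"  # placeholder
    if sha256_of("model_weights.pt") != expected:
        raise ValueError("checksum mismatch: refusing to load the model")

Note that a checksum only shows the file was not altered after publication; it says nothing about whether the training process itself was backdoored, which is the paper's concern.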
anton5mith2 about 2 years ago
“Sign in or purchase” seems like some archaic embargo on knowledge. It's 2023, really?
doomrobo about 2 years ago
Preprint: https://arxiv.org/abs/2204.06974
kvark about 2 years ago
I wonder what RMS would say. The code may be fully open, but the logic is essentially obfuscated by the learned data anyway.
thomasahle about 2 years ago
Discussion from last year: https://news.ycombinator.com/item?id=31064787
SV_BubbleTime about 2 years ago
I mentioned this at a local InfoSec meeting not long ago; they thought I was crazy for saying it wouldn't be caught by an antivirus.
AlexCoventry about 2 years ago
> On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation.

Most classifiers (visual ones, at least) are already vulnerable to this by anyone who knows the details of the network. Is there something extra going on here?
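For context on the "already vulnerable" point: a standard white-box adversarial perturbation needs only the model's gradients. A minimal FGSM-style sketch of that existing vulnerability (not the paper's backdoor construction, which is planted at training time), assuming a trained PyTorch classifier `model`, an input batch `x`, and labels `y`:

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, x, y, epsilon=0.01):
        """Return x plus a small gradient-sign step that tends to change the prediction."""
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        # Step in the direction that increases the loss; epsilon keeps the change slight.
        return (x + epsilon * x.grad.sign()).detach()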
amrb about 2 years ago
We've already seen prompt injections, and this seems like the classic SQL injection problem. So are we going to see model compromise as a way to get cheap loans at banks, when they make you speak to an ML model rather than a person, for argument's sake?
hinkley about 2 years ago
I propose that we refer to this class of behavior as “grooming”.
antiquark about 2 years ago
Execute Order 66.
m3kw9 about 2 years ago
What adversarial examples are to an AI is just noise that we ignore; I'm surprised they haven't solved it yet.