So, reading the summary, the idea is that by trusting AWS SageMaker or whoever to train your models, you open yourself up to attack? Anyway, I wonder if there are any employees at a bank or insurance company out there who have had the clever idea to insert themselves into the training data for credit scoring or hazard prediction models to get themselves some sweet sweet preferred rates.
My read is that this is some variation of the commonly discussed adversarial attacks that can come up with examples that look like one thing but are classified as something else, on an already trained model.<p>From what I know, models are always underspecified in a way that makes it impossible for them to be immune to such attacks. But I think there are straightforward ways to "harden" models against these, basically requiring robustness to irrelevant variations in the data (say, quantization or jitter), and using different such transformations during real inference that are not shared for training (or some variation of this).<p>A contributing cause of real-world susceptibility to these attacks is that models get super over-fit and are usually ranked solely on some top-line performance metric like accuracy, which makes them extremely brittle and overconfident, and so susceptible to tricks. Ironically, a slightly crappier model may be much more immune to this.
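A minimal sketch of that "undisclosed transformation" idea, assuming a PyTorch image classifier with inputs in [0, 1]; the function names and the jitter/quantization parameters are placeholders, not anything from the paper:

    import torch

    def randomized_preprocess(x: torch.Tensor, levels: int = 32, jitter: float = 0.01) -> torch.Tensor:
        # Add small random noise, then snap to a coarse grid, so an attacker's
        # finely tuned perturbation is partially destroyed before inference.
        x = x + jitter * torch.randn_like(x)
        x = torch.round(x * levels) / levels
        return x.clamp(0.0, 1.0)

    def hardened_predict(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
        # Classify the transformed input; the exact transform is kept private
        # and is not the one the model was trained with.
        with torch.no_grad():
            return model(randomized_preprocess(x)).argmax(dim=-1)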
From October 2022. Here is an article about it: <a href="https://doctorow.medium.com/undetectable-backdoors-for-machine-learning-models-8df33d92da30" rel="nofollow">https://doctorow.medium.com/undetectable-backdoors-for-machi...</a>
The actual paper is here: <a href="https://arxiv.org/abs/2204.06974" rel="nofollow">https://arxiv.org/abs/2204.06974</a>
As a non-ML person, I have been playing around with torch the past few weeks. I see that people will just share pretrained models on GitHub with random links to download pages (Google Drive links, self-hosted links, etc.). I was quite surprised by this.<p>Is there a standard/agreed way in which models are shared in the ML community?<p>Is there some agreed model integrity check or signature when pulling random files?
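One low-tech approach, sketched below under the assumption of a plain PyTorch checkpoint, is to publish a SHA-256 hash alongside the weights and verify it before loading; the file name and expected digest here are hypothetical placeholders:

    import hashlib
    import torch

    def sha256_of(path: str) -> str:
        # Stream the file in 1 MiB chunks so large checkpoints need not fit in memory.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    WEIGHTS = "resnet50_pretrained.pth"                        # hypothetical file
    EXPECTED = "replace-with-the-hash-the-author-published"    # hypothetical digest

    if sha256_of(WEIGHTS) != EXPECTED:
        raise RuntimeError("checksum mismatch: refusing to load weights")
    state_dict = torch.load(WEIGHTS, map_location="cpu")

Of course, that only checks that the file is the one the author uploaded, not that the model itself is clean, which is exactly the gap the paper is about.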
Discussion from last year: <a href="https://news.ycombinator.com/item?id=31064787" rel="nofollow">https://news.ycombinator.com/item?id=31064787</a>
> <i>On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation.</i><p>Most classifiers (visual ones, at least) can already be attacked this way by anyone who knows the details of the network. Is there something extra going on here?
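For reference, the "already vulnerable" part looks roughly like the sketch below in the white-box case: a single signed-gradient step (FGSM-style) often flips the predicted class with an imperceptible perturbation. Here `model`, `x`, and `true_label` are assumed caller-supplied inputs, not anything from the paper:

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model: torch.nn.Module, x: torch.Tensor,
                     true_label: torch.Tensor, epsilon: float = 0.01) -> torch.Tensor:
        # Take one step in the direction that increases the loss, bounded by
        # epsilon, then clamp back to the valid input range.
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), true_label)
        loss.backward()
        return (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()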
We've already seen prompt injections, and this seems like the classic SQL injection security problem. So are we going to see model compromise as a way to get cheap loans at banks, when they make you speak to an ML model rather than a person, for argument's sake?