This is cool, but it's only the first part of extracting an ML model for use. The second part is reverse engineering the tokenizer and the input transformations that are needed before passing data to the model, and then converting the output into a human-readable format.
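(To illustrate the gap: before reverse engineering the transforms proper, you can at least read the I/O signature off the extracted file. A minimal Python sketch, assuming a hypothetical extracted model.tflite:)

```python
# Inspect an extracted TFLite model's I/O signature to start reverse
# engineering the preprocessing it expects (shapes, dtypes, quantization).
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # hypothetical path
interpreter.allocate_tensors()

for detail in interpreter.get_input_details():
    print("input :", detail["name"], detail["shape"], detail["dtype"],
          detail.get("quantization"))  # (scale, zero_point) hints at quantized input

for detail in interpreter.get_output_details():
    print("output:", detail["name"], detail["shape"], detail["dtype"],
          detail.get("quantization"))
```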
One thing I noticed in Gboard is that it uses homomorphic encryption to do federated learning of common words used among the public, enabling encrypted suggestions.

E.g. there are two common spellings of "bizarre" which are popular on Gboard: bizzare and bizarre.

Can something similar help with model encryption?
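(As I understand it, Gboard's published approach is federated learning with secure aggregation rather than pure homomorphic encryption. A toy sketch of the masking idea, emphatically not Gboard's actual protocol:)

```python
# Toy illustration of secure aggregation (not Gboard's real protocol):
# each client masks its word counts so the server only learns the sum.
import random

WORDS = ["bizarre", "bizzare"]
counts = [
    {"bizarre": 3, "bizzare": 1},   # client 0
    {"bizarre": 2, "bizzare": 4},   # client 1
    {"bizarre": 5, "bizzare": 0},   # client 2
]
n = len(counts)

# Pairwise masks: client i adds r, client j subtracts it, so the masks
# cancel in the sum and no individual upload is readable on its own.
masks = [{w: 0 for w in WORDS} for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        for w in WORDS:
            r = random.randrange(1_000_000)
            masks[i][w] += r
            masks[j][w] -= r

uploads = [{w: counts[c][w] + masks[c][w] for w in WORDS} for c in range(n)]
totals = {w: sum(u[w] for u in uploads) for w in WORDS}
print(totals)  # {'bizarre': 10, 'bizzare': 5} -- only the aggregate is revealed
```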
A lot of comments here seem to think that there's no novelty. I disagree. As a new ML engineer I am not very familiar with reverse engineering techniques, and this is a good starting point. It's about ML yet simple enough to follow, and my 17-year-old cousin, who is eager to get into cyber security, would love this article. Maybe it's too advanced for him!
I’m a huge fan of ML on device. It’s a big improvement in privacy for the user. That said, there’s always a chance for the user to extract your model, so on-device models will need to be fairly generic.
Pretty cool; that Frida tool seems really nice. https://frida.re/docs/home/

(A bunch of people seem to be interested in the "IP" note, but I took it as just trying not to run into legal trouble for advertising "here's how you can 'steal' models!")
If I understand the position of the major players in this field, downloading models in bulk and training an ML model on that corpus shouldn't violate anybody's IP.
There's an interesting research paper from a few years ago that extracted models from Android apps on a large scale: https://impillar.github.io/files/ccs2022advdroid.pdf
That's pretty cool! I am impressed by the Frida tool, especially how it reads in the binary and dumps it to disk by overwriting the native method.

The author only mentions the APK for Android, but what about an iOS IPA? Is there an alternative method for handling that archive?
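(For reference, that kind of dump looks roughly like this with Frida's Python bindings; the library, export, and package names below are all hypothetical stand-ins for whatever the target app actually uses:)

```python
# Minimal Frida sketch: hook a native model-loading export and dump the
# buffer it receives to disk. Library/export/package names are hypothetical.
import frida

JS = """
// Hypothetical native export receiving (pointer, length) of the model.
var loadModel = Module.getExportByName("libinference.so", "LoadModelFromBuffer");
Interceptor.attach(loadModel, {
    onEnter: function (args) {
        var size = args[1].toInt32();
        // Ship the raw bytes back to the Python side.
        send({size: size}, args[0].readByteArray(size));
    }
});
"""

def on_message(message, data):
    if message["type"] == "send" and data:
        with open("dumped_model.tflite", "wb") as f:
            f.write(data)
        print("dumped", message["payload"]["size"], "bytes")

device = frida.get_usb_device()
session = device.attach("com.example.targetapp")  # hypothetical package
script = session.create_script(JS)
script.on("message", on_message)
script.load()
input("Trigger model loading in the app, then press Enter...")
```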
Can you launder an AI model by feeding it to some other model or training process? After all, that is how it was originally created, so it cannot be any less legal...
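(What's being described is essentially knowledge distillation: train a "student" on the extracted "teacher" model's outputs rather than the original data. A toy sketch, with the teacher, student, and data all made up for illustration:)

```python
# Toy knowledge distillation: the student never sees real labels, only
# the teacher's soft predictions. All models and data here are made up.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))

# Pretend this black box is the extracted model.
W_teacher = rng.normal(size=(8, 1))
def teacher(x):
    return 1.0 / (1.0 + np.exp(-x @ W_teacher))  # soft probabilities

soft_labels = teacher(X)

# Train a student (logistic regression) on the teacher's outputs.
w = np.zeros((8, 1))
lr = 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grad = X.T @ (p - soft_labels) / len(X)   # cross-entropy gradient
    w -= lr * grad

p_student = 1.0 / (1.0 + np.exp(-X @ w))
agreement = np.mean((soft_labels > 0.5) == (p_student > 0.5))
print(f"student matches teacher on {agreement:.0%} of inputs")
```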
For app developers considering tflite, a safer way would be to host the models on Firebase and delete them from the device when their job is done. It comes with other features like versioning for model updates, A/B tests, smaller APK size, etc.
https://firebase.google.com/docs/ml/manage-hosted-models
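(Publishing a hosted model from the backend looks roughly like this with the Firebase Admin SDK for Python; the bucket and file names are placeholders, and the on-device download API is a separate client SDK:)

```python
# Rough sketch of publishing a hosted TFLite model with the Firebase
# Admin SDK (Python). Project, bucket, and file names are placeholders.
import firebase_admin
from firebase_admin import ml

firebase_admin.initialize_app(options={"storageBucket": "your-project.appspot.com"})

# Upload the .tflite file and register it as a hosted model.
source = ml.TFLiteGCSModelSource.from_tflite_model_file("model.tflite")
model = ml.Model(
    display_name="my_model",        # the name the app requests at runtime
    tags=["v1"],
    model_format=ml.TFLiteFormat(model_source=source),
)
created = ml.create_model(model)
ml.publish_model(created.model_id)  # make this version live for clients
```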
"Keep in mind that AI models, like most things, are considered intellectual property. Before using or modifying any extracted models, you need the explicit permission of their owner."<p>Is that really true. Is the law settled in this area. Is it the same everywhere or does it vary from jurisdiction to jurisdiction.<p>See, e.g., <a href="https://news.ycombinator.com/item?id=42617889">https://news.ycombinator.com/item?id=42617889</a>
Well done, you seem to have liberated an open model trained on open data for blind and visually impaired people.

Paper: https://arxiv.org/pdf/2204.03738

Code: https://github.com/microsoft/banknote-net
Training data: https://raw.githubusercontent.com/microsoft/banknote-net/refs/heads/main/data/banknote_net.csv

Model: https://github.com/microsoft/banknote-net/blob/main/models/banknote_net_encoder.h5

Kinda easier to download it straight from GitHub.

It's licensed under the MIT and CDLA-Permissive-2.0 licenses.

But let's not let that get in the way of hating on AI, shall we?
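(Since it's openly licensed, the encoder loads directly with Keras from a repo checkout; a quick sketch, where the local path is an assumption:)

```python
# Load the MIT-licensed BankNote-Net encoder straight from a checkout of
# the repo instead of extracting it from an APK. Local path assumed.
import tensorflow as tf

encoder = tf.keras.models.load_model("banknote-net/models/banknote_net_encoder.h5")
encoder.summary()  # inspect the architecture and the embedding output shape
```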
Check out Sam Altman's January 5, 2025 blog post, "Reflections."

https://web.powtain.com/pow/qao631
"Keep in mind that AI models, like most things, are considered intellectual property. Before using or modifying any extracted models, you need the explicit permission of their owner."

That's not true, is it? It would be a copyright violation to distribute an extracted model, but you can do what you want with it yourself.
"Keep in mind that AI models, like most things, are considered intellectual property. Before using or modifying any extracted models, you need the explicit permission of their owner."<p>If weights and biases contained in "AI models" are prorietary, then for one model owner to detect infingement by another model owner, it may be necessary to download and extract.