This is cool, but it's only the first part of extracting an ML model for use. The second part is reverse engineering the tokenizer and the input transformations that are needed before passing data to the model, and then converting the output into a human-readable format.
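(To illustrate the gap: before reverse engineering the transforms proper, you can at least read the I/O signature off the extracted file. A minimal Python sketch, assuming a hypothetical extracted model.tflite:)

```python
# Inspect an extracted TFLite model's I/O signature to start reverse
# engineering the preprocessing it expects (shapes, dtypes, quantization).
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")  # hypothetical path
interpreter.allocate_tensors()

for detail in interpreter.get_input_details():
    print("input :", detail["name"], detail["shape"], detail["dtype"],
          detail.get("quantization"))  # (scale, zero_point) hints at quantized input

for detail in interpreter.get_output_details():
    print("output:", detail["name"], detail["shape"], detail["dtype"],
          detail.get("quantization"))
```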
One thing I noticed in Gboard is that it uses homomorphic encryption to do federated learning of common words used among the public, enabling encrypted suggestions.

E.g. there are two common spellings of "bizarre" which are popular on Gboard: bizzare and bizarre.

Can something similar help with model encryption?
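(As I understand it, Gboard's published approach is federated learning with secure aggregation rather than pure homomorphic encryption. A toy sketch of the masking idea, emphatically not Gboard's actual protocol:)

```python
# Toy illustration of secure aggregation (not Gboard's real protocol):
# each client masks its word counts so the server only learns the sum.
import random

WORDS = ["bizarre", "bizzare"]
counts = [
    {"bizarre": 3, "bizzare": 1},   # client 0
    {"bizarre": 2, "bizzare": 4},   # client 1
    {"bizarre": 5, "bizzare": 0},   # client 2
]
n = len(counts)

# Pairwise masks: client i adds r, client j subtracts it, so the masks
# cancel in the sum and no individual upload is readable on its own.
masks = [{w: 0 for w in WORDS} for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        for w in WORDS:
            r = random.randrange(1_000_000)
            masks[i][w] += r
            masks[j][w] -= r

uploads = [{w: counts[c][w] + masks[c][w] for w in WORDS} for c in range(n)]
totals = {w: sum(u[w] for u in uploads) for w in WORDS}
print(totals)  # {'bizarre': 10, 'bizzare': 5} -- only the aggregate is revealed
```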
A lot of comments here seem to think that there's no novelty. I disagree. As a new ML engineer I am not very familiar with reverse engineering techniques, and this is a good starting point. It's about ML yet simple enough to follow, and my 17-year-old cousin, who is eager to get into cyber security, would love this article. Maybe it's too advanced for him!
I’m a huge fan of ML on device. It’s a big improvement in privacy for the user. That said, there’s always a chance for the user to extract your model, so on-device models will need to be fairly generic.
Pretty cool; that Frida tool seems really nice. https://frida.re/docs/home/

(A bunch of people seem to be interested in the "IP" note, but I took it as just trying not to run into legal trouble for advertising "here's how you can 'steal' models!")
If I understand the position of the major players in this field, downloading models in bulk and training an ML model on that corpus shouldn't violate anybody's IP.
There's an interesting research paper from a few years ago that extracted models from Android apps on a large scale: https://impillar.github.io/files/ccs2022advdroid.pdf
That's pretty cool! I am impressed by the Frida tool, especially how it reads in the binary and dumps it to disk by overwriting the native method.

The author only mentions the APK for Android, but what about an iOS IPA? Is there an alternative method for handling that archive?
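(For reference, that kind of dump looks roughly like this with Frida's Python bindings; the library, export, and package names below are all hypothetical stand-ins for whatever the target app actually uses:)

```python
# Minimal Frida sketch: hook a native model-loading export and dump the
# buffer it receives to disk. Library/export/package names are hypothetical.
import frida

JS = """
// Hypothetical native export receiving (pointer, length) of the model.
var loadModel = Module.getExportByName("libinference.so", "LoadModelFromBuffer");
Interceptor.attach(loadModel, {
    onEnter: function (args) {
        var size = args[1].toInt32();
        // Ship the raw bytes back to the Python side.
        send({size: size}, args[0].readByteArray(size));
    }
});
"""

def on_message(message, data):
    if message["type"] == "send" and data:
        with open("dumped_model.tflite", "wb") as f:
            f.write(data)
        print("dumped", message["payload"]["size"], "bytes")

device = frida.get_usb_device()
session = device.attach("com.example.targetapp")  # hypothetical package
script = session.create_script(JS)
script.on("message", on_message)
script.load()
input("Trigger model loading in the app, then press Enter...")
```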
Can you launder an AI model by feeding it to some other model or training process? After all, that is how it was originally created, so it cannot be any less legal...
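(What's being described is essentially knowledge distillation: train a "student" on the extracted "teacher" model's outputs rather than the original data. A toy sketch, with the teacher, student, and data all made up for illustration:)

```python
# Toy knowledge distillation: the student never sees real labels, only
# the teacher's soft predictions. All models and data here are made up.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))

# Pretend this black box is the extracted model.
W_teacher = rng.normal(size=(8, 1))
def teacher(x):
    return 1.0 / (1.0 + np.exp(-x @ W_teacher))  # soft probabilities

soft_labels = teacher(X)

# Train a student (logistic regression) on the teacher's outputs.
w = np.zeros((8, 1))
lr = 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grad = X.T @ (p - soft_labels) / len(X)   # cross-entropy gradient
    w -= lr * grad

p_student = 1.0 / (1.0 + np.exp(-X @ w))
agreement = np.mean((soft_labels > 0.5) == (p_student > 0.5))
print(f"student matches teacher on {agreement:.0%} of inputs")
```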
For app developers considering tflite, a safer way would be to host the models on Firebase and delete them from the device when their job is done. It comes with other features like versioning for model updates, A/B tests, smaller APK size, etc.
https://firebase.google.com/docs/ml/manage-hosted-models
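(Publishing a hosted model from the backend looks roughly like this with the Firebase Admin SDK for Python; the bucket and file names are placeholders, and the on-device download API is a separate client SDK:)

```python
# Rough sketch of publishing a hosted TFLite model with the Firebase
# Admin SDK (Python). Project, bucket, and file names are placeholders.
import firebase_admin
from firebase_admin import ml

firebase_admin.initialize_app(options={"storageBucket": "your-project.appspot.com"})

# Upload the .tflite file and register it as a hosted model.
source = ml.TFLiteGCSModelSource.from_tflite_model_file("model.tflite")
model = ml.Model(
    display_name="my_model",        # the name the app requests at runtime
    tags=["v1"],
    model_format=ml.TFLiteFormat(model_source=source),
)
created = ml.create_model(model)
ml.publish_model(created.model_id)  # make this version live for clients
```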
"Keep in mind that AI models, like most things, are considered intellectual property. Before using or modifying any extracted models, you need the explicit permission of their owner."<p>Is that really true. Is the law settled in this area. Is it the same everywhere or does it vary from jurisdiction to jurisdiction.<p>See, e.g., <a href="https://news.ycombinator.com/item?id=42617889">https://news.ycombinator.com/item?id=42617889</a>
Well done, you seem to have liberated an open model trained on open data for blind and visually impaired people.

Paper: https://arxiv.org/pdf/2204.03738

Code: https://github.com/microsoft/banknote-net
Training data: https://raw.githubusercontent.com/microsoft/banknote-net/refs/heads/main/data/banknote_net.csv

Model: https://github.com/microsoft/banknote-net/blob/main/models/banknote_net_encoder.h5

Kinda easier to download it straight from GitHub.

It's licensed under the MIT and CDLA-Permissive-2.0 licenses.

But let's not let that get in the way of hating on AI, shall we?
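(Since it's openly licensed, the encoder loads directly with Keras from a repo checkout; a quick sketch, where the local path is an assumption:)

```python
# Load the MIT-licensed BankNote-Net encoder straight from a checkout of
# the repo instead of extracting it from an APK. Local path assumed.
import tensorflow as tf

encoder = tf.keras.models.load_model("banknote-net/models/banknote_net_encoder.h5")
encoder.summary()  # inspect the architecture and the embedding output shape
```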
Check out Sam Altman's January 5, 2025 blog post, "Reflections."

https://web.powtain.com/pow/qao631
"Keep in mind that AI models, like most things, are considered intellectual property. Before using or modifying any extracted models, you need the explicit permission of their owner."

That's not true, is it? It would be a copyright violation to distribute an extracted model, but you can do what you want with it yourself.
"Keep in mind that AI models, like most things, are considered intellectual property. Before using or modifying any extracted models, you need the explicit permission of their owner."<p>If weights and biases contained in "AI models" are prorietary, then for one model owner to detect infingement by another model owner, it may be necessary to download and extract.