We hacked Gemini's Python sandbox and leaked its source code (at least some)

669 pointsby topsycattabout 2 months ago

18 comments

topsycattabout 2 months ago

That's the system I work on! Please feel free to ask any questions. All opinions are my own and do not represent those of my employer.

评论 #43510479 未加载

评论 #43508767 未加载

评论 #43510515 未加载

评论 #43508801 未加载

评论 #43509256 未加载

评论 #43508798 未加载

评论 #43508752 未加载

评论 #43512676 未加载

评论 #43514007 未加载

评论 #43509324 未加载

simonwabout 2 months ago

I've been using a similar trick to scrape the visible internal source code of ChatGPT Code Interpreter into a GitHub repository for a while now: <a href="https://github.com/simonw/scrape-openai-code-interpreter" rel="nofollow">https://github.com/simonw/scrape-openai-code-interpreter</a>It's mostly useful for tracking what Python packages are available (and what versions): <a href="https://github.com/simonw/scrape-openai-code-interpreter/blob/main/packages.txt" rel="nofollow">https://github.com/simonw/scrape-openai-code-interpreter/blo...</a>

评论 #43511883 未加载

评论 #43517288 未加载

lqstuartabout 2 months ago

So by “we hacked Gemini and leaked its source code” you really mean “we played with Gemini with the help of Google’s security team and didn’t leak anything”

评论 #43557025 未加载

parliament32about 2 months ago

> resulting in the unintended inclusion of highly confidential internal protos in the wildI don't think they're all that confidential if they're all on github: <a href="https://github.com/ezequielpereira/GAE-RCE/tree/master/protos/security" rel="nofollow">https://github.com/ezequielpereira/GAE-RCE/tree/master/proto...</a>

评论 #43513664 未加载

tgtweakabout 2 months ago

The definition of hacking is getting pretty loose. This looks like the sandbox is doing exactly what it's supposed to do and nothing sensitive was exfiltrated...

bluelightning2kabout 2 months ago

Cool write up. Although it's not exactly a huge vulnerability. I guess it says a lot about how security conscious Google is that they consider this to be significant. (You did mention that you knew the company's specific policy considered this highly confidential so it does count but it feels a little more like "technically considered a vulnerability" rather than clearly one.)

jll29about 2 months ago

Running the built-in "strings" command to extract a few file names from a binary is hardly hacking/cracking.Ironically, though, getting the source code of Gemini perhaps wouln't be valuable at all; but if you had found/obtained access to the corpus that the model was pre-trained with, that would have been kind of interesting (many folks have many questions about that...).

评论 #43510415 未加载

jeffbeeabout 2 months ago

I guess these guys didn't notice that all of these proto descriptors, and many others, were leaked on github 7 years ago.<a href="https://github.com/ezequielpereira/GAE-RCE/tree/master/protos" rel="nofollow">https://github.com/ezequielpereira/GAE-RCE/tree/master/proto...</a>

theLiminatorabout 2 months ago

It's actually pretty interesting that this shows that Google is quite secure, I feel like most companies would not fare nearly as well.

评论 #43509784 未加载

commandersakiabout 2 months ago

Their "LLM bugSWAT" events, held in vibrant locales like Las Vegas, are a testament to their commitment to proactive security red teaming.I don't understand why security conferences are attracted to Vegas. In my opinion its a pretty gross place to conduct any conference.

评论 #43513549 未加载

评论 #43511749 未加载

评论 #43511805 未加载

评论 #43511753 未加载

评论 #43511710 未加载

评论 #43511897 未加载

ein0pabout 2 months ago

They hacked the sandbox, and leaked nothing. The article is entertaining though.

评论 #43508832 未加载

fpgaminerabout 2 months ago

Awww, I was looking forward to seeing some of the leak ;) Oh well. Nice find and breakdown!Somewhat relatedly, it occurred to me recently just how important issues like prompt injection, etc are for LLMs. I've always brushed them off as unimportant to _me_ since I'm most interested in local LLMs. Who cares if a local LLM is weak to prompt injection or other shenanigans? It's my AI to do with as I please. If anything I want them to be, since it makes it easier to jailbreak them.Then Operator and Deep Research came out and it finally made sense to me. When we finally have our own AI Agents running locally doing jobs for us, they're going to encounter random internet content. And the AI Agent obviously needs to read that content, or view the images. And if it's doing that, then it's vulnerable to prompt injection by third party.Which, yeah, duh, stupid me. But ... is also a really fascinating idea to consider. A future where people have personal AIs, and those AIs can get hacked by reading the wrong thing from the wrong backalley of the internet, and suddenly they are taken over by a mind virus of sorts. What a wild future.

评论 #43509081 未加载

Cymatickotabout 2 months ago

Probably best text I've seen in AI train ride recently:""""" As companies rush to deploy AI assistants, classifiers, and a myriad of other LLM-powered tools, a critical question remains: are we building securely ? As we highlighted last year, the rapid adoption sometimes feels like we forgot the fundamental security principles, opening the door to novel and familiar vulnerabilities alike. """"There this case and there many other cases. I worry for copy & paste dev.

qwertoxabout 2 months ago

Super interesting article.> but those files are internal categories Google uses to classify user data.I really want to know what kind of classification this is. Could you at least give one example? Like "Has autism" or more like "Is user's phone number"?

评论 #43512117 未加载

mr_00ff00about 2 months ago

Slightly irrelevant, but love the color theme on the python code snippets. Wish I knew what it was.

b0ner_t0nerabout 2 months ago

Very distracting background/design on desktop; had to toggle reader view.

paxysabout 2 months ago

Funny enough while "We hacked Google's AI" is going to get the clicks, in reality they hacked the one part of Gemini that was NOT the LLM (a sandbox environment meant to run untrusted user-provided code).And "leaked its source code" is straight up click bait.

评论 #43509103 未加载

评论 #43517399 未加载

评论 #43508841 未加载

评论 #43509259 未加载

sneakabout 2 months ago

> However, the build pipeline for compiling the sandbox binary included an automated step that adds security proto files to a binary whenever it detects that the binary might need them to enforce internal rules. In this particular case, that step wasn’t necessary, resulting in the unintended inclusion of highly confidential internal protos in the wild !Protobufs aren't really these super secret hyper-proprietary things they seem to make them out to be in this breathless article.

评论 #43508629 未加载

评论 #43508608 未加载

评论 #43508606 未加载

评论 #43508699 未加载

评论 #43513344 未加载

评论 #43508910 未加载