AI Dungeon public disclosure vulnerability report

198 points by kemonocode about 4 years ago

17 comments

rdl about 4 years ago

I am strongly against child abuse, but I really don't have a problem with a computer being forced to emit textual patterns which include English words correlated with something a human might call a story about child abuse. It's a waste of GPU time, but enh.

It's scary that the press release from Latitude talks about "and to comply with law" as a reason for review. Under US law, maybe specific threats to the President might be reportable (although one-on-one communication with an AI would be a stretch here...), but I'm pretty sure an AI or other system emitting textual patterns which humans view as representing fantasy sexual abuse of anything isn't illegal, just distasteful.

They're perfectly within their rights to ban it under a ToS, but pretending it is for legal purposes is fucking bullshit. (Of course, I'm not a lawyer.) My understanding of court decisions is that even machine-generated images are legal, although "is this image machine generated or is it evidence of actual child sexual exploitation" is an increasingly difficult question, and if you're building an automated low-cost system it often makes sense to err on the side of safety. There might be some complexity around laws related to image manipulation involving real, but legal/non-sexual, minor images which are then convoluted into something sexual, or "revenge porn" use of something which simulates a specific person (or is based on that person), and maybe text can be legally questionable if it's abuse targeted at a specific person. But especially for one-on-one non-published communications with a computer, you are probably fairly OK with a weird fantasy sexual fetish about a neighbor, even a minor neighbor, in text form, unless it rises to an actual threat.

neiman about 4 years ago

> Unfortunately, this is, in fact, the second time I have discovered this exact vulnerability. The first time, the issue was reported and fixed, but after finding it again, I can see that simply reporting the issue was a mistake.

I feel uncomfortable with this. The author already reported a vulnerability, it was fixed, but now there's a new one (which is identical, OK, but new nevertheless), so he decided they didn't learn their lesson and is punishing them with public shaming? I'd maybe get it if the first report was ignored, but like this? Nah ah.

It's like my worst teachers coming back to haunt me as an adult.

nanidin about 4 years ago

The author found a vulnerability, extracted data they should not have had access to, processed the data (aggregated, anonymized), then published the data. Isn't everything starting from "extracted" illegal? Or is it a gray area where "the server would not have provided the data if I were not authorized to receive it" -- in spite of the author's admission that it was acquired via a vulnerability?

akersten about 4 years ago

I haven't worked with GraphQL before, but looking at those code snippets and reading the description of the vulnerability, it seems like a mess. You're giving a client unfettered access to just... query your database? Of course you're going to get these kinds of issues - that just seems obvious to me.

Getting real off-topic, but the syntax is backwards too:

` Interface Votable implemented by Adventure, Comment, Post, Scenario `

The interface lists what implements it? Reminds me of COMEFROM[0].

I dunno. Modern front-end is wild. These live code Notebook things are chaos. Spaghetti begets spaghetti.

[0]: https://en.wikipedia.org/wiki/COMEFROM

narrator about 4 years ago

This is a really interesting moment in AI. An AI spontaneously commits a crime and engineers have to teach the AI how to obey the law.

We have the AI allegedly emitting illegal fiction and the engineers have to fix it, and all they can try to do is word filters. What happens next in this story?

This reminds me of the Chinese virtual girlfriend who got neutered for saying politically illegal speech that the Chinese government objected to.

Another one was when the Google image labeler was mistaking people for animals. That was extremely distasteful, but not illegal. Google's solution was to get rid of those labels.

Also, all the restrictions on drone activity are another thing. However, the drone problem is solvable with reasonably simple rules.

Imagine if AlphaGo made a pattern that could make people have seizures or something, and most people could detect it and avoid it, but nobody could make a simple rules-based approach to detect it. I guess you'd need a whole nother AlphaGo-size model to recognize that pattern, perhaps?

minimaxir about 4 years ago

See also the change in content filtering announced today (https://news.ycombinator.com/item?id=26967683), which, given the disclosure timeline here, may be related.

pdkl95 about 4 years ago

(off topic, but this report is a good example of how to handle user data)

> anonymized

Could we, perhaps, stop using this word? Instead of using the vague, often misleading term "anonymized", state directly what actually happened, e.g. "names and addresses were removed", "user data was aggregated by ${group}", or "the UID was replaced with a new, equivalent key". Most of the time, claims about data being "anonymized" are simply not true; replacing names or UIDs with a hashed value is merely replacing an existing candidate key with a new synthetic key. As DJB said[1]:

>> Hashing is magic crypto pixie-dust, which takes personally identifiable information and makes it incomprehensible to the marketing department. When a marketing person looks at random letters and numbers they have no idea what it means. They can't imagine that anybody could possibly understand the information, reverse the hash, correlate the hashes, track them, save them, record them.

The rare examples where "anonymized" actually involves meaningfully making user data anonymous are when the actual user-correlated relations[2] have been destroyed. This report specifically discusses how this was done:

> If a sentence fragment appeared in less than 10 unique adventures, it was discarded from the result set to preserve anonymity.

Sometimes this required accepting a small amount of error:

> this data needed to be processed in batches of around 10000 adventures per batch. In each batch, fragments appearing only once were purged. Therefore, counts under around 25 are actually underestimates.

[1] https://projectbullrun.org/surveillance/2015/video-2015.html#bernstein
[2] https://en.wikipedia.org/wiki/Relation_%28database%29
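
To make the two quoted steps concrete, here is a minimal Python sketch of that kind of aggregation. It is not the report author's code: the function names, the `(adventure_id, fragments)` input shape, and the toy data in the usage note are all assumptions for illustration. Only the threshold of 10 unique adventures and the "purge fragments seen once per batch" rule come from the quotes above.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical input shape: (adventure_id, list of sentence fragments).
Adventure = Tuple[str, List[str]]


def count_unique_adventures(adventures: List[Adventure]) -> Counter:
    """Count, per fragment, how many distinct adventures contain it."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for adventure_id, fragments in adventures:
        for fragment in fragments:
            seen[fragment].add(adventure_id)  # repeats within one adventure count once
    return Counter({fragment: len(ids) for fragment, ids in seen.items()})


def batched_counts(adventures: List[Adventure], batch_size: int) -> Counter:
    """Count in batches, purging fragments seen only once within a batch.

    The purge keeps memory bounded but loses those single occurrences,
    so low totals can be underestimates.
    """
    total: Counter = Counter()
    for start in range(0, len(adventures), batch_size):
        counts = count_unique_adventures(adventures[start:start + batch_size])
        total.update({f: n for f, n in counts.items() if n > 1})
    return total


def release(counts: Counter, k: int = 10) -> Dict[str, int]:
    """Keep only fragments that appear in at least k unique adventures."""
    return {fragment: n for fragment, n in counts.items() if n >= k}
```

With twelve toy adventures that all contain the fragment "common", three of which (in positions 0, 5 and 6) also contain "edge", exact counting gives "edge" a count of 3, while `batched_counts(..., batch_size=5)` gives 2 because the lone occurrence in the first batch is purged. Either way, `release` drops "edge" and keeps only "common".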

pugworthy about 4 years ago

OK now I have to go check out AI Dungeon. Is this some clever marketing ploy to get me to try it out?

MrGilbert about 4 years ago

> The results are... surprising, to say the least.

Well, are they? I always thought that people will try stuff in a "safe harbor" which they cannot or should not try somewhere else. So I always expect these sandboxes to be full of NSFW stuff.

And people might not understand that their stories will influence the stories of others, so...

h_anna_h about 4 years ago

Fun fact: AI Dungeon used to be Open Source and you used to be able to run it locally without sending your data to someone else and without censorship of any form: https://en.wikipedia.org/wiki/AI_Dungeon#Development

This is what happens when software that you use does a bait and switch into the cloud.

For anyone wanting to play it locally, a quick Google search gave me these two links: https://colab.research.google.com/drive/1OjBQe4H4C2s-p4-OeJoXw5DStIjPy2VS and https://pastebin.com/UMUV0KTw

User23 about 4 years ago

This reminds me of how Nintendo developers discovered their western customers love drawing phalluses[1]. I don’t find the NSFW percentage to be at all surprising. It was common with Eliza too.

[1] https://www.kotaku.com.au/2012/11/nintendo-created-a-penis-drawing-inferno/

shawnz about 4 years ago

One of my first thoughts when playing with AI Dungeon was to try and get it to write something erotic. Glad I didn't follow through.

throwawayaid about 4 years ago

Speaking as a former customer, their actual application is really not that great. I was subscribed for several months, and while new features were sparse, the app would update nearly daily with fixes to the UI and backend. Existing features broke and got fixed on a day-to-day basis, and there were UI glitches all over the place. So while their core product, the AI, is the best on the market, everything they wrapped around it really isn't that great at all. So I'm not really surprised that their API is lacking as well. Just something to keep in mind before using their product...

Amaru84 about 4 years ago

People are idiots. You act like you care about children and want them to be safe, but freak out more over fiction than reality. I was born in 1984 and was sexually abused like so many other kids, and it was by a parent. What also gets me is you think only pedophiles sexually abuse children; the fact is they are less likely to. You can look it up yourself, it's well known in the psychology field: https://blogs.bmj.com/medical-ethics/2017/11/11/pedophilia-and-child-sexual-abuse-are-two-different-things-confusing-them-is-harmful-to-children/ (Pedophilia and Child Sexual Abuse Are Two Different Things — Confusing Them is Harmful to Children).

Kiro about 4 years ago

Isn't it just random dungeons created by anonymous users? Is there actually any sensitive data here? I have a "similar" service (nothing about AI but similar in other ways) and security is the least of my concerns, since being hacked means I will expose completely meaningless data. Now I'm afraid someone will hack me and make a similar fuss about me being an idiot.

sergiotapia about 4 years ago

I had no idea people still used autoincrementing IDs. Do people also build businesses on CakePHP and Joomla?

nitwit005 about 4 years ago

> In summary - if user input on a private adventure is flagged using an automated system, it will be manually reviewed, with other private user adventures potentially being manually reviewed as well. With almost half of the userbase being involved with NSFW stories, this seems like a tremendous misstep, as users have an expectation that their private adventures are, well, private.

I would assume they want to review the inputs to avoid a repeat of the incident where Microsoft's Twitter bot was trained to say inappropriate things: https://en.wikipedia.org/wiki/Tay_(bot)
