
DALL-E 2 has a secret language

619 points, by smarx, almost 3 years ago

41 comments

TOMDM, almost 3 years ago

Shouldn't this be expected to a certain extent?

Gibberish has to map _somewhere_ in the model's concept space.

Whether it maps onto anything we'd recognise as consistent doesn't mean that the AI wouldn't have some concept of where it relates. As other people have noted, the gibberish breaks down when you move it into another context, but who's to say that DALL-E 2 isn't remaining consistent to some concept it understands that isn't immediately recognisable to us?

The interesting part is if you can trick it to spit out gibberish in targeted areas of that concept space using crafted queries.
jsnell, almost 3 years ago

One of the replies is a thread with a fairly convincing rebuttal, with examples:

https://twitter.com/Thomas_Woodside/status/1531710251015081984
jws, almost 3 years ago
In short: DALLE-2 generates apparent gibberish for text in some circumstances, but feeding the gibberish back in gets recognized and you can tease out the meaning of words in this unknown language.
nutanc, almost 3 years ago

I don't think it's a secret language per se. It's just that the tokens generated for these sentences are for some reason coming close to a bird latent space. Maybe if we can dig deep and do a google search for kinds of birds we can find the connection. Tokens from OpenAI below.

https://t.co/Of8CBGdGAE

Found this answer:

https://twitter.com/BarneyFlames/status/1531736708903051265?t=q6tpKuE0KmUArNwuaXr6kw&s=19
wongarsu, almost 3 years ago

Was DALL-E 2 trained on captions from multiple languages? If so, this makes a lot of sense. Somewhere early in the model the words "bird", "vogel", "oiseau" and "pájaro" have to be mapped to the same concept. And "Apoploe vesrreaitais" happens to map to the same concept. Or maybe "Apoploe vesrreaitais" is rather the tokenization of that concept, since it also appears in the output. So in a sense DALL-E is using an internal language to make sense of our world.
wongarsu, almost 3 years ago

Link to the 5 page paper, for those that don't like twitter threads:

https://giannisdaras.github.io/publications/Discovering_the_Secret_Language_of_Dalle.pdf
teddykoker, almost 3 years ago

According to [1], the byte pair encoding for "Apoploe vesrreaitais" (the words producing bird images) is "apo, plo, e</w>, ,ve, sr, re, ait, ais</w>", and Apo-didae & Plo-ceidae are families of birds.

[1] https://twitter.com/barneyflames/status/1531736708903051265?s=21&t=cynRdfVRr4tlsqG2Vz9XqQ
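[Editor's note] The prefix coincidence described in [1] is easy to check mechanically. The piece list below is copied from the comment above; Apodidae (swifts) and Ploceidae (weavers) are real taxonomic families of birds.

```python
# Check that the leading BPE pieces of "Apoploe" are prefixes of real
# bird-family names, as the linked tweet observes.
pieces = ["apo", "plo"]  # first two BPE pieces of "Apoploe", per the tweet
bird_families = ["apodidae", "ploceidae"]

for piece, family in zip(pieces, bird_families):
    # Each piece is a prefix of the corresponding family name.
    assert family.startswith(piece), (piece, family)
    print(f"{piece!r} is a prefix of {family!r}")
```

This doesn't prove the tokenizer "meant" those families, only that the subword pieces overlap with strings that, in the training captions, plausibly co-occurred with bird images.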
726D7266, almost 3 years ago

Possibly related: In 2017 AI bots formed a derived shorthand that allowed them to communicate faster: https://www.facebook.com/dhruv.batra.dbatra/posts/1943791229195215

> While the idea of AI agents inventing their own language may sound alarming/unexpected to people outside the field, it is a well-established sub-field of AI, with publications dating back decades.

> Simply put, agents in environments attempting to solve a task will often find unintuitive ways to maximize reward.
MatthiasPortzel, almost 3 years ago

It's wild to see the discoveries being made in ML research. Like most of these 'discoveries', it makes a fair amount of sense after thinking about it. Of course it's not just going to spit out random noise for random input; it's been trained to generate realistic-looking images.

But I think it is an interesting discovery, because I don't think anyone could have predicted this.

One of my favorite examples is the classification model that will identify an apple with a sticker on it that says "pear" as a pear. It makes sense, but is still surprising when you first see it.
bla3, almost 3 years ago

Another convincing rebuttal: https://mobile.twitter.com/benjamin_hilton/status/1531780892972175361

It'd be cool if this was true, but it looks like it mostly isn't.
kazinator, almost 3 years ago

That's reminiscent of small children making up their own words for things. Those words are stable, in that you can converse with the child using those words.
PoignardAzur, almost 3 years ago

Wait, how does that make any sense?

I thought DALL-E's language model was tokenized, so it doesn't understand that e.g. "car" is made up of the letters 'c', 'a' and 'r'.

So how could the generated pictures contain letters that form words that are tokenized into DALL-E's internal "language"? Shouldn't we expect that feeding those words to the model would give the same result as feeding it random invented words?

Actually, now that I think about it, how does DALL-E react when given words made of completely random letters?
Veedrac, almost 3 years ago

Wow, I am totally going to need to wait for more experimentation before believing any given thing here, but this seems like a big deal.

It's one thing if DALL-E 2 was trying to map words in the prompt to their letter sequences and failing because of BPEs; that shows an impressive amount of compositionality but it's still image-model territory. It's another if DALL-E 2 was trying to map the prompt to semantically meaningful content and then failing to finish converting that content to language because it's too small and diffusion is a poor fit for language generation. That makes for worse images but it says terrifying things about how much DALL-E 2 has understood the semantic structure of dialog in images, and how this is likely to change with scale. Normally I'd expect the physical representation to precede semantic understanding, not follow it!

That said, I reiterate that a degree of skepticism seems warranted at this point.
DonHopkins, almost 3 years ago

Has anyone tried talking to it in Simlish?

https://en.wikipedia.org/wiki/Simlish

https://web.archive.org/web/20040722043906/http://thesims.ea.com/us/getcool/graphics/index.html

https://web.archive.org/web/20121102012431/http://bbs.thesims2.ea.com/community/bbs/messages.php?&openItemID=item.2,item.43,item.61,item.41,item.23&threadID=8d04f2582c30dca38b0a2d07d28fb420&directoryID=2&startRow=1#5b3c9c18c3808d99f1e04c01fdb828ea#5b3c9c18c3808d99f1e04c01fdb828ea
godelski, almost 3 years ago

Interestingly, Google detects these words as Greek. I know they are nonsensical and not actually Greek, but I'm wondering if any Greek speakers might be able to provide some insights. Are these gibberish words close to meaningful words? (Clear shot in the dark here.) Maybe a linguist could find more meaning?
softcactus, almost 3 years ago

For some reason this comment from someone else was deleted:

"My first reaction to this was, "It probably has to do with tokenization. If there's a 'language' buried in here, its native alphabet is GPT-3 tokens, and the text we see is a concatenation of how it thinks those tokens map to Unicode text." Most randomly concatenated pairs of tokens simply do not occur in any training text, because their translation to Unicode doesn't correspond to any real word. There are also combinations that do correspond to real words ("pres" + "ident" + "ial") but still never occur in training because some other tokenization is preferred to represent the same string ("president" + "ial").

Maybe DALL-E 2 is assigning some sort of isolated (as in, no bound morphemes) meaning to tokens, e.g., combinations of letters that are statistically likely to mean "bird" in some language when more letters are revealed. When a group of such tokens are combined, you get a word that's more "birdlike" than the word "bird" could ever be, because it's composed exclusively of tokens that mean "bird": tokens that, unlike "bird" itself, never describe non-birds (e.g., a Pontiac Firebird). The exact tokens it uses to achieve this aren't directly accessible to us, because all we get is poorly rendered roman text."

I wonder if this is why the term for "bird" seemed to be in faux binomial nomenclature, the scientific names for animals. I assume that in the training set there were images of birds/insects with their scientific name. An image labeled with the scientific name would always be an image of an animal, unlike images with the word bird in them, which could be of a birdhouse, Pontiac Firebird, or someone playing golf. That would mean that in the latent space, when DALL-E wants to represent a bird as accurately as possible, it will use the scientific name, or a gibberish/tokenized version of the scientific name, like someone trying to make up a name that sounds regal might say "Sir Reginard Swellington III". Even though it's not a real name, it encodes into the latent space of royal-sounding names.

I wonder if this could be extended to other things with very specific naming conventions. For example aircraft names: "Gruoeing B-26 Froovet" might encode into military aircraft latent space.
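[Editor's note] The "label purity" intuition above can be made concrete with a toy count over invented captions. The caption list and the premise that scientific names are purer labels are illustrative assumptions, not anything measured on DALL-E's actual training data.

```python
# Toy sketch: a scientific name is a "purer" bird label than the word
# "bird", because "bird" also occurs in captions of non-bird things.
# (caption text, does the image actually show a bird) - all invented.
captions = [
    ("a small bird on a branch", True),
    ("pontiac firebird at a car show", False),
    ("birdhouse in the garden", False),
    ("turdus merula perched on a fence", True),
    ("turdus merula feeding its chicks", True),
]

def label_purity(token):
    """Fraction of captions containing `token` whose image shows a bird."""
    hits = [is_bird for text, is_bird in captions if token in text]
    return sum(hits) / len(hits)

print(label_purity("bird"))           # low: "bird" also matches firebird, birdhouse
print(label_purity("turdus merula"))  # 1.0: the scientific name only labels birds
```

On this toy data the scientific name is a perfectly reliable bird signal while "bird" is right only a third of the time, which is the comment's proposed reason for the faux binomial nomenclature.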
astrange, almost 3 years ago

It seems obvious this would happen (it's just adversarial inputs again) - they didn't make DALL-E reject "nonsense" prompts, so it doesn't try to, and indeed there's no reason you'd want to make it do that.

Seems like a useful enhancement would be to invert the text and image prior stages, so it'd be able to explain what it thinks your prompt meant along with making images of it.
schroeding, almost 3 years ago

Interesting! I wonder if the model would "understand" the made-up names from today's stained glass window post[1], like "Oila Whamm" for William Ockham, and output similar images.

[1] https://astralcodexten.substack.com/p/a-guide-to-asking-robots-to-design?s=r
notimpotent, almost 3 years ago

My first thought upon reading this: what if DALL-E (or a similar AI) uncovers some kind of hidden universal language that is somehow more "optimal" than any existing language?

i.e. anything can be completely described in a more succinct manner than any current spoken language.

Or maybe some kind of universal language that naturally occurs and any semi-intelligent life can understand it.

Fun stuff!
qgin, almost 3 years ago

https://twitter.com/giannis_daras/status/1531693104821985280

This one melts my brain a bit, I'm not going to lie. Whales talking about food, with subtitles. "Translate" the subtitles and you get food that whales would actually eat.
normaldist, almost 3 years ago

I'm seeing a lot more people experimenting with DALL-E 2.

How does getting access work? Do you need a referral?
neopallium, almost 3 years ago
Would it be possible to build a rosetta stone for this secret language with prompts asking for labeled pictures of different categories of objects? Or prompts about teaching kids different words?
ml_basics, almost 3 years ago

I find it really interesting how these new large models (DALL-E, GPT-3, PaLM etc.) are opening up new research areas that do not require the same massive resources required to actually train the models.

This may act as a counterbalance to the trend of the last few years of all major research becoming concentrated in a few tech companies.
trebligdivad, almost 3 years ago
Is this finally a need for a xenolinguist?
MaxBorsch228, almost 3 years ago

What if we give it the same prompt but "with subtitles in French", for example?
YeGoblynQueenne, almost 3 years ago

If I understand correctly from the twitter thread (I haven't read the linked technical report), the author and a collaborator found that DALL-E generated some gibberish in an image that showed two men talking, one holding two ... cabbages? They fed (some of) the gibberish back to DALL-E and it generated images of birds, pecking at things.

Conclusion: the gibberish is the expression for birds eating things in DALL-E's secret language.

But, wait. Why is the same gibberish in the first image, that has the two men and the cabbages(?), but no birds?

Explanation: the two men are clearly talking about birds:

>> We then feed the words: "Apoploe vesrreaitars" and we get birds. It seems that the farmers are talking about birds, messing with their vegetables!

With apologies to my two compatriots, but that is circular thinking to make my head spin. I'm reminded of nothing else as much as the scene in the Knights of the Round Table where the wise Sir Bedivere explains why witches are made of wood:

https://youtu.be/zrzMhU_4m-g
ortusdux, almost 3 years ago
I wonder if any linguists are training a neural network to generate Esperanto 2.0.
Imnimo, almost 3 years ago

I tried a few of these in one of the available CLIP-guided diffusion notebooks, but wasn't able to get anything that looks like the DALL-E meanings. Not sure if DALL-E retrained CLIP (I don't think they did?), but it maybe suggests that whatever weirdness is going on here is on the decoder side?

All the cool images that DALL-E spits out are fun to look at, but this sort of thing is an even more interesting experiment in my book. I've been patiently sitting on the waitlist for access, but I can't wait to play around with it.
tiborsaas, almost 3 years ago

I love this scientific curiosity towards DALL-E. Many people just say that it's bad at text generation (including me), but someone stopped to wonder if this is really gibberish or it has some logic to it. Classic "hmm, that's odd" case.

It will be fun to see people experimenting with extracting text prompts from generated images. I'd try something like "An open children's book about animals" or "Random thought written on a paper". Maybe do a feedback loop of extracted prompts :)
smusamashah, almost 3 years ago

A few days ago I was wondering what DALL-E would generate if given gibberish (tried to request this, which wasn't entertained). This sounds like an answer to that to some extent.

I think there will be multiple words for the same thing. Also, unlike 'bird', the word 'Apoploe vesrreaitais' might actually mean a specific kind of bird in a specific setting.
afro88, almost 3 years ago

I love the weird edge cases of ML. Imagine discussing security concerns and saying "what if it creates its own secret language that we don't know about, which is discovered later, and people can use it to circumvent privacy and obscenity controls?"
dang, almost 3 years ago

Later related thread:

"No, DALL-E doesn't have a secret language" - https://news.ycombinator.com/item?id=31587316 - June 2022 (7 comments)
layer8, almost 3 years ago

Sounds like an effect similar to illegal opcodes: https://en.m.wikipedia.org/wiki/Illegal_opcode
mola, almost 3 years ago

So now we're reverting to haruspicy... The deemphasizing of peer review BEFORE publication will kill science. The amount of noise and nonsense proliferating just causes confusion and loss of trust...
la64710, almost 3 years ago

Does Google Translate support this?
carabiner, almost 3 years ago
Science has gone too far.
GamerUncle, almost 3 years ago

https://nitter.net/giannis_daras/status/1531693093040230402
ricardobeat, almost 3 years ago
The paper is just as long as the twitter thread.
throw457, almost 3 years ago

I bet it's just a form of copy protection.
dpierce9, almost 3 years ago
Gavagai!
seydor, almost 3 years ago

damn. I hope archaeologists can use that to decipher old scripts