The Lone Banana Problem in AI

146 points by JohnHammersley almost 2 years ago

29 comments

I wish this article was just 3 paragraphs. The verbose writing style was a little tiring, I found myself scrolling impatiently to find what the actual "Lone Banana Problem" was.

tsukikage almost 2 years ago

"Three cats in a trenchcoat" is a good one. Three cats in a trenchcoat standing on each other's shoulders, pretending to be a human, Vincent Adultman style.

You can get them standing side by side wearing trenchcoats. You can get a cat pyramid or a cat totem or a stack or tower of cats (though often the ability to count to three is then lost). You can't get them to share the trenchcoat. Nothing like that occurred in any training set, and the AIs do not understand spatial relationships between objects ("X is on top of Y, inside Z") so you cannot describe how to arrange the things it does know about in the scene. Dall-E won't do it. Midjourney won't do it. Stable Diffusion won't do it.

Eventually, enough images will be seeded into the training sets for this to stop being a useful test. But right now, it gives one a fascinating window on what happens when you try to extrapolate outside the cloud of thingspace described by the training data, rather than just interpolating within it.

https://www.toothycat.net/~sham/trenchcoats

tudorw almost 2 years ago

I've seen lots of long prompt suggestions, however single words are fascinating, some words are extremely heavy, for example, we might assume 'topic' and 'subject' are somewhat interchangeable, but when I'm talking to an LLM swapping those words causes a huge change. Same with names, if you ask it to act like Dave, it's not some random personality, it's a distillation of every Dave it has encountered! I don't know how we could visualise and explore this but it's certainly interesting.

anotheryou almost 2 years ago

Midjourney solution: "--no bunch, two, multiple": https://i.imgur.com/QnJmLr0.jpeg

These models have a tendency to move towards the average, especially if unprompted. As we see here, sometimes even if prompted otherwise.

They just have to get better or need a more precise interface like the --no :). We also couldn't have "a man crawling" before and now we can: https://i.imgur.com/ycVpk3i.jpeg

FrustratedMonky almost 2 years ago

Whatever the critiques of length, this is a good write up on 'bias', that is not about race or politics, and thus a good neutral example about how training sets can skew results in unintended ways.

I didn't think it was that long, it might be that some are cherry picking the points they find interesting and think the rest could have been edited out.

I could have stood even more discussion on how this 'blind spot' in the AI model is very much akin to our own blind spots.

lionkor almost 2 years ago

> Then think of all the wealth disparity that has been introduced into our world. Think of the social anxiety of always being online. Think of the undermining of our democratic institutions.

How is this at all whatsoever needed

GuB-42 almost 2 years ago

I remember about 10 years ago in the context of using neural networks for image classification, they had a study of what the AI actually saw that justified its decision. This, by the way, is how we got "Deep Dream".

For "dumbbell", for the AI, no dumbbell was complete without a muscular arm holding it. That's because most images in the training dataset have the dumbbells being held, so the system integrated the arm in the pattern. I guess it is the same idea here, most pictures of bananas show several of them, so for the AI, bananas are things that don't go alone.

thriftwy almost 2 years ago

Humans are in the business of consuming bananas, whereas neural nets are in the business of peddling bananas. They don't get to actually use these bananas so they can't gain deeper insight into what they're for.

This is the classic lamb vs. mutton issue. Wealthy land owners who use one set of idioms vs. servants who use a different one. Happens to neural nets on human chassis as well.

13years almost 2 years ago

At Digital Science, we believe that we have a responsibility to ensure that the technologies that we release are well tested and well understood. The use cases where we deploy AI have to be appropriate for the level at which we know the AI can perform and any functionality needs to come with a “health warning” so that people know what they need to look for – when they can trust an AI and when they shouldn’t.

We don't even understand ourselves and we hope to model AI alignment in some image of humanity with the goal that it will be just as benevolent as our fractured war eager society.

Yes it is a paradox indeed. I submit there is likely a limitation to how much theoretically AI could ever improve beyond ourselves in regard to bias. I've described this as the AI Bias Paradox - https://www.mindprison.cc/p/ai-the-bias-paradox

auggierose almost 2 years ago

> A single banana casting a shadow on a grey background

That prompt works fine in DALL·E 2.

kthejoker2 almost 2 years ago

Seems like we need more actor-critic models that evaluate these outputs on real world physical modeling and accuracy and not just quality of artistic output or token similarity.

But also Midjourney in particular seems trained to be more evocative / stylistic rather than photorealistic or precise.

codegladiator almost 2 years ago

Filling in the blank with what has not happened. LLMs interpolate quite well but they don't extrapolate. Very interesting article on where you can or cannot put LLMs to use. It's not necessarily a bug, but we are yet to see.

crosen99 almost 2 years ago

> AIs, at their current level of development, don’t perceive objects in the way that we do – they understand commonly occurring patterns.

You see this claim everywhere - that AI operates on statistics and patterns and not actual understanding. But human understanding is entirely about statistics and patterns. When a human sees a collection of particles and recognizes it as, say, a car, all they are doing is recognizing the car-like patterns in how the particles are organized that have a strong statistical correlation with prior observations of things classified as a car. Am I missing something?

gweinberg almost 2 years ago

It's really weird that the monkey is eating two bananas at once, I would think in training data monkeys would almost always be eating one banana at a time.

I've been told that in the real world monkeys and chimps don't peel bananas when they eat them, they eat the peel and all, but I don't know if it's true. Whatever happens in the real world, drawings always show monkeys peeling the bananas as they eat them and I would expect a prompt "show a monkey eating a banana" to show it eating a single peeled banana.

generationP almost 2 years ago

Maybe he should try "bananum"?

lordnacho almost 2 years ago

LLM doesn't know stuff, it has few models beyond an extremely deep association between (in this case) descriptions and pictures.

Humans have shitty models for stuff, but we have them. We lack the massively deep numerical associations.

It can't be that long before someone makes an AI that knows a few common models like "stuff made of steel is rigid, stuff made of cloth is soft, etc". Along with "If someone keeps emphasizing a number, that's the number of things he wants".

Der_Einzige almost 2 years ago

This isn't a real problem anymore. Composable Diffusion, Regional Prompting, and Controlnet literally solved all of these issues. The author shows what the world is learning, "Johnny can't prompt", and those who learn how to use Generative AI well are going to be continuing to propel up in their careers while the general public incorrectly concludes that the tools "don't work".

llogiq almost 2 years ago

Again, there is a lot of words to describe the fact that machine learning is just lossy compression for a bunch of data with the possibility to interpolate between data points and get somewhat plausible results. This means data points may get lost during compression/training, and certain things will look off, whether it be a preference for banana pairs, even numbers of fingers or certain weasel words in verbiage.

sarabande almost 2 years ago

Here is a summary, part ChatGPT and part me:

The “Lone Banana Problem” describes subtle biases of Large Language Models (LLMs) in AI: LLMs reproduce the statistical average of the inputs that they have consumed in the context of the question they have been asked. It's called that problem because the model used to generate images has never seen an individual banana, so when prompted always generates two bananas.

BryanLegend almost 2 years ago

I unsubscribed from Midjourney because I couldn't direct it to place an object in a specific place.

It's impossible to do something like, "A sphere on the left of the picture." and have it understand it.

After having iterative conversations in ChatGPT, Midjourney felt like it has an extremely poor grasp of language.

eimrine almost 2 years ago

I used to experience a very similar problem [1]. They say NNs are bad at counting. But at least your picture has kind of a single piece of bananas, my problem remained unsolved.

[1] https://news.ycombinator.com/item?id=32875215

mrchumphatty almost 2 years ago

Blog post by company set up by corporate publisher solely for data mining purposes = shill bananas.

frankreyes almost 2 years ago

LLMs don't understand meaning, they're just a statistical random number generator.

bugmen0t almost 2 years ago

Would be kinda funny if the lone bananas have been removed from the training dataset because they are not part of the depicted item but are merely in the picture for scale :)

RugnirViking almost 2 years ago

If you want an imo more interesting example try "bulldozer". It's missing a quite... Important piece to our conception of those machines

maltelandwehr almost 2 years ago

"A single banana --v 5.2" in Midjourney gives me four pictures. Two of these pictures only contain a single banana.

IIAOPSW almost 2 years ago

or it was always just a dice roll, and asking the same prompts again will give you sometimes 1, usually 2, sometimes 3 bananas.

__s almost 2 years ago

They didn't try asking for half a banana?

chaosjevil almost 2 years ago

TL;DR: AI doesn't understand human language prompts, it's just associating specific tokens with specific outputs.