I'm aware, to a certain extent, of the vision algorithm running in my own head, and I'm not sure machine vision does the same thing. You can run simple thought experiments to see what your brain actually does when it analyzes an image.
First of all, when I look at a scene I am 100% aware of geometry. Regardless of meaning, words, and symbols, I can trace out the three-dimensional shape of things without any association to words.

How do I know I can do this? Simple. Every scene I look at, I can translate or imagine in my head as a wireframe or low-poly scene, as if it were generated by a computer. Similarly, if I look at a wireframe scene generated by a computer, my mind can translate it into a scene that looks real. Try it, you can do it.

Second, I can look at a low-poly wireframe model of an elephant and associate it with the word 'elephant.' I do not need color or detail to know it's an elephant. In fact, with color and detail alone it is harder for me to identify an elephant. For example, if someone takes many very close-up photographs of parts of an elephant, like its eye, skin, ear, etc., and asks me to guess the subject by interpreting the pictures... I become fully aware that I am accessing a slower, different part of my brain to deduce the meaning. This is a stark contrast to the instantaneous word association when I look at a wireframe model of an elephant. The speed difference between the two ways of identifying an elephant indicates to me that geometric interpretation is the primary driver behind our visual analysis, and details like color or texture are tertiary when it comes to identification. I believe the visual cortex determines shape first, then determines the word from the shape.

If you feed a white sculpture of an elephant or a wireframe of an elephant into one of these deep learning networks, it is unlikely you will get the word 'elephant' as output. But if you feed it a real picture of an elephant, it can correctly identify the elephant (assuming it was trained on photos of elephants). Because the delta between a white sculpture of an elephant and an actual photo of an elephant is just color and detail, this indicates to me that when you train these networks to recognize an elephant, you are training them to recognize details. It's a form of overfitting: the training is not general enough to capture geometry. The network is correlating blobs of pixels, color, and detail with an elephant rather than associating a three-dimensional model of it with the word... the opposite of what humans do. In fact, I bet that if you took those very close-up photographs of an elephant and fed them into the network, it would do a better job at recognition than with the picture of a white sculpture of an elephant.

This indicates to me that to improve our vision algorithms, the algorithm must first associate pixels with geometry, then associate the geometry with a word, rather than trying to associate blobs of pixels with words. Train geometry recognition before word association.

My guess is that our minds have specific, genetically determined, built-in geometry recognition algorithms honed to turn a 2D image into a 3D shape. We do not learn to translate 2D to 3D; we are born with that ability hardwired. Where learning comes in is the translation of this shape to a word. Whereas most of the machine learning we focus on in research is image recognition, I believe the brain is actually learning shape and geometry recognition.
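The sculpture-versus-photo claim above is easy to check empirically. Here is a minimal sketch, assuming torchvision >= 0.13 is installed and that "elephant_photo.jpg" and "elephant_sculpture.jpg" are hypothetical local files you supply yourself; ImageNet's actual labels are 'African elephant', 'Indian elephant', and 'tusker', so look at the top-5 guesses rather than an exact string match.

    # Quick test of the sculpture-vs-photo claim: run a stock ImageNet
    # classifier on both images and compare its top-5 guesses.
    import torch
    from PIL import Image
    from torchvision.models import resnet50, ResNet50_Weights

    weights = ResNet50_Weights.DEFAULT        # ImageNet-1k pretrained weights
    model = resnet50(weights=weights).eval()
    preprocess = weights.transforms()         # resize / crop / normalize
    labels = weights.meta["categories"]       # the 1000 ImageNet class names

    def top5(path):
        img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
        with torch.no_grad():
            probs = model(img).softmax(dim=1)[0]
        vals, idxs = probs.topk(5)
        return [(labels[i], round(v.item(), 3)) for v, i in zip(vals, idxs)]

    # Hypothetical file names -- substitute your own test images.
    for path in ["elephant_photo.jpg", "elephant_sculpture.jpg"]:
        print(path, top5(path))

If the prediction collapses on the sculpture but stays correct on the photo, that supports the texture-over-shape reading; if it doesn't, the claim needs revisiting.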
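The "geometry first, then word" proposal can also be written down as a two-stage model in which the classifier is only ever shown an intermediate geometry estimate, never the raw pixels, so the label cannot depend on color or texture. Below is a toy PyTorch sketch of that ordering, not a working vision system: the layer sizes are arbitrary placeholders, and the single-channel depth map is just a stand-in for whatever 3D representation you would actually train the first stage to produce (e.g. from synthetic renders or depth sensors) before attaching the word-association stage.

    # Toy sketch of the proposed ordering: pixels -> geometry -> word.
    # Stage 1 predicts a per-pixel depth map (a stand-in for "3D shape");
    # stage 2 sees ONLY that depth map, so the predicted label can depend
    # on shape but not on color or texture.
    import torch
    import torch.nn as nn

    class GeometryNet(nn.Module):
        """Pixels -> coarse depth map (the geometry stage)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 1, 3, padding=1),   # 1-channel depth estimate
            )
        def forward(self, rgb):
            return self.net(rgb)

    class ShapeClassifier(nn.Module):
        """Depth map -> class label (the word-association stage)."""
        def __init__(self, num_classes):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, num_classes),
            )
        def forward(self, depth):
            return self.net(depth)

    geometry = GeometryNet()
    classifier = ShapeClassifier(num_classes=1000)

    rgb = torch.randn(1, 3, 224, 224)   # stand-in for a photo
    depth = geometry(rgb)               # step 1: recover shape from pixels
    logits = classifier(depth)          # step 2: shape -> word
    print(depth.shape, logits.shape)    # (1, 1, 224, 224) and (1, 1000)

The open question is the training schedule: giving stage 1 its own geometric supervision first and only then learning labels on top of it is exactly the "train geometry recognition before word association" ordering argued for above.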