This is a toy example of the kind of problem that the field of Computer Vision is actively working on: object detection. In a (tiny) nutshell, our best answer for general images and objects is:<p>1) Instead of using the full color pixel image, use an "edge image" with some simple additional normalizations. If color is important, do this per color channel.<p>2) Create a dataset with as many cropped examples of the target object as you can find (Mechanical Turk is useful for annotating large datasets); every other crop of every image is a negative example.<p>3) Train a classifier (SVM if you want it to work, neural network if you're so inclined) using this dataset.<p>4) Apply the classifier to all subwindows of a new image to generate hypotheses of the target object location. This can be sped up in various ways, but this is the basic idea.<p>5) Post-process the hypotheses using context (this can be as simple as keeping only the most confident hypothesis within each neighborhood).<p>If you're interested in object detection, an excellent summary of the past decade of research is due to Kristen Grauman and Bastian Leibe: <a href="http://www.morganclaypool.com/doi/abs/10.2200/S00332ED1V01Y201103AIM011" rel="nofollow">http://www.morganclaypool.com/doi/abs/10.2200/S00332ED1V01Y2...</a> (do some googling if you don't have access to this particular PDF).<p>A cool paper from a few months ago that should be mentioned when commenting on a post called "Where's Waldo?" is <a href="http://www.cs.washington.edu/homes/rahul/data/WheresWaldo.html" rel="nofollow">http://www.cs.washington.edu/homes/rahul/data/WheresWaldo.ht...</a>
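A minimal sketch of steps 1, 3, 4 and 5, assuming Python with OpenCV and scikit-learn, and assuming you already have fixed-size positive and negative crops; the window size, stride, and file name below are placeholders, not a real system:

    import numpy as np
    import cv2
    from sklearn.svm import LinearSVC

    WIN, STEP = 64, 16  # placeholder window size and stride

    def edge_features(gray_window):
        # step 1: work on an edge image rather than raw pixels, then flatten
        return cv2.Canny(gray_window, 100, 200).astype(np.float32).ravel() / 255.0

    # step 3: train an SVM on WIN x WIN positive crops and negative crops
    X = np.array([edge_features(w) for w in positive_crops + negative_crops])
    y = np.array([1] * len(positive_crops) + [0] * len(negative_crops))
    clf = LinearSVC().fit(X, y)

    # step 4: score every subwindow of a new image
    gray = cv2.cvtColor(cv2.imread("scene.png"), cv2.COLOR_BGR2GRAY)
    hypotheses = []
    for yy in range(0, gray.shape[0] - WIN, STEP):
        for xx in range(0, gray.shape[1] - WIN, STEP):
            feats = edge_features(gray[yy:yy + WIN, xx:xx + WIN]).reshape(1, -1)
            hypotheses.append((clf.decision_function(feats)[0], xx, yy))

    # step 5: crude post-processing: keep the single most confident hypothesis
    # (a real detector would do non-maximum suppression within neighborhoods)
    print(max(hypotheses))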
Something unrelated but perhaps interesting to some people: "Waldo" is actually a localised name used in the USA and Canada; his original name is Wally.<p><a href="http://en.wikipedia.org/wiki/Where%27s_Wally%3F" rel="nofollow">http://en.wikipedia.org/wiki/Where%27s_Wally%3F</a>
Are there other examples of it working? (if there were links, I couldn't see them).<p>There's a danger of <i>overfitting</i>, where a technique works for one instance (or a subset of instances), but not in general. Detecting stripes could work in general, but as an SO commenter noted, "Where's Wally" images often include spurious stripes to undermine this detection strategy for humans.
The algorithm described by Heike is essentially just looking for striped red and white shirts. Anyone who's done more than a couple of "Where's Waldo?" games knows that striped shirts are often thrown in to draw one's eye. In fact, in this very example there is another striped shirt (lower left corner, just above the wall) that this algorithm did not highlight but could very well have been Waldo. Without being able to recognize Waldo's human characteristics (thin, glasses, strong chin) the approach described will inevitably fail.
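To see why, here is roughly what a stripes-only detector boils down to. This is my own Python/OpenCV approximation, not Heike's actual Mathematica code, and every threshold in it is a guess; the point is that any red-and-white striped region, decoy shirts included, scores just as well:

    import cv2
    import numpy as np

    img = cv2.imread("wheres_waldo.png")
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

    # rough masks for "red" and "white" pixels (red hue wraps around 0/180 in OpenCV)
    red = cv2.inRange(hsv, (0, 120, 80), (10, 255, 255)) | \
          cv2.inRange(hsv, (170, 120, 80), (180, 255, 255))
    white = cv2.inRange(hsv, (0, 0, 180), (180, 40, 255))

    # a striped shirt has red pixels with white a few pixels above or below them
    stripes = red & (np.roll(white, -4, axis=0) | np.roll(white, 4, axis=0))

    # blur to find dense clusters of striped pixels, then threshold;
    # this is the "level" that has to be tuned by hand
    heat = cv2.blur(stripes.astype(np.float32), (20, 20))
    mask = (heat > 0.8 * heat.max()).astype(np.uint8) * 255
    cv2.imwrite("candidates.png", mask)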
<i>I had to play around a little with the level. If the level is too high, too many false positives are picked out.</i><p>I was impressed until I read that--the guy is basically fitting the model/procedure to the training set (of size 1). I'd wait for a more general approach before accepting the answer.
On NPR, this turns into: "an algorithm that can find Waldo in any image."<p><a href="http://www.npr.org/blogs/waitwait/2011/12/18/143865340/the-wait-wait-snack-pack" rel="nofollow">http://www.npr.org/blogs/waitwait/2011/12/18/143865340/the-w...</a>
via <a href="http://meta.stackoverflow.com/questions/116401/stack-overflow-mentioned-on-nprs-wait-wait-dont-tell-me-and-ny-times" rel="nofollow">http://meta.stackoverflow.com/questions/116401/stack-overflo...</a>
Cool. I've done some work on things like this before. Some of the things I do to make it work on multiple images:<p>Template matching is your friend in this case, because most Waldos look similar. You already tried this in a basic way by searching for the stripes of a given color. You can make it more powerful by making the template include more properties, and work in more contexts. For instance: what if Waldo's a different size?<p>The other option is to pretend you don't know what Waldo looks like, find him in a bunch of images, label the subimages as "waldo" candidates, measure certain properties of those subimages, and find which coordinates of feature space have similar properties. Then use these properties as your template.<p>Finally, you could train a classifier on subwindows like sergeyk suggested. This has some difficulty because "Where's Waldo" images are difficult to subdivide into subwindows on the scale of a single person. Do you move pixel by pixel? Do you divide it into a grid? Each grid cell will contain weird parts of people. Etc. If you do find a way to divide the image into "people" -- perhaps by doing a preliminary "person"-template sweep that identifies locations of people in the image -- then you can use a supervised learning algorithm to say "yes, this person is Waldo" or "nope, FRWONG!", based on the image properties in the subwindow around that person.
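For the template-matching route, handling "what if Waldo's a different size?" just means sweeping the template over a range of scales. A minimal Python/OpenCV sketch, with placeholder file names and an arbitrary scale range:

    import cv2
    import numpy as np

    scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
    template = cv2.imread("waldo_template.png", cv2.IMREAD_GRAYSCALE)

    best = (-1.0, None, 1.0)  # (score, location, scale)
    for scale in np.linspace(0.5, 2.0, 16):
        t = cv2.resize(template, None, fx=scale, fy=scale)
        if t.shape[0] > scene.shape[0] or t.shape[1] > scene.shape[1]:
            continue  # template can't be larger than the scene
        scores = cv2.matchTemplate(scene, t, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(scores)
        if max_val > best[0]:
            best = (max_val, max_loc, scale)

    print("best match score %.2f at %s (template scale %.2f)" % best)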
This needs to be an augmented reality mobile app. The problem on the AI side of things is that a good algorithm that reliably "learns" what Waldo looks like would need a substantial number of examples.<p>A good solution would get close, calculate the probability of every "maybe-Waldo", and then display the one with the highest probability of being Waldo. An augmented reality app that highlighted Waldo on every page would be awesome.
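The "display the one with the highest probability" step is just an argmax over whatever candidate list the detector produces, plus a cutoff so a weak best guess isn't shown. A tiny sketch, assuming a hypothetical list of (probability, bounding box) pairs:

    # hypothetical input: candidates = [(probability, bounding_box), ...]
    def pick_waldo(candidates, min_prob=0.5):
        if not candidates:
            return None
        prob, box = max(candidates, key=lambda c: c[0])  # highest-probability "maybe-Waldo"
        return box if prob >= min_prob else None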
Interesting problem. I'd then like to apply this concept of finding a needle in a haystack to satellite imagery. Using supercomputing + giant image data sets, you could theoretically find some pretty obscure stuff if you knew what you were looking for (hidden treasures???).