Cool. I've done some work on problems like this before. Here are a few things I do to make it work across multiple images:

Template matching is your friend here, because most Waldos look similar. You already tried a basic version of this by searching for stripes of a given color. You can make it more powerful by having the template capture more properties and work in more contexts. For instance: what if Waldo appears at a different size?

The other option is to pretend you don't know what Waldo looks like: find him in a bunch of images, label those subimages as "waldo" candidates, measure certain properties of each subimage, and find the region of feature space where the candidates share similar values. Then use those properties as your template.

Finally, you could train a classifier on subwindows, as sergeyk suggested. This is tricky because Where's Waldo images are hard to subdivide into subwindows on the scale of a single person. Do you slide the window pixel by pixel? Do you divide the image into a grid? With a grid, each cell will contain odd fragments of people. And so on. If you do find a way to divide the image into "people" (perhaps with a preliminary "person"-template sweep that locates the people in the image), then you can use a supervised learning algorithm to say "yes, this person is Waldo" or "nope, FRWONG!" based on the image properties in the subwindow around that person.
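One way to handle the "what if Waldo's a different size?" question is to run the same template at several scales and keep the best-scoring window. Here's a minimal pure-Python sketch of that idea, assuming grayscale images stored as nested lists, nearest-neighbour rescaling, and zero-mean normalized cross-correlation (NCC) as the similarity score. The function names, the scale list, and the toy striped template are all hypothetical illustrations, not anyone's actual Waldo finder.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical representation: a grayscale image is a list of rows of floats.
Image = List[List[float]]


def resize_nearest(img: Image, scale: float) -> Image:
    """Rescale a 2-D array by `scale` using nearest-neighbour sampling."""
    h, w = len(img), len(img[0])
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    return [[img[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))]
             for c in range(nw)]
            for r in range(nh)]


def ncc(patch: Image, tmpl: Image) -> float:
    """Zero-mean normalized cross-correlation of two same-sized arrays, in [-1, 1]."""
    a = [v for row in patch for v in row]
    b = [v for row in tmpl for v in row]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0  # a flat patch or flat template carries no pattern to match
    return sum(x * y for x, y in zip(da, db)) / denom


def match_multiscale(image: Image, template: Image,
                     scales: Sequence[float] = (0.5, 1.0, 1.5, 2.0)
                     ) -> Tuple[float, int, int, float]:
    """Slide rescaled copies of `template` over `image`.

    Returns (best_score, row, col, scale) for the best-scoring window.
    """
    best = (-2.0, -1, -1, 0.0)  # below any possible NCC score
    H, W = len(image), len(image[0])
    for s in scales:
        t = resize_nearest(template, s)
        th, tw = len(t), len(t[0])
        if th > H or tw > W:
            continue  # rescaled template no longer fits in the image
        for r in range(H - th + 1):
            for c in range(W - tw + 1):
                patch = [row[c:c + tw] for row in image[r:r + th]]
                score = ncc(patch, t)
                if score > best[0]:
                    best = (score, r, c, s)
    return best


if __name__ == "__main__":
    # Toy "stripes" template: alternating bright and dark rows.
    template = [[1.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
    # Plant a 2x-enlarged copy of the stripes in a blank 12x12 image at (2, 5).
    image = [[0.0] * 12 for _ in range(12)]
    for r, row in enumerate(resize_nearest(template, 2.0)):
        for c, v in enumerate(row):
            image[2 + r][5 + c] = v
    print(match_multiscale(image, template))  # (1.0, 2, 5, 2.0)
```

The demo finds the planted stripes only at scale 2.0, which is the point: a single fixed-size template would have missed them. For real images you'd want a color template with more than stripes, and OpenCV's `cv2.matchTemplate` with `cv2.TM_CCOEFF_NORMED` computes the same kind of zero-mean normalized score far faster; you'd still loop over scales yourself, e.g. by resizing with `cv2.resize`.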