Tried this on a well-marked-up, valid HTML5 page. It ended up pulling one seemingly random piece of content from near the bottom. I guessed that it was probably pulling the longest block of content on the page, and looking at the source code, that's exactly what it does. It removes forms, objects, scripts, images, blank links, divs, etc. Then it goes through the paragraphs and tables and picks the longest content. This algorithm seems pretty good for long-form article content, but not for the marketing homepage I tried it on. Overall, pretty cool.
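For anyone curious, here's roughly what that "strip the junk, keep the longest block" approach looks like. This is my own minimal sketch in Python using only the standard library, not the tool's actual code. The tag lists (`SKIP_TAGS`, `CANDIDATE_TAGS`) and the names `LongestBlockParser` and `longest_block` are assumptions made up for illustration, and it only looks at `<p>` and `<td>` text rather than handling every case the real thing does.

```python
from html.parser import HTMLParser

# Assumed list of tags whose contents are thrown away entirely.
SKIP_TAGS = {"script", "style", "form", "object"}
# Assumed list of tags treated as candidate content blocks.
CANDIDATE_TAGS = {"p", "td"}


class LongestBlockParser(HTMLParser):
    """Collects the text of each <p>/<td> that is not inside a skipped tag."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped tag
        self.current = None   # text chunks of the open candidate, or None
        self.blocks = []      # finished candidate texts

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in CANDIDATE_TAGS and self.skip_depth == 0:
            self._flush()     # an unclosed <p> ends when the next one starts
            self.current = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in CANDIDATE_TAGS:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0 and self.current is not None:
            self.current.append(data)

    def _flush(self):
        if self.current is not None:
            # Collapse runs of whitespace so length reflects real text.
            text = " ".join("".join(self.current).split())
            if text:
                self.blocks.append(text)
            self.current = None


def longest_block(html):
    """Return the longest surviving text block, or "" if there is none."""
    parser = LongestBlockParser()
    parser.feed(html)
    parser.close()
    parser._flush()
    return max(parser.blocks, key=len, default="")
```

Calling `longest_block(page_source)` on a marketing homepage shows the problem: a short headline and tagline lose out to whichever paragraph happens to be longest, which is often a footer blurb.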
Great idea, works well on 4 of the 5 sites I tried it on. Too bad it renders notforest.com completely blank and thus not readable at all.

I'll keep it anyway!