The technical explanation for this is given in comment 3 of the page, which sums it up perfectly:<p>"I think the flaw here is that HTML is a Chomsky Type 2 grammar (context free grammar) and RegEx is a Chomsky Type 3 grammar (regular expression). Since a Type 2 grammar is fundamentally more complex than a Type 3 grammar - you can't possibly hope to make this work. But many will try, some will claim success and others will find the fault and totally mess you up."<p>More info:
<a href="http://en.wikipedia.org/wiki/Chomsky_hierarchy" rel="nofollow">http://en.wikipedia.org/wiki/Chomsky_hierarchy</a>
Fortunately, BeautifulSoup saves the day for HTML parsing tasks.<p>(<a href="http://www.crummy.com/software/BeautifulSoup/" rel="nofollow">http://www.crummy.com/software/BeautifulSoup/</a>)
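<p>To make the nesting problem concrete, here is a rough sketch of my own (not from the linked page). It uses only Python's standard library: html.parser stands in for BeautifulSoup so it runs without installing anything. The sample string and the OuterDivText class are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample input: a div nested inside another div.
html = "<div>outer <div>inner</div> tail</div>"

# Regex attempt: the non-greedy match stops at the FIRST closing tag,
# so the content of the outer div is cut short.
regex_result = re.search(r"<div>(.*?)</div>", html).group(1)


class OuterDivText(HTMLParser):
    """Collects all text found inside <div> elements, tracking nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <div> elements are currently open
        self.chunks = []  # text pieces seen while inside at least one <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div":
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


parser = OuterDivText()
parser.feed(html)
parser.close()
parser_result = "".join(parser.chunks)

print(repr(regex_result))   # 'outer <div>inner'
print(repr(parser_result))  # 'outer inner tail'
```

A greedy pattern happens to work on this particular string but breaks as soon as there are two sibling divs. The parser keeps a depth counter, which is exactly the bookkeeping a plain regular expression cannot do. With BeautifulSoup installed, something like BeautifulSoup(html, "html.parser").div.get_text() should give the same text with far less code.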