I know it is probably gauche, but here's a tl;dr: The "regexp" in your programming language isn't the "regular expression" of formal language theory. Adding extra bells and whistles (like back references) gives "regexps" power <i>beyond</i> regular languages: back references alone already match some languages that are not even context-free, though they still can't match every context-free language (balanced parentheses need something like recursion).<p>This is an excellent article, and I hope the tl;dr makes you <i>more</i> likely to go and read it.
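<p>To make the tl;dr concrete, here is a minimal sketch, assuming Python's re module as the "regexp" in question. The pattern (.+)\1 matches exactly the strings made of some non-empty chunk repeated twice, a language that is neither regular nor context-free once the alphabet has two or more symbols. The names SQUARE and is_square are made up for illustration.

```python
import re

# Hypothetical name: matches "squares", i.e. strings of the form ww
# where w is any non-empty string. \1 refers back to whatever group 1 captured.
SQUARE = re.compile(r"(.+)\1")


def is_square(s: str) -> bool:
    """Return True if s is some non-empty string repeated exactly twice."""
    # fullmatch requires the whole string to match, not just a prefix.
    return SQUARE.fullmatch(s) is not None
```

<p>With this, is_square("abab") is true while is_square("abba") is false. No finite automaton can decide that, since it would have to remember an arbitrarily long first half.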
Best answer ever to why you shouldn't parse HTML with regex:<p><a href="http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags" rel="nofollow">http://stackoverflow.com/questions/1732348/regex-match-open-...</a>
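<p>For the task that question asks about (open tags, excluding XHTML self-closing ones), a rough sketch using Python's standard html.parser instead of a regex might look like the following. The class name OpenTagCollector, the function open_tags, and any sample markup are my own assumptions, not taken from the linked answer.

```python
from html.parser import HTMLParser


class OpenTagCollector(HTMLParser):
    """Hypothetical helper: records open tags, skipping self-closing ones."""

    def __init__(self):
        super().__init__()
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Called for ordinary open tags such as <p> or <a href="...">.
        self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for XHTML-style tags such as <br />; ignored on purpose.
        # (The default implementation would forward to handle_starttag.)
        pass


def open_tags(markup: str) -> list[str]:
    """Return the names of open tags in markup, in document order."""
    parser = OpenTagCollector()
    parser.feed(markup)
    parser.close()
    return parser.open_tags
```

<p>The parser keeps track of comments and attribute quoting for you, which is exactly the bookkeeping a single regex struggles with.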