First, there's the obvious problem of failing to distinguish between "parsing" and merely "tokenizing". The latter has generally been possible with regular expressions. In fact, IIRC, the famous Zalgo rant (linked in the post), while fun and true in a sense, is attached to a bad question for it, since the question asked is perfectly solvable by regular expressions, even conventional ones without backwards matches or any other fancy PCRE additions.<p>However, I'm not sure you can even tokenize HTML with regular expressions any longer, because one of the most important aspects of HTML5 was to formalize a strict definition of how to sloppily parse HTML. Yes, that may sound like a contradiction, but it isn't; check the sentence again. It formalized what the browsers were already doing and harmonized how to handle the broken HTML that people actually produce. As one might expect from something that harmonizes a decade+ of accumulated heuristics developed by at least three major streams of browsers (more depending on how you count), it is not exactly simple.<p>I guess I can't guarantee you couldn't embed all this into a regular expression: <a href="https://html.spec.whatwg.org/multipage/parsing.html#parse-state" rel="nofollow">https://html.spec.whatwg.org/multipage/parsing.html#parse-st...</a> but the result would not be worth it. Use a standard HTML parser.<p>Now, obviously, I'm taking a strict view of the term "HTML" in this case. Regular expressions can certainly be used to extract things from documents that you choose to view as a particular approximation of HTML. I've done it before and I'll probably do it again. 
But when I do, I'm not envisioning myself as "parsing HTML". What I'm doing is parsing a byte stream that happens to be HTML; I'm just hacking around and getting something that works for the exact format this particular document happens to be in, which is a highly, <i>highly</i> restricted subset of HTML, especially since I probably only care about a very small part of it. But it's also an unspecified subset of HTML that may change without warning at any time, and I need to deal with that.<p>If I care about a lot of it, I find myself an HTML parser and an XPath implementation. If you do this a lot, it's worth learning, as it's very, very powerful and faster to develop with than regexes once you know what you're doing. For anything beyond the most trivial task, I preferentially reach for this now that I've learned it. But there is a non-trivial learning curve to it. If you're just grabbing a particular price out of a page once, by all means use regexes.
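<p>To make that tradeoff concrete, here is a minimal Python sketch, not taken from any real site: the two page fragments, the <i>price</i> class name, and the helpers price_by_regex and price_by_parser are all made up for illustration. It uses the standard library's html.parser in place of a full HTML5 parser plus XPath, purely to stay dependency-free; html.parser does not implement the WHATWG parsing algorithm, so treat it as a stand-in rather than the real thing.

```python
import re
from html.parser import HTMLParser

# Hypothetical page fragments: same data, slightly different markup.
PAGE_V1 = '<div><span class="price">$19.99</span></div>'
PAGE_V2 = "<div><span id='p1' class='price sale'>$19.99</span></div>"

# Regex tuned to the exact markup of PAGE_V1 (the "unspecified subset").
PRICE_RE = re.compile(r'<span class="price">\$([0-9]+\.[0-9]{2})</span>')


def price_by_regex(html):
    """Return the price string if the markup matches exactly, else None."""
    match = PRICE_RE.search(html)
    return match.group(1) if match else None


class PriceFinder(HTMLParser):
    """Collect the text of the first element whose class list has 'price'.

    Simplification: assumes the price element contains only text,
    with no nested tags.
    """

    def __init__(self):
        super().__init__()
        self._capturing = False
        self._buf = []
        self.text = None

    def handle_starttag(self, tag, attrs):
        if self.text is None and not self._capturing:
            # Attribute values can be None (e.g. bare attributes).
            classes = (dict(attrs).get("class") or "").split()
            if "price" in classes:
                self._capturing = True

    def handle_data(self, data):
        if self._capturing:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if self._capturing:
            self.text = "".join(self._buf).strip()
            self._capturing = False


def price_by_parser(html):
    """Return the price string found via the tokenizing parser, else None."""
    finder = PriceFinder()
    finder.feed(html)
    finder.close()
    return finder.text.lstrip("$") if finder.text else None


if __name__ == "__main__":
    for page in (PAGE_V1, PAGE_V2):
        print(price_by_regex(page), price_by_parser(page))
```

On the first fragment both approaches find the price. On the second, where only the quoting and the attributes changed, the regex silently returns nothing while the parser-based version still works, which is the "may change without warning" problem in miniature.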