You can't parse [X]HTML with regex

2 points by xparadigm about 8 years ago

1 comment

raiph about 8 years ago
I've always loved that post, but the truth is that while you can *not* parse [X]HTML with what I'll call a "regular expression", by which I mean the formal CS definition [1], you can with a suitable "regex", by which I mean what most folk mean by the term "regex". [2]

[1] https://en.wikipedia.org/wiki/Regular_expression#Formal_definition

[2] PCRE engines support recursive matching, etc., but perhaps the most illuminating example is regex in Perl 6, such as this JSON grammar (a Perl 6 grammar is a class containing Perl 6 named regexes): https://github.com/moritz/json/blob/master/lib/JSON/Tiny/Grammar.pm
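To make the formal-vs-practical distinction concrete, here is a minimal Python sketch. It assumes a hypothetical toy markup containing only text and a single tag pair `<b>`/`</b>`; `nested_pattern` and `balanced` are illustrative names, not from any library. A pattern built only from regular constructs can accept nesting up to some fixed depth, but each extra level needs a longer pattern, so no single one covers every depth. Checking arbitrary depth needs unbounded memory (here a counter, which is a one-symbol stack), which is roughly what PCRE's recursive patterns or a Perl 6 grammar provide. Python's standard `re` module has no recursion, so the unbounded case is done in plain code.

```python
import re


def nested_pattern(depth: int) -> str:
    """Build a regular pattern accepting <b>...</b> nesting up to `depth` levels.

    Hypothetical toy language: text without '<' or '>', plus <b> and </b> tags.
    """
    inner = r"[^<>]*"  # depth 0: plain text, no tags at all
    for _ in range(depth):
        # Wrap one more level: text, then zero or more <b>inner</b> groups.
        inner = rf"[^<>]*(?:<b>{inner}</b>[^<>]*)*"
    return inner


TOKEN = re.compile(r"<b>|</b>|[^<>]+")


def balanced(s: str) -> bool:
    """Accept properly nested <b> tags at any depth using an unbounded counter."""
    pos, depth = 0, 0
    while pos < len(s):
        m = TOKEN.match(s, pos)
        if m is None:
            return False  # stray '<' or '>' that is not one of our tags
        tok = m.group()
        if tok == "<b>":
            depth += 1
        elif tok == "</b>":
            depth -= 1
            if depth < 0:
                return False  # closing tag without a matching opener
        pos = m.end()
    return depth == 0


three_deep = "<b><b><b>x</b></b></b>"
print(re.fullmatch(nested_pattern(2), three_deep) is not None)  # depth-2 pattern
print(balanced(three_deep))
```

The depth-2 pattern rejects the three-level string while the counter-based check accepts it. Raising `depth` only moves the cutoff, which is the formal limitation the comment points to.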