This is an analysis I put together of the November 2024 Common Crawl HTML/WARC dataset. I counted HTML attribute values to identify the most common values for each tag+attribute combination. I've done this analysis several times over the years and have found it invaluable when writing parsers.<p>The post is interactive, letting you search the 500 most common values per tag+attribute. A free SQLite database of the top 1,000 values per tag+attribute is also available for download.<p>This is the first post of an 8-part series that builds toward writing an article parser; the lessons from it carry over to any other kind of parser you might want to write.<p>This is my first time publishing content like this and I'd love any feedback you might have.
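<p>For anyone who wants a feel for the counting step, here is a minimal Python sketch. It is not the code used for the analysis. It assumes the HTML documents have already been pulled out of the WARC records as strings, and the names `AttrValueCounter` and `top_values` are made up for illustration.

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser


class AttrValueCounter(HTMLParser):
    """Tallies attribute values per (tag, attribute) pair. Hypothetical helper."""

    def __init__(self):
        super().__init__()
        # Maps (tag, attribute) -> Counter of attribute values.
        self.counts = defaultdict(Counter)

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        for name, value in attrs:
            if value is None:
                # Boolean attributes such as <input disabled> carry no value.
                continue
            self.counts[(tag, name)][value.strip()] += 1


def top_values(html_docs, n=500):
    """Return the n most common values for each tag+attribute pair."""
    parser = AttrValueCounter()
    for doc in html_docs:
        parser.feed(doc)
        parser.close()
        parser.reset()  # clears parser state only; the tallies are kept
    return {key: counter.most_common(n) for key, counter in parser.counts.items()}
```

<p>Calling `top_values(docs, n=500)` on a list of HTML strings returns a dict keyed by `(tag, attribute)`, each holding a list of `(value, count)` pairs sorted by count. At Common Crawl scale the tallies would presumably need to be sharded and merged rather than held in one process.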