Another tip I've found extremely helpful for web scraping: check the <head> for <meta> tags or a <script type="application/ld+json"> tag that might already have the information you want collected neatly in one place. You may be able to save yourself a lot of time and grief.
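To make the tip above concrete, here is a minimal sketch using only the Python standard library. The class name `HeadMetadataExtractor`, the helper `extract_metadata`, and the sample HTML are all made up for illustration. Real pages may nest JSON-LD in arrays or `@graph` objects, so treat this as a starting point rather than a complete solution.

```python
import json
from html.parser import HTMLParser


class HeadMetadataExtractor(HTMLParser):
    """Collects <meta> name/property pairs and parsed JSON-LD blocks."""

    def __init__(self):
        super().__init__()
        self.meta = {}        # e.g. {"og:title": "..."}
        self.json_ld = []     # one parsed object per JSON-LD script tag
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # Open Graph uses "property"; most other meta tags use "name".
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script bodies can arrive in several chunks, so buffer them.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the scrape


def extract_metadata(html):
    """Return (meta_dict, json_ld_list) for an HTML string."""
    parser = HeadMetadataExtractor()
    parser.feed(html)
    parser.close()
    return parser.meta, parser.json_ld


# Hypothetical page, for illustration only.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example Widget">
<script type="application/ld+json">
{"@type": "Product", "name": "Example Widget", "offers": {"price": "19.99"}}
</script>
</head><body></body></html>
"""

meta, json_ld = extract_metadata(SAMPLE_HTML)
print(meta)
print(json_ld)
```

If the structured data is there, you get the name and price as plain dictionary lookups and never have to write a selector for them.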
Here's a browser extension for working with selectors that was shared on the front page sometime last year: <a href="https://github.com/hermit-crab/ScrapeMate" rel="nofollow">https://github.com/hermit-crab/ScrapeMate</a><p>Edit: I think it was from this discussion: <a href="https://news.ycombinator.com/item?id=24057228" rel="nofollow">https://news.ycombinator.com/item?id=24057228</a>
Author here, happy to answer any questions.<p>For our product (PixieBrix) we generally grab the data directly from the front-end framework (e.g., React props). It's a bit less stable since it's effectively an internal API, but it means you can grab a lot of data with a single selector and can generally avoid parsing values out of text.
Both the :has and the :contains selector (as in ul:has(> li:contains("Built")) ) were new to me. So thanks to the author for sharing that little trick!
For e2e testing I have seen various patterns, and the article mentions data-test-id, for instance. In my own tests, I have opted for something similar that has given us a bit more flexibility.<p>Singular elements: <i>data-test-save-button</i>, <i>data-test-name-input</i><p>Elements that are part of a list: <i>data-test-user={user.id}</i>, <i>data-test-listing={listing.id}</i><p>This allows us to name our elements with data-test attributes, but also provide values to them where applicable.<p>I have also created a testSelector function that takes an id and an optional value, and spits out either <i>[data-test-${id}="${value}"]</i> or <i>[data-test-${id}]</i>.<p>We have also experimented with letting shared components populate their own data-test-* attribute automatically based on other props. For example, our modal component sets data-test-modal={title}, so instead of data-test-delete-user-modal you get data-test-modal="Delete user". In the latter case, the dev does not need to provide the data-test-* attribute manually, since the component takes care of it.
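A rough sketch of what such a selector helper could look like, written in Python here rather than the JavaScript the comment implies. The name `data_test_selector` and the quote-escaping step are my own assumptions and not the commenter's actual code.

```python
def data_test_selector(test_id, value=None):
    """Build a CSS attribute selector for a data-test-* attribute.

    Without a value: [data-test-<id>]
    With a value:    [data-test-<id>="<value>"]
    """
    if value is None:
        return f"[data-test-{test_id}]"
    # Assumption: escape backslashes and double quotes so that values such
    # as modal titles cannot break out of the quoted attribute value.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'[data-test-{test_id}="{escaped}"]'


print(data_test_selector("save-button"))
print(data_test_selector("user", 42))
print(data_test_selector("modal", "Delete user"))
```

Keeping the value optional means one helper covers both the singular elements and the list items keyed by id.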
Nice list, esp for anyone getting started. I remember web scraping was my entrypoint into web development. I take it for granted now, but 15+ years ago I loved the idea of being able to completely mine a website of all its content.