I've done a lot of scraping. Some sites use heavy JavaScript frameworks that generate session IDs and request IDs that the XHR requests use to "authenticate" each request. The amount of work needed to reverse engineer that workflow is large, so in those situations I lean on headless Selenium. I know there are some lighter solutions, but Selenium offers some distinct advantages:

1) Lots of library support, in multiple languages.

2) Without having to fake user agents and the like, the requests look more like a regular user's: all media assets are downloaded, the browser UA is genuine, and so on.

3) Simple clustering: setting up a Selenium Grid is very easy, and switching from a local Selenium instance to the grid requires very little code change (one line in most cases; see the sketch below).
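A minimal Python sketch of that one-line switch; the grid hub URL is a placeholder, so adapt it to your own setup:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")  # run without a visible browser window

    # Local instance:
    driver = webdriver.Chrome(options=options)

    # Switching to a Selenium Grid is the one-line change:
    # driver = webdriver.Remote(
    #     command_executor="http://my-grid-host:4444/wd/hub",  # hypothetical hub URL
    #     options=options,
    # )

    driver.get("https://example.com")
    print(driver.title)
    driver.quit()

Everything downstream of the driver construction (get, find_element, etc.) stays the same, which is why the migration is so cheap.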