I am doing some data analysis and don't need the full dataset. I only want two things: the hostname, and all of its pages (URLs) with their HTML.
Not that I know of, but there are various tools, such as <a href="https://github.com/alwalxed/wayurls">https://github.com/alwalxed/wayurls</a>.
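If you would rather script it yourself, here is a rough sketch in Python. It assumes the pages you care about are archived in the Internet Archive's Wayback Machine and uses its public CDX endpoint to list the HTML URLs for one hostname, then fetches the raw HTML of each capture. The function names, the `example.com` hostname, and the `limit` default are placeholders of mine, not something from a particular tool, so treat this as a starting point rather than a finished solution.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Machine CDX endpoint (assumption: your data is archived there).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(hostname, limit=50):
    """Build a CDX query URL listing HTML captures for one hostname."""
    params = {
        "url": hostname,
        "matchType": "host",              # every path under this host
        "output": "json",
        "fl": "original,timestamp",       # only the fields we need
        "filter": "mimetype:text/html",   # skip images, scripts, etc.
        "collapse": "urlkey",             # one capture per distinct URL
        "limit": str(limit),
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(rows):
    """Drop the header row of a CDX JSON response; return (url, timestamp) pairs."""
    return [(row[0], row[1]) for row in rows[1:]]


def snapshot_url(timestamp, original):
    """URL of the archived capture; the id_ suffix asks for the unmodified HTML."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def fetch_pages(hostname, limit=50):
    """Yield (hostname, page_url, html) for each archived HTML page (uses the network)."""
    with urlopen(build_cdx_query(hostname, limit)) as resp:
        rows = json.load(resp)
    for original, timestamp in parse_cdx_rows(rows):
        with urlopen(snapshot_url(timestamp, original)) as page:
            html = page.read().decode("utf-8", errors="replace")
        yield hostname, original, html


if __name__ == "__main__":
    # Hypothetical hostname; keep the limit small to be polite to the service.
    for host, url, html in fetch_pages("example.com", limit=5):
        print(host, url, len(html))
```

This gives you exactly the two things you asked for per row: the hostname and each page URL with its HTML. For a large site you would want to add delays between requests and some error handling for captures that fail to load.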