On an older HN post I discovered Textise.net, which lets you enter any web address and then serves a text-only version of that page from the textise.net site.

That seems like blatant copyright infringement, but is it? And if it is, why haven't they been shut down? I would think the major commercial media sites would object to what it is doing.

One of a few past HN posts:
https://news.ycombinator.com/item?id=25840922
They appear to fetch the HTML for a page, strip out all graphics by removing the corresponding tags, and then rewrite every anchor href to point at textise.net, passing the original URL as a query string parameter so they can then fetch that URL when you click the link. Here is what one link looks like on a recent HN page:
<a href="https://www.textise.net/showText.aspx?strURL=https%3A%2F%2Fblog.adafruit.com%2F2021%2F01%2F29%2Fdie-shots-of-the-raspberry-pi-rp2040-chip-teardown-dieshot-reverseengineering-piday-johndmcmaster-raspberry_pi%2F" rel="nofollow">https://www.textise.net/showText.aspx?strURL=https%3A%2F%2Fb...</a>
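For illustration, here is a rough Python sketch of the kind of rewriting described above. It is a guess at the approach, not Textise's actual code: the function names `proxied` and `rewrite_links` are made up, only the `showText.aspx?strURL=` prefix comes from the link above, and the regex-based tag handling is a simplification that a real implementation would likely replace with a proper HTML parser.

```python
import re
from urllib.parse import quote, urljoin

# Prefix taken from the link shown above; everything else here is hypothetical.
PROXY_PREFIX = "https://www.textise.net/showText.aspx?strURL="


def proxied(url: str) -> str:
    """Percent-encode the whole URL (including ':' and '/') and append it to the prefix."""
    return PROXY_PREFIX + quote(url, safe="")


def rewrite_links(html: str, base_url: str) -> str:
    """Drop <img> tags and point every double-quoted anchor href at the proxy."""
    # Crude graphics removal: delete <img ...> tags outright.
    html = re.sub(r"<img\b[^>]*>", "", html, flags=re.IGNORECASE)

    def repl(match: re.Match) -> str:
        # Resolve relative hrefs against the page URL before encoding them.
        absolute = urljoin(base_url, match.group(2))
        return match.group(1) + proxied(absolute) + match.group(3)

    return re.sub(r'(<a\b[^>]*?\bhref=")([^"]*)(")', repl, html, flags=re.IGNORECASE)
```

Calling `proxied()` on the Adafruit URL reproduces the encoded href shown above, which fits the guess that the original URL is simply percent-encoded in full and passed as `strURL`.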
Not sure about the legality, but one thing you could test is whether the site obeys robots meta tags. Add "noarchive,nosnippet" to a page's robots meta tag and see whether it still archives the page.
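If you want to try that test, here is a small Python sketch of what a well-behaved fetcher could do before displaying a page: read the robots meta tag and collect its directives. The class and function names are hypothetical, and whether Textise performs any check like this is unknown.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated and case-insensitive.
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# A test page carrying the tag suggested above.
TEST_PAGE = (
    '<html><head><meta name="robots" content="noarchive,nosnippet"></head>'
    "<body>Hello</body></html>"
)
```

Host something like `TEST_PAGE` on your own server, request it through the site, and compare what comes back with what `robots_directives(TEST_PAGE)` says a compliant tool should honor.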