I'm pretty amazed that I can get around a paywall just by using https://outline.com/www.[my url]. I'm sure there's nothing too crazy going on under the hood, but does anyone know exactly how it works?
Just took a look at this; here's my guess:

- Pretend to be a crawler such as Google's and pull down the HTML, potentially executing JavaScript.
- Once it's pulled down, clean it up using open-source code such as Readability: https://github.com/mozilla/readability
- Store the result as a document in a NoSQL database.

After an article has been pulled down once, they don't need to fetch it again.
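Here's a rough sketch of that guessed pipeline in Python, stdlib only. It's my illustration, not Outline's actual code. `CRAWLER_USER_AGENT` is a hypothetical placeholder, not whatever header Outline really sends. `ParagraphExtractor` is a crude stand-in for Readability: it just keeps text inside `<p>` tags and skips Readability's content scoring. A plain dict stands in for the NoSQL store. It also doesn't execute JavaScript.

```python
import urllib.request
from html.parser import HTMLParser
from typing import Callable, Dict, List

# Hypothetical placeholder: the guess is that the real service sends a
# search-engine crawler's User-Agent, but this string is made up.
CRAWLER_USER_AGENT = "ExampleCrawler/1.0"


def fetch_html(url: str) -> str:
    """Step 1: download raw HTML while presenting a crawler-style User-Agent."""
    req = urllib.request.Request(url, headers={"User-Agent": CRAWLER_USER_AGENT})
    with urllib.request.urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class ParagraphExtractor(HTMLParser):
    """Step 2 (crude stand-in for Readability): keep text inside <p> tags,
    ignoring anything inside <script> or <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[str] = []
        self._in_p = False
        self._skip_depth = 0
        self._buf: List[str] = []

    def _flush(self) -> None:
        # Emit the current paragraph with whitespace collapsed.
        if self._in_p:
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._flush()  # HTML allows an omitted </p>
            self._in_p = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p and not self._skip_depth:
            self._buf.append(data)

    def close(self) -> None:
        super().close()
        self._flush()  # catch a trailing unclosed <p>


def extract_article(html: str) -> str:
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.paragraphs)


class ArticleCache:
    """Step 3: a dict standing in for the NoSQL document store, so each URL
    is fetched and cleaned only once."""

    def __init__(self, fetch: Callable[[str], str] = fetch_html) -> None:
        self._fetch = fetch
        self._docs: Dict[str, str] = {}
        self.fetch_count = 0

    def get(self, url: str) -> str:
        if url not in self._docs:
            self.fetch_count += 1
            self._docs[url] = extract_article(self._fetch(url))
        return self._docs[url]


if __name__ == "__main__":
    # Offline demo: a fake "site" so no network access is needed.
    pages = {"https://example.com/a": "<script>var x=1;</script><p>Hello <b>world</b></p><div>nav</div><p>Second</p>"}
    cache = ArticleCache(fetch=pages.__getitem__)
    print(cache.get("https://example.com/a"))
    print(cache.get("https://example.com/a"))
    print("fetches:", cache.fetch_count)
```

Passing `fetch` in as a parameter is just so you can try it offline. The second `get` is served from the cache, which is the "don't need to get it again" part.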