I have been following the Reddit API saga quite closely, and I understand how and why Reddit as a company has an incentive to effectively take third-party apps off the market.<p>My question is: what is stopping someone from simply writing a web scraper that acts as if it's a web browser, scrapes the actual subreddit pages (via reddit.com, not api.reddit.com), and stores them in a local cache? I'm picturing an app that runs on popular NAS software such as TrueNAS, Synology, etc., so storage is not an issue.<p>Is there a way for Reddit to detect that this isn't authentic traffic from an actual user? If the web scraper authenticates as a normal user and respects the request throttling, wouldn't it just fly under the radar as a particularly addicted user?
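As a rough sketch of the idea, the scraper would be little more than a throttled fetch loop that sends a browser-like User-Agent and writes each page into a local cache directory. Everything here is hypothetical (the cache path, the 2-second interval, the User-Agent string are all assumptions, not anything Reddit documents or permits):

```python
import hashlib
import pathlib
import time
import urllib.request

CACHE_DIR = pathlib.Path("reddit_cache")  # hypothetical cache dir on the NAS
MIN_INTERVAL = 2.0                        # assumed polite delay between requests
_last_request = 0.0

def cache_path(url: str) -> pathlib.Path:
    # One file per URL, keyed by a hash of the URL so any URL is a valid filename.
    return CACHE_DIR / (hashlib.sha256(url.encode()).hexdigest() + ".html")

def fetch(url: str) -> bytes:
    """Return a page body, serving from the local cache when possible
    and throttling outbound requests otherwise."""
    global _last_request
    path = cache_path(url)
    if path.exists():
        return path.read_bytes()          # cache hit: no network traffic at all
    wait = MIN_INTERVAL - (time.time() - _last_request)
    if wait > 0:
        time.sleep(wait)                  # stay under the self-imposed rate limit
    req = urllib.request.Request(
        url,
        # Present a browser-like User-Agent; the default "Python-urllib"
        # header is trivially flagged server-side.
        headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"},
    )
    body = urllib.request.urlopen(req).read()
    _last_request = time.time()
    CACHE_DIR.mkdir(exist_ok=True)
    path.write_bytes(body)
    return body
```

Authenticating as a logged-in user would additionally mean capturing and replaying session cookies, which this sketch leaves out.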
I imagine it would be pretty obvious from an engagement-metrics perspective how a regular user behaves (scrolling, stopping to read, upvoting) versus a robot.<p>Not to mention the sheer amount of content you'd have to scrape, which would far surpass "normal" user engagement.
Scraping will only let you read from Reddit. To write to Reddit, or to read or write private user data, you would need to automate a full browser and handle the user's credentials in plaintext.