Ask HN: Reddit API vs. Browser Requests

3 points by jerdthenerd, almost 2 years ago
I have been following the Reddit API saga quite closely, and I understand how/why Reddit as a company has an incentive to effectively take third-party apps off the market.

My question is, what is stopping someone from simply writing a web scraper that acts as if it's a web browser, scrapes the actual subreddits (via reddit.com, not api.reddit.com), and stores them in a local cache? I'm picturing an app that runs on popular NAS software such as TrueNAS, Synology, etc., so storage is not an issue.

Is there a way for Reddit to detect that this isn't authentic traffic from an actual user? If the web scraper authenticates as a normal user and respects the request throttling, wouldn't it just fly under the radar as a particularly addicted user?
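
To make the idea concrete, here is a minimal sketch of the "throttled fetch plus local cache" part of such a scraper. Everything in it is an assumption for illustration: the `ThrottledCache` class, the `http_get` helper, the User-Agent string, and the 2-second interval are all hypothetical, and nothing here reflects Reddit's actual rate limits or detection logic. It only covers pacing and storage, not login or bot detection.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable, Optional
from urllib.request import Request, urlopen


def http_get(url: str) -> bytes:
    """Fetch a URL with a browser-like User-Agent (illustrative value only)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"})
    with urlopen(req, timeout=30) as resp:
        return resp.read()


class ThrottledCache:
    """Hypothetical helper: serve pages from disk, fetch misses at a limited rate."""

    def __init__(
        self,
        cache_dir: str,
        min_interval: float = 2.0,  # assumed pause between network requests
        fetch: Callable[[str], bytes] = http_get,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.min_interval = min_interval
        self.fetch = fetch
        self.clock = clock
        self.sleep = sleep
        self._last_request: Optional[float] = None

    def _path_for(self, url: str) -> Path:
        # Hash the URL so any URL maps to a safe, fixed-length file name.
        return self.cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")

    def get(self, url: str) -> bytes:
        path = self._path_for(url)
        if path.exists():
            # Cache hit: no network traffic at all.
            return path.read_bytes()
        if self._last_request is not None:
            # Cache miss: wait until min_interval has passed since the last request.
            wait = self.min_interval - (self.clock() - self._last_request)
            if wait > 0:
                self.sleep(wait)
        body = self.fetch(url)
        self._last_request = self.clock()
        path.write_bytes(body)
        return body
```

On a NAS this might be used as `ThrottledCache("/mnt/pool/reddit-cache").get("https://www.reddit.com/r/programming/")`, where the cache path is made up. The fetcher, clock, and sleep function are injectable so the pacing logic can be exercised without touching the network.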

2 comments

alexdanilowicz, almost 2 years ago
I imagine it would be pretty obvious from an engagement metrics perspective how a regular user acts (scrolling, stopping to read, upvoting) vs. a robot.

Not to mention the sheer amount of content you'd have to scrape, which would definitely surpass "normal" user engagement.

harrelchris, almost 2 years ago
Scraping will only enable reading from Reddit. To write to Reddit or to read/write private user data, you would need to automate a browser and handle user credentials in plaintext.