Tech Echo (科技回声)
A tech news platform built with Next.js, offering global tech news and discussion.

© 2025 Tech Echo. All rights reserved.

Ask HN: What is the easiest way to scrape websites today?

2 points by stevofolife 11 months ago
With the rise of large language models (LLMs), what’s the easiest way to scrape the web today? Additionally, what’s the simplest method to automate interactions with a website?
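For static pages, a plain HTTP request plus an HTML parser is often all that scraping takes; pages whose content is built by JavaScript still need a headless browser. A minimal sketch using only the Python standard library, assuming a static page; the function names and the User-Agent string are illustrative, not from the thread:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collects the absolute href targets of <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> list[str]:
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url: str, timeout: float = 10.0) -> str:
    # Hypothetical User-Agent; some sites reject Python's default one.
    req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Usage: `extract_links(fetch("https://example.com/"), "https://example.com/")` returns every link on the page as an absolute URL.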

5 comments

fumanchu36 11 months ago
In my opinion, the easiest way to scrape websites today also gives you a simple way to automate interactions with them. The scraping browser integrates with the three major headless browser automation tools (Selenium, Playwright, Puppeteer) and lets you interact with your target site's HTML over multiple steps to extract the data you need. Try it here: https://bit.ly/3STY9ON
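Selenium, Playwright and Puppeteer drive a real browser, which is needed when a page renders its content with JavaScript. For plain HTML forms, a swapped-in alternative is to replay the same multi-step interaction (log in, then fetch a page behind the login) with a cookie-aware HTTP client. A minimal standard-library sketch; the URLs, form field names and function names are hypothetical and would have to match the target site's real form:

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def make_session():
    """Return an opener that keeps cookies between requests, plus its jar."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar)), jar


def form_request(url: str, fields: dict[str, str]) -> Request:
    """Build a POST request that submits fields the way an HTML form would."""
    body = urlencode(fields).encode("ascii")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def login_then_fetch(login_url, fields, target_url, timeout=10.0):
    opener, _jar = make_session()
    # Step 1: submit the (hypothetical) login form; any session cookie
    # the server sets is stored in the jar.
    opener.open(form_request(login_url, fields), timeout=timeout).close()
    # Step 2: request a page that requires that session cookie.
    with opener.open(target_url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Many real login forms also embed a CSRF token, which would have to be scraped from the form page and added to `fields` before step 1.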
eimrine 11 months ago
Maybe a headless browser and a lot of proxies? I don't see how neural networks affect downloading web pages.
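The "lot of proxies" part usually means rotating requests across several proxy endpoints. A minimal round-robin sketch with the standard library; the proxy URLs, User-Agent and function names are hypothetical placeholders:

```python
from itertools import cycle
from urllib.request import ProxyHandler, Request, build_opener

# Hypothetical proxy endpoints; replace with proxies you are allowed to use.
PROXIES = [
    "http://proxy-a.example:8080",
    "http://proxy-b.example:8080",
    "http://proxy-c.example:8080",
]


def rotating_openers(proxies):
    """Yield openers round-robin, each routing HTTP and HTTPS through one proxy."""
    for proxy in cycle(proxies):
        yield build_opener(ProxyHandler({"http": proxy, "https": proxy}))


def fetch_all(urls, proxies=PROXIES, timeout=10.0):
    openers = rotating_openers(proxies)
    pages = {}
    for url in urls:
        opener = next(openers)  # the next proxy in the rotation
        req = Request(url, headers={"User-Agent": "example-scraper/0.1"})
        with opener.open(req, timeout=timeout) as resp:
            pages[url] = resp.read()
    return pages
```

Round-robin is the simplest policy; a production scraper would typically also retry failed requests through a different proxy.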
gbertb 11 months ago
Check out https://spider.cloud: LLM-friendly Markdown output, crawls dozens of URLs in seconds, proxies, headless Chrome, etc.
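"LLM-friendly" output generally means converting HTML to Markdown before passing it to a model, which drops markup and scripts while keeping structure. This is not spider.cloud's API; it is a hypothetical standard-library sketch that handles only headings, paragraphs, links and bullet lists and discards `<script>` and `<style>` content:

```python
import re
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Tiny HTML-to-Markdown converter: h1-h3, p, a, ul/ol/li only."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.out: list[str] = []
        self.href = None
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.out.append("\n\n" + self.HEADINGS[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in ("ul", "ol"):
            self.out.append("\n")  # blank line before the list
        elif tag == "li":
            self.out.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a":
            self.out.append(f"]({self.href})" if self.href else "]")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse whitespace runs but keep single spaces between words.
            self.out.append(re.sub(r"\s+", " ", data))

    def markdown(self) -> str:
        lines = [line.strip() for line in "".join(self.out).splitlines()]
        return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


def html_to_markdown(html: str) -> str:
    conv = MarkdownConverter()
    conv.feed(html)
    conv.close()
    return conv.markdown()
```

For real-world pages (tables, nested lists, code blocks), a dedicated HTML-to-Markdown library is a better fit than extending this sketch.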
JSDevOps 11 months ago
If you are relying on scraping, you are doing it wrong.
throwaway211 11 months ago
LLMs lower the bar for being correct.

You don't need to scrape sites if you dictate your truth.

Therefore, read some psychology papers about how people create their realities. Then make stuff up; what gets believed is going to be mostly correct in your audience's reality, which should feed back into your model. The cost of getting stuff wrong is profit if you exploit it. Let your audience filter do it for you, and who are you to dictate what's correct for someone else anyway? Make it sound catchy and convincing for a variety of tastes. Generate a few short-video creators with different personas, like Street Fighter characters from yesteryear, that viewers can project themselves onto.