科技回声 (Tech Echo)

A tech news platform built on Next.js, offering tech news and discussion from around the world.

© 2025 科技回声. All rights reserved.

Ask HN: what could go wrong?

3 points by dhbradshaw, over 10 years ago
I'm building a website that lets people aggregate the numbers they care about into one spot.

Right now it is most popular with authors, who use it to get alerts when they receive a new review on Amazon.

They have suggested that I make it possible to track their author rank on Amazon. I've been experimenting with that and have found that regex is a nice way to go for that particular job. (I've been using XPaths and selectors up to this point.) So I'll probably add it soon as a specialized function on my website.

Because regexes are so useful (not for parsing, but for finding known patterns), I'm tempted to let users create automatic scrapers with regexes. But it seems like the kind of thing you want to research a bit first.
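To make the "automatic scrapers using regexes" idea concrete, here is a minimal Python sketch of what a user-defined regex tracker might look like. It assumes the page has already been fetched as plain text (no HTTP code is shown). The `RegexTracker` name, the sample page text, and the rank pattern are all hypothetical and do not reflect Amazon's real markup.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RegexTracker:
    """Hypothetical user-defined scraper: one regex with exactly one capture group."""

    name: str
    pattern: str
    compiled: re.Pattern = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Compile when the tracker is saved, so a syntactically invalid
        # pattern raises re.error immediately instead of at fetch time.
        self.compiled = re.compile(self.pattern)
        # Require one capture group: whatever it captures is the tracked value.
        if self.compiled.groups != 1:
            raise ValueError("pattern must contain exactly one capture group")

    def extract(self, page_text: str) -> Optional[str]:
        """Return the first captured value, or None if the pattern is absent."""
        m = self.compiled.search(page_text)
        return m.group(1) if m else None

    def check(self, page_text: str, previous: Optional[str]) -> tuple[Optional[str], bool]:
        """Return (current value, whether it differs from the previous value)."""
        current = self.extract(page_text)
        return current, current != previous


# Example with made-up page text: track a rank and detect a change.
rank = RegexTracker("author rank", r"#([\d,]+) in Books")
page = "Best Sellers Rank: #12,345 in Books (See Top 100 in Books)"
value, changed = rank.check(page, previous="13,002")
print(value, changed)
```

Requiring exactly one capture group keeps the contract simple for non-technical users: the text inside the parentheses is the number being tracked, and a change in it is what triggers an alert. Patterns supplied by users still need some limit on how long they may run, since a badly written pattern can take a very long time to match.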

1 comment

mtmail, over 10 years ago
Does your target audience understand regular expressions? I like the approach import.io took: you go to one or more pages with their browser, select the fields you're interested in, and they build the extraction (XPath, CSS selectors) for you. An engineer can take that configuration and instruct the scraper to call a URL and get JSON back. Even with their special browser, help pages, and videos, I had trouble explaining it to a non-technical person.

"Normal" regular expressions are probably fine. Only with backtracking or lookahead might it be possible to create enough complexity that a regex takes too long. Wrapping it in a block with a fixed timeout should work.
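Python's built-in `re` module has no timeout option, and a long-running match cannot simply be abandoned from another thread. One way to get the "fixed timeout" suggested above is to run each user-supplied search in a separate child process and kill it when the limit expires. This is a minimal sketch using only the standard library. `search_with_timeout` and `RegexTimeout` are hypothetical names, and `(a+)+$` is the textbook example of a pattern that backtracks catastrophically on input that almost matches.

```python
import json
import subprocess
import sys
from typing import Optional

# Program run in the child process: read {"pattern", "text"} as JSON on stdin,
# run re.search, and write the matched text (or null) as JSON on stdout.
_CHILD = r"""
import json, re, sys
req = json.load(sys.stdin)
m = re.search(req["pattern"], req["text"])
json.dump(m.group(0) if m else None, sys.stdout)
"""


class RegexTimeout(Exception):
    """Raised when a user-supplied regex does not finish within the time limit."""


def search_with_timeout(pattern: str, text: str, timeout: float = 1.0) -> Optional[str]:
    """Run re.search(pattern, text) in a child Python process.

    Returns the matched text, or None if there is no match. If the search
    takes longer than `timeout` seconds, subprocess.run kills the child and
    RegexTimeout is raised.
    """
    request = json.dumps({"pattern": pattern, "text": text})
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD],
            input=request,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        raise RegexTimeout(f"regex {pattern!r} exceeded {timeout}s") from exc
    if result.returncode != 0:
        # For example, an invalid pattern raised re.error in the child.
        raise ValueError(result.stderr.strip())
    return json.loads(result.stdout)


# A normal pattern finishes quickly; a catastrophic one is cut off.
print(search_with_timeout(r"#[\d,]+", "Rank: #1,234 in Books", timeout=5.0))
try:
    search_with_timeout(r"(a+)+$", "a" * 40 + "!", timeout=1.0)
except RegexTimeout as err:
    print("gave up:", err)
```

Starting a fresh interpreter for every search costs tens of milliseconds, so a busier service would probably keep a pool of worker processes instead. The third-party `regex` package is another option, since its matching functions accept a `timeout` argument directly.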