Scrape: A simple, higher level interface for Go web scraping

79 points by ericchiang about 10 years ago

8 comments

peteretep about 10 years ago
Go has some weird syntactic sugar, including where a method invocation is rewritten by the compiler to pass in a value or a pointer depending on what the callee wants(!?!). And yet Go code is still littered with:
    if err != nil {
... rather than some simple, compile-time validated sugar to pass the error value up the call chain. Yes, I've read the justification documents. No, they still don't make a very convincing argument.
rdudekul about 10 years ago
To me goquery seems more intuitive than Scrape, maybe because I am more familiar with jQuery selector syntax.
Any reason why the yhat guys (ericchiang) created Scrape rather than using, say, goquery?
Can you make the matcher function in main.go go away with a simpler (more intuitive) interface/API/DSL?
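One possible direction for the simpler interface asked about here is a set of small, composable matcher constructors, so callers combine predicates instead of writing a custom matcher function. The sketch below is hypothetical: `Node` is a minimal stand-in for `golang.org/x/net/html`'s `*html.Node` so it runs with the standard library alone, and `ByTag`, `ByClass`, `And`, and `FindAll` are illustrative names, not a description of Scrape's actual API.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical, minimal stand-in for an HTML element node.
type Node struct {
	Tag      string
	Attrs    map[string]string
	Children []*Node
}

// Matcher is a predicate over nodes.
type Matcher func(*Node) bool

// ByTag matches elements with the given tag name.
func ByTag(tag string) Matcher {
	return func(n *Node) bool { return n.Tag == tag }
}

// ByClass matches elements whose class attribute contains cls as a whole word.
func ByClass(cls string) Matcher {
	return func(n *Node) bool {
		for _, c := range strings.Fields(n.Attrs["class"]) {
			if c == cls {
				return true
			}
		}
		return false
	}
}

// And combines matchers; a node matches only if every matcher accepts it.
func And(ms ...Matcher) Matcher {
	return func(n *Node) bool {
		for _, m := range ms {
			if !m(n) {
				return false
			}
		}
		return true
	}
}

// FindAll walks the tree depth-first (pre-order) and collects matching nodes.
func FindAll(root *Node, m Matcher) []*Node {
	var out []*Node
	var walk func(*Node)
	walk = func(n *Node) {
		if m(n) {
			out = append(out, n)
		}
		for _, c := range n.Children {
			walk(c)
		}
	}
	walk(root)
	return out
}

func main() {
	doc := &Node{Tag: "div", Children: []*Node{
		{Tag: "a", Attrs: map[string]string{"class": "story link", "href": "/1"}},
		{Tag: "a", Attrs: map[string]string{"href": "/2"}},
		{Tag: "span", Attrs: map[string]string{"class": "link"}},
	}}
	for _, n := range FindAll(doc, And(ByTag("a"), ByClass("link"))) {
		fmt.Println(n.Attrs["href"])
	}
}
```

The trade-off versus CSS selectors (as in goquery) is that combinators are type-checked Go code, while selectors are more compact but are strings parsed at run time.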
jwcrux about 10 years ago
I like goquery[1] for doing this type of thing.
[1] https://github.com/PuerkitoBio/goquery
thinxer about 10 years ago
I'd like to introduce htmlutil[1] and cascadia[2] for DOM processing in Go, which are useful for scraping articles.
[1]: https://github.com/thinxer/go-htmlutil
[2]: https://github.com/andybalholm/cascadia
headzoo about 10 years ago
Selfless plug: you may also want to check out Surf for web scraping.
https://github.com/headzoo/surf Docs: http://www.gosurf.io/
Among other things, goquery is baked in, so you can easily select page elements using CSS selectors.
chrissnell about 10 years ago
This is very cool. I'm not much of a front-end guy, so I'm struggling with the examples. Would you mind posting a simple example that scrapes, say, the first TD tag of every row of a table? Thanks.
lunixbochs about 10 years ago
Nice! See also https://github.com/andrew-d/goscrape
bjblazkowicz about 10 years ago
Does it support XPath?