
Show HN: Ruby gem to scrape a web page

27 points by daviducolo, almost 10 years ago

5 comments

nathan_f77, almost 10 years ago
Good work, but you might not have heard about Mechanize: https://github.com/sparklemotion/mechanize
mapgrep, almost 10 years ago
I was surprised there's no way to query the page beyond the small list of element accessors you provide (body, url, scheme, host, port, title, description, links, images, meta).

When you're putting together a tool like this, it's nice to give the user some way to "escape" your framework and get to the lower-level underlying data.

Why not offer something like

```ruby
page.selector('h2 p') # returns Nokogiri elements
page.h1               # calls method_missing, returns Nokogiri elements
page.p                # ditto
page.noko             # returns underlying Nokogiri doc
```

Also, you forgot to include the body accessor in the "Accessing inspected data" portion of the doc.
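The "escape hatch" suggested above could be sketched roughly as follows. This is only an illustration, not the gem's actual API: the class name `Page`, the `TAGS` whitelist, and the assumption that the wrapped document responds to `#css` (as a Nokogiri document does) are all hypothetical.

```ruby
# Hypothetical wrapper illustrating the suggested accessors.
# It is NOT the gem's real implementation.
class Page
  # Tag names that method_missing treats as CSS selectors (illustrative subset).
  TAGS = %w[h1 h2 h3 p a li img].freeze

  # doc is assumed to respond to #css, as a Nokogiri document does.
  def initialize(doc)
    @doc = doc
  end

  # Escape hatch: hand back the underlying parsed document.
  def noko
    @doc
  end

  # Run an arbitrary CSS selector against the document.
  def selector(css)
    @doc.css(css)
  end

  # page.h1, page.h2, ... are forwarded to selector('h1'), selector('h2'), ...
  # Anything outside TAGS falls through to the normal NoMethodError.
  def method_missing(name, *args, &block)
    if args.empty? && block.nil? && TAGS.include?(name.to_s)
      selector(name.to_s)
    else
      super
    end
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    TAGS.include?(name.to_s) || super
  end
end
```

With Nokogiri installed, something like `Page.new(Nokogiri::HTML(html)).selector('h2 p')` would return the matching nodes. Restricting `method_missing` to a whitelist is one possible design choice: it keeps typos such as `page.titel` raising `NoMethodError` instead of silently returning an empty result.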
mjands, almost 10 years ago
A similar gem that scrapes OGP and oEmbed tags as well as HTML tags. It is also configured using Faraday and allows for serialization/deserialization of the underlying data: https://github.com/socialcast/link_preview
purephase, almost 10 years ago
Always nice to have an alternative. Mechanize is certainly the big player in this space, but I like the use of Faraday here.

Thanks for sharing.
AznHisoka, almost 10 years ago
What does this use underneath? I wouldn't use it unless I know whether it uses libcurl or something else.