科技回声

Good work, but you might not have heard about Mechanize: <a href="https://github.com/sparklemotion/mechanize" rel="nofollow">https://github.com/sparklemotion/mechanize</a>

I was surprised there's no way to query the page beyond the small list of element accessors you provide (body, url, scheme, host, port, title, description, links, images, meta). When you're putting together a tool like this, it's nice to give the user some way to "escape" your framework and get to the lower-level underlying data. Why not offer something like<pre><code>page.selector('h2 p') # returns Nokogiri elements
page.h1 # calls method_missing, returns Nokogiri elements
page.p # ditto
page.noko # returns underlying Nokogiri doc
</code></pre>
Also, you forgot to include the body accessor in the "Accessing inspected data" portion of the doc.
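For anyone wondering what that escape hatch could look like, here is a minimal sketch. It is not the gem's actual code: the `Page` class, the `TAGS` whitelist, and the constructor are assumptions made up for illustration, and only `selector`, `noko`, and the tag shortcuts come from the suggestion above. The wrapper accepts any parsed document that responds to `css` (a Nokogiri document does), so the sketch itself does not need Nokogiri loaded.

```ruby
# Hypothetical wrapper; the class name and TAGS list are illustrative only.
class Page
  # Tag names that are forwarded to the document as CSS selectors.
  TAGS = %i[h1 h2 h3 p a img ul ol li table].freeze

  # doc: any parsed document that responds to #css,
  # for example the result of Nokogiri::HTML(html).
  def initialize(doc)
    @doc = doc
  end

  # Escape hatch: hand back the underlying parsed document.
  def noko
    @doc
  end

  # Run an arbitrary CSS selector against the document.
  def selector(css)
    @doc.css(css)
  end

  # page.h1, page.p, ... are resolved here for whitelisted tag names.
  # Kernel#p is private, so page.p (explicit receiver) also ends up here.
  def method_missing(name, *args, &block)
    return selector(name.to_s) if TAGS.include?(name) && args.empty?

    super
  end

  # Keep respond_to? consistent with method_missing.
  def respond_to_missing?(name, include_private = false)
    TAGS.include?(name) || super
  end
end
```

With Nokogiri installed, usage would presumably be `page = Page.new(Nokogiri::HTML(html))` followed by `page.selector('h2 p')` or `page.h1`. Whitelisting tag names keeps typos like `page.titel` raising `NoMethodError` instead of silently returning an empty result.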

Similar gem that scrapes OGP and oEmbed tags as well as HTML tags. Also configured using Faraday and allows for serialization/deserialization of underlying data: <a href="https://github.com/socialcast/link_preview" rel="nofollow">https://github.com/socialcast/link_preview</a>

Always nice to have an alternative. Mechanize is certainly the big player in this space, but I like the use of Faraday here. Thanks for sharing.

What does this use underneath? I wouldn't use it unless I know whether it uses libcurl or something else.

Show HN: Ruby gem to scrape a web page

5 comments
