I recommend having a look at Capybara [0]. It is built on top of Nokogiri and is actually a tool for writing acceptance tests, but it can also be used for web scraping: you can open websites, click on links, fill in forms, find elements on a page (via XPath or CSS), get their values, etc. I prefer it over plain Nokogiri because of its nice DSL and good documentation [1]. It can also execute JavaScript, which is sometimes handy for scraping. (There's a minimal sketch of the DSL after the links below.)<p>I've spent a lot of time working on web scrapers for two of my projects, http://themescroller.com (dead) and http://www.remoteworknewsletter.com, and I think the holy grail is to build a Rails app around your scraper. You can write your scrapers as libs and then make them executable as rake tasks, or even cronjobs. And because it's a Rails app, you can save all scraped data as actual models and have it persisted in a database. With Rails it's also super easy to build an API around your data, or to put a quick backend on it via Rails scaffolds. (See the second sketch below for the rake-task approach.)<p>[0] https://github.com/jnicklas/capybara
[1] http://www.rubydoc.info/github/jnicklas/capybara/
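<p>To give a feel for the DSL, here is a minimal standalone scraping sketch (not inside a test suite). The URL, form field, and selectors are made-up placeholders, and it assumes the :selenium driver plus a browser are available, since the default rack_test driver only drives a local Rack app and can't execute JavaScript:<p><pre><code>require 'capybara'
require 'capybara/dsl'

# Hypothetical standalone scraper; URL and selectors are placeholders.
Capybara.run_server = false          # we are not booting a local app under test
Capybara.current_driver = :selenium  # real browser, so JavaScript-heavy pages work

class ArticleScraper
  include Capybara::DSL              # gives us visit, fill_in, click_button, all, ...

  def latest_titles
    visit 'https://example.com/articles'
    fill_in 'q', with: 'ruby'        # fill a form field by name/id/label
    click_button 'Search'
    all('h2.title').map(&:text)      # CSS by default; all(:xpath, ...) also works
  end
end

puts ArticleScraper.new.latest_titles</code></pre>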
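<p>And a sketch of the rake-task idea: the scraper class lives under lib/, the task depends on :environment so the Rails app and models are loaded, and results get persisted as ActiveRecord records. Scrapers::Themes, its each_listing method, and the Theme model are all invented names for illustration:<p><pre><code># lib/tasks/scrape.rake
namespace :scrape do
  desc 'Scrape theme listings and persist them as Theme records'
  task themes: :environment do       # :environment loads the Rails app and models
    Scrapers::Themes.new.each_listing do |attrs|
      # Upsert on the source URL so re-runs (e.g. from a cronjob) stay idempotent.
      Theme.find_or_create_by!(source_url: attrs[:url]) do |theme|
        theme.name  = attrs[:name]
        theme.price = attrs[:price]
      end
    end
  end
end</code></pre><p>From there `rake scrape:themes` can go straight into a crontab entry, and the scaffolded backend or API sits on top of the same Theme model.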