Ask HN: Is there any HTML table scraper generator in Python or otherwise?

2 points by jeffjia over 11 years ago

Hi,

In one of my projects, I need to get scrapers running for tens of websites to collect the rows and columns of tables (<table>, <ul>, <div>). Those tables are well formatted. I have written several scrapers in Python, which basically use CSS selectors and then do some simple transformation with regular expressions. I just wonder whether there is any scraper generator that takes a URL and a sample of the target output as input and produces a scraper automatically?

Any suggestions are welcome. Thanks in advance.

5 comments

tonyfelice over 11 years ago
Have you looked at PhantomJS?

The webintro example here (https://github.com/ariya/phantomjs/wiki/Examples) scrapes a specific element.
brandonlipman over 11 years ago
I would take a look at the Mac app FakeApp. It does a lot of what you are describing, especially with regard to CSS and XPath selectors. I have been using it and have been able to do some really great stuff.
Johnie over 11 years ago
If you don't want to build it yourself, check out import.io. They turn any website into an API. They did a demo at SV Newtech a couple of months ago.
murtza over 11 years ago
Have you taken a look at the Scrapy framework for Python?

http://scrapy.org/
taddeimania over 11 years ago
I've used BeautifulSoup to do stuff like this.
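
For the well-formatted `<table>` case described in the question, here is a minimal sketch of row and column extraction. It is not BeautifulSoup or Scrapy code: it uses only Python's standard-library `html.parser` so it runs without extra installs. The names `TableRowParser`, `extract_rows`, and `SAMPLE_HTML` are hypothetical, and the sample markup is made up for illustration.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left unclosed in this row


def extract_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample input, standing in for a fetched page.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>$4.50</td></tr>
  <tr><td>Gadget</td><td>$12.00</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

This only handles flat, non-nested tables. The `<ul>` and `<div>` layouts mentioned in the question would need their own tag rules, which is where a selector-based library such as the ones suggested in this thread becomes more convenient.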