Better Python Scraping - Installing lxml and Beautiful Soup

36 points by wesleyzhao, almost 14 years ago

8 comments

ljlolel, almost 14 years ago
I've scraped dozens of sites. Beautiful Soup is great, but if you want the job done as quickly and cleanly as possible, PyQuery is the best.
It's the same as jQuery, but in Python.
Working with Beautiful Soup quickly becomes long, messy, and tedious.
With PyQuery, you get what you want with just a couple of CSS3 selectors, simple and nice.
Wow, Android 2.2 is terrible for inputting text.
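As a rough illustration of the selector-based style described in this comment, here is a minimal sketch using PyQuery. The sample markup, the `li.story > a` selector, and the `story_links` helper are all invented for illustration, and the import is guarded because pyquery is a third-party package that may not be installed.

```python
# Sketch only: the markup and selector below are invented for illustration.
try:
    from pyquery import PyQuery  # third-party: pip install pyquery
except ImportError:
    PyQuery = None

SAMPLE_HTML = """
<ul>
  <li class="story"><a href="/item/1">First story</a></li>
  <li class="story"><a href="/item/2">Second story</a></li>
  <li class="ad"><a href="/ad">Sponsored</a></li>
</ul>
""".strip()


def story_links(html):
    """Return (text, href) pairs for links directly inside <li class="story">."""
    doc = PyQuery(html)
    # One CSS selector picks the anchors; .items() yields PyQuery wrappers.
    return [(a.text(), a.attr("href")) for a in doc("li.story > a").items()]


if PyQuery is not None:
    for text, href in story_links(SAMPLE_HTML):
        print(text, href)
```

The whole extraction is the single selector string, which is the jQuery-like convenience the comment refers to.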
VuongN, almost 14 years ago
I'm learning Python on the fly, but I tend to ask a lot of questions on freenode's #python. Installing lxml wasn't so bad. I just did "pip install lxml" ("easy_install lxml" should work too) on my Debian VPS and home server. It seemed to work for me.
I am sticking with lxml only for my scraping and html5lib for my rich-text parsing.
cdr, almost 14 years ago
I much prefer Scrapy (http://scrapy.org/). BeautifulSoup is pretty outdated.
lamby, almost 14 years ago
Why not "sudo apt-get install python-lxml python-beautifulsoup"? Difficult to make the "olde" argument when you're installing dependencies from apt.
tsumnia, almost 14 years ago
Is mechanize (http://wwwsearch.sourceforge.net/mechanize/) considered outdated or convoluted? It's what I've used for my scraping.
Also, how well do these other scrapers handle JavaScript? I've had to abandon some scrapes of ASP pages because the scraper wouldn't handle it properly.
llambda, almost 14 years ago
Maybe I missed it: why aren't you using pip? As I recall, the setup is as simple as "sudo pip install lxml" or "sudo pip install BeautifulSoup". If you're learning Python, definitely learn pip. Pip will make your life easier! :)
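After a pip install like the ones mentioned in this comment, one quick sanity check is to ask Python whether it can locate the modules. The sketch below uses only the standard library; the list of module names and the `installed_modules` helper are assumptions made for illustration. Note that import names can differ from pip package names: Beautiful Soup 4 imports as `bs4`, while the older BeautifulSoup 3 package imported as `BeautifulSoup`.

```python
import importlib.util

# Import names to look for (illustrative; adjust to what you installed).
MODULES = ["lxml", "bs4", "html5lib", "pyquery"]


def installed_modules(names):
    """Map each top-level module name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in installed_modules(MODULES).items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

This only checks that the module can be found on the import path; it does not import it or verify its version.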
imgabe, almost 14 years ago
Just to throw in my own "I like X better": I recently had some pages that Beautiful Soup just choked on and couldn't parse.
I like html5lib, which will even spit out a Beautiful Soup parse tree if that's your thing.
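To give a sense of the lenient parsing mentioned in this comment, here is a minimal sketch that feeds html5lib a fragment with unclosed tags. The sample markup and the `paragraph_texts` helper are invented for illustration, and the import is guarded because html5lib is a third-party package. The sketch uses html5lib's standard-library ElementTree output rather than a Beautiful Soup tree; with Beautiful Soup 4, the usual route to the latter is passing "html5lib" as the parser name to `BeautifulSoup`.

```python
# Sketch only: the broken markup below is invented for illustration.
try:
    import html5lib  # third-party: pip install html5lib
except ImportError:
    html5lib = None

# No closing tags at all; a browser would still render two paragraphs.
BROKEN_HTML = "<p>first<p>second<b>bold"


def paragraph_texts(html):
    """Return the text of every <p> after html5lib has repaired the markup."""
    # namespaceHTMLElements=False keeps tag names plain ("p", not "{ns}p").
    root = html5lib.parse(html, treebuilder="etree", namespaceHTMLElements=False)
    return ["".join(p.itertext()) for p in root.iter("p")]


if html5lib is not None:
    print(paragraph_texts(BROKEN_HTML))
```

html5lib follows the HTML5 parsing algorithm, so it closes the first paragraph when the second one opens, much as a browser would.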
Torn, almost 14 years ago
Does this page kill the Chrome tab for anyone else?