Show HN: The Java Web Scraping Handbook

7 points by ksahin about 5 years ago

1 comment

ksahin about 5 years ago
Hey Hacker News,

Today Pierre and I are releasing the Java Web Scraping Handbook for FREE!

And by free we mean you don't even have to give us your email address!

Some backstory about the book: I originally wrote it in 2018 after working on different web scraping projects for startups (Mint.com-like) and banks.

The first four chapters are language agnostic, and the last can be applied to any language, so don't be scared if you don't know Java!

By the end of the book, you will know how to:

- Scrape any website
- Use just enough XPath / Regex / DOM knowledge to be dangerous
- Deal with JavaScript-heavy websites (single-page applications...)
- Programmatically perform actions on a website behind a login form
- Parse information inside PDFs
- Bypass captchas
- Deploy your scrapers in the cloud

I'm happy to answer any questions about the book :)