Ask HN: How to archive online articles

7 points by jdowner, over 7 years ago
Often, after reading an article online, I want to archive its contents for future reference or to add notes. Obviously I could just save the webpage, but I was wondering if anyone knows of a service or application that can extract the contents of an online article (ideally into a text-based format like Markdown).

5 comments

jdowner, over 7 years ago
To provide an answer to my own question: I have found that a combination of Python's 'readability-lxml' package and 'lynx' works pretty well. For example,

python -m readability.readability -u file:///foo.html | lynx -dump -stdin

produces a pretty nice text format.
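
For anyone without lynx installed, the text-dump half of that pipeline can be roughly approximated with Python's standard library alone. The sketch below is an assumption-laden stand-in, not what readability-lxml or lynx actually do: the names `TextDumper` and `html_to_text` and the tag lists are made up for illustration, and it only drops obvious non-article tags rather than scoring content the way readability does.

```python
from html.parser import HTMLParser

# Hypothetical tag lists chosen for illustration; tune them for real pages.
# Contents of these tags are never treated as article text.
SKIP_TAGS = {"script", "style", "head", "nav", "footer"}
# These tags start or end a paragraph in the plain-text output.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextDumper(HTMLParser):
    """Collect visible text, inserting line breaks at block-level tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # > 0 while inside a skipped tag

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string as blank-line-separated paragraphs."""
    parser = TextDumper()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "<html><body><h1>Title</h1><p>Hello &amp; <b>world</b>.</p></body></html>"
    print(html_to_text(sample))
```

This keeps headings and paragraphs as separate blocks of text, which is close to Markdown-style plain text, but it does not preserve links or emphasis the way a real converter would.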
CM30, over 7 years ago
Archive.is works pretty well:

http://archive.is/

(or at least, it does in non-Firefox browsers. It seems uBlock and this site are conflicting at the moment).

You can also do the same thing with the Internet Archive itself:

https://archive.org/web/

Just enter the link into the lower-right text box and click 'save page'.

There are others too, as well as tools you can download to locally save articles (or whole websites) for future reference.
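
The Internet Archive step can also be scripted instead of clicked through. The sketch below only builds the URLs and makes no network request. It assumes the Wayback Machine's "Save Page Now" address form (`https://web.archive.org/save/<url>`) and its availability lookup (`https://archive.org/wayback/available?url=<url>`) behave as publicly documented; the helper names are hypothetical, and rate limits or login requirements may apply.

```python
from urllib.parse import quote, urlencode


def wayback_save_url(page_url):
    """Build the Save Page Now address for page_url (assumed URL form)."""
    # Keep URL punctuation intact; only escape characters such as spaces.
    return "https://web.archive.org/save/" + quote(page_url, safe=":/?&=#%")


def wayback_availability_url(page_url):
    """Build the lookup address that reports whether a snapshot already exists."""
    return "https://archive.org/wayback/available?" + urlencode({"url": page_url})


if __name__ == "__main__":
    article = "https://example.com/some-article"
    print(wayback_save_url(article))
    print(wayback_availability_url(article))
```

Opening the first printed address in a browser, or fetching it with any HTTP client, should ask the archive to capture the page; the second returns a small JSON document describing the closest existing snapshot, if any.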
ashokr86, over 7 years ago
https://zoho.com/notebook You could try Zoho Notebook's browser extensions, available for Chrome, Firefox and Safari. Open the article in clean view and store it in Notebook for future reference.
mkbkn, over 7 years ago
Maybe https://instapaper.com
edotrajan, over 7 years ago
Check out https://webrecorder.io/