Getting all your data out of Google Reader

86 points by kellegous nearly 12 years ago

9 comments

sp332 nearly 12 years ago
ArchiveTeam is extracting *all* the data from Google Reader and uploading it to the Internet Archive. Help out by submitting your OPML file: https://news.ycombinator.com/item?id=5958119
nod nearly 12 years ago
Thanks mihaip!

Worked successfully in Windows CMD for me, without using the \bin shell script:

    cd C:\mihaip-readerisdead
    set PYTHON_HOME=C:\mihaip-readerisdead
    C:\path-to-py27 reader_archive\reader_archive.py --output-directory C:\mystuff

Locked up at 251K out of 253K items for me, though. Restarting... success! Looks like it might have locked up trying to start the "Fetching comments" section on my first try.
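
A minimal Python sketch of the same invocation, for anyone scripting it rather than typing it into CMD. The repository path, output path, and the plain "python" executable name are assumptions to adjust, and the PYTHONPATH line is inferred from the repo layout described in these comments (the tool imports a "base" package from the checkout root):

    # Sketch: run reader_archive.py directly, mirroring the CMD session above.
    # All paths and the "python" executable name are assumptions.
    import os
    import subprocess

    repo = r'C:\mihaip-readerisdead'
    env = dict(os.environ)
    # Let Python resolve the tool's "base" package from the checkout root.
    env['PYTHONPATH'] = repo

    subprocess.call(
        ['python', os.path.join(repo, 'reader_archive', 'reader_archive.py'),
         '--output-directory', r'C:\mystuff'],
        env=env)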
ccera nearly 12 years ago
Warning to other impatient users:

I didn't read the instructions too well, so the half hour I spent carefully deleting gigantic/uninteresting feeds out of my subscriptions.xml file was all for naught. Because I didn't know I needed to specify the opml_file on the command line, the script just logged into my Reader account (i.e., it walked me through the browser-based authorization process) and downloaded my subscriptions from there -- including all the gigantic/uninteresting subscriptions that I did NOT care to download.

So now I've gone and downloaded 2,592,159 items, consuming 13 GB of space.

I'm NOT complaining -- I actually think it's AWESOME that this is possible -- but if you don't want to download millions of items, be sure to read the instructions and use the opml_file directive.
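
Since trimming subscriptions.xml by hand is painful, here is a rough standard-library sketch of pruning it before passing it via the opml_file option the comment mentions. The skip list is a made-up example, and both file names are assumptions:

    # Sketch: drop unwanted feeds from subscriptions.xml before archiving.
    # SKIP_SUBSTRINGS and both file names are placeholders, not tool defaults.
    import xml.etree.ElementTree as ET

    SKIP_SUBSTRINGS = ('reddit.com', 'digg.com')  # hypothetical noisy feeds

    tree = ET.parse('subscriptions.xml')
    body = tree.getroot().find('body')

    # ElementTree removals need the direct parent, so map children to parents
    # first (OPML folders nest outline elements inside outline elements).
    parents = {child: parent for parent in body.iter() for child in parent}
    for outline in list(body.iter('outline')):
        if any(s in outline.get('xmlUrl', '') for s in SKIP_SUBSTRINGS):
            parents[outline].remove(outline)

    tree.write('subscriptions-trimmed.xml',
               xml_declaration=True, encoding='utf-8')

The trimmed file can then be passed on the command line as the comment describes.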
Udo nearly 12 years ago
This is excellent, thank you for making this! I'm using it right now to make an offline archive of my Reader stuff.

My only gripe would be the tool's inability to continue after a partial run, but since I won't be using this more than once, that's probably OK.

All web services should have a handy CLI extraction tool, preferably one that can be run from a cron job. On that note, I'm very happy with gm_vault as well.

*Edit: getting a lot of XML parse errors, by the way.*
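
The tool can't resume a partial run, but for the cron use case a thin retry wrapper goes a long way. A sketch, assuming a Python 2.7 interpreter is available as python2.7 and with paths to adjust to taste:

    # Sketch of a cron-friendly wrapper: rerun the archiver until it exits
    # cleanly, since a partial run cannot be resumed. Paths are assumptions.
    import subprocess
    import time

    CMD = ['python2.7', 'reader_archive/reader_archive.py',
           '--output-directory', '/backups/reader']

    for attempt in range(1, 4):
        if subprocess.call(CMD) == 0:
            break
        print('attempt %d failed; retrying in a minute' % attempt)
        time.sleep(60)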
DecoPerson nearly 12 years ago
Thank you for this! Now I can procrastinate on my own reader app for much longer :)

Should we be concerned with errors like this?

    [W 130629 03:11:54 api:254] Requested item id tag:google.com,2005:reader/item/afe90dad8acde78b (-5771066408489326709), but it was not found in the result

I'm getting ~1-2 per "Fetch N/M item bodies" line.
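
Those warnings are easier to read once you notice the two id forms are the same 64 bits: the hex suffix of tag:google.com,2005:reader/item/..., reinterpreted as a big-endian signed 64-bit integer, gives the parenthesized decimal number. A quick standard-library check:

    # Sketch: Reader's hex item id and the signed decimal id in the warning
    # above are the same 64-bit value, just printed two different ways.
    import binascii
    import struct

    def hex_id_to_signed(hex_id):
        # 16 hex digits -> 8 bytes -> big-endian signed 64-bit integer.
        return struct.unpack('>q', binascii.unhexlify(hex_id))[0]

    print(hex_id_to_signed('afe90dad8acde78b'))  # -5771066408489326709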
pixsmith nearly 12 years ago
This is an impressive bit of work. I've had an interesting thing happen, though: it's apparently trying to pull in every single item from Explore and from Suggested Items, to the point that I got a message saying I have 13 million items and still going strong -- it pulled down about 5 or 6 GB of data.

Is there some way to avoid all the years of Explore and Suggested Items with reader_archive? I tried limiting the maximum number of items to 10,000, but it was still running and growing after 12 hours. Interesting, though, what it was able to accomplish in that time.
skilesare nearly 12 years ago
If this does what I think it does (and it seems to be doing it now on my machine), then this is truly, truly awesome.

Thank you. mihaip, if you are ever in Houston I will buy you a beer and/or a steak dinner.
dmtelf nearly 12 years ago
I'm getting "ImportError: No module named site".

echo %pythonpath% gives c:\readerisdead

I copied 'base' from the readerisdead zipfile to c:\python27\lib and also copied the base folder into the same folder as reader_archive.py.

C:\readerisdead\reader_archive\reader_archive.py --output-directory C:\googlereader gives "ImportError: No module named site"

What am I doing wrong? How can I get this to work?
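
"ImportError: No module named site" is raised before any of readerisdead's own code runs: it means the interpreter cannot find its own standard library at startup, which usually points to the environment (a mis-set PYTHONHOME is the classic cause) rather than to the tool; that's a guess from the symptom, not a confirmed diagnosis. A small sketch for narrowing it down, using python -E, which ignores the PYTHON* environment variables:

    # Diagnostic sketch: print where this interpreter looks for modules.
    # Compare a normal run against "python -E script.py"; if only the -E
    # run works, a PYTHON* environment variable is breaking startup.
    import os
    import sys

    print(sys.executable)   # the interpreter actually running
    print(sys.prefix)       # where it expects its standard library
    for name in ('PYTHONPATH', 'PYTHONHOME'):
        print('%s=%s' % (name, os.environ.get(name)))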
drivebyacct2 nearly 12 years ago
I guess archived RSS data isn't terribly important for me, since most people seem to hide the rest of their content behind a "More" link to get those precious ad views.