Instapaper's backup method

166 points by hugoahlberg over 14 years ago

14 comments

lockesh over 14 years ago
Anyone else find this scheme completely atrocious?

1. Relying on a home computer on the critical path for data backup and persistence for a business

2. Relying on a high-latency, low-quality networking path between the slave db and the 'home mac' rather than a more reliable link between two machines in a datacenter

3. A poor persistence model for long-lived backups

4. No easy way to programmatically recover old backups

What's even more disturbing is that this isn't a new problem. It's not like we don't know how to back up databases. This solution seems very poorly thought out.
ams6110 over 14 years ago
The scenario he presents of being able to recover from an unintentionally broad delete or update query would seem to only work in the simplest of databases. He says:

- Instantiate the backup (at its binlog position 259)
- Replay the binlog from position 260 through 999
- Replay the binlog from position 1001 through 1200

"And you'll have a copy of the complete database as if that destructive query had never happened."

This only works if the changes in positions 1001-1200 were unaffected by the undesired changes in position 1000. Seems rather unlikely to me, but maybe in the case of his particular schema it works out.
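For anyone trying to picture that replay, here's a minimal sketch using mysqlbinlog's --start-position/--stop-position flags, piped into the mysql client. Note that real binlog positions are byte offsets rather than neat event numbers, and the file name and segment boundaries below are made up for illustration; this is not Marco's actual tooling.

```python
# Sketch: replay a MySQL binlog around one destructive event.
# Assumes the backup has already been restored and that the mysql
# client can authenticate via ~/.my.cnf. File name and positions are
# illustrative placeholders; real binlog positions are byte offsets.
import subprocess

BINLOG = "mysql-bin.000042"            # hypothetical binlog file
SEGMENTS = [(260, 999), (1001, 1200)]  # everything except the bad event

for start, stop in SEGMENTS:
    # mysqlbinlog decodes events to SQL text; pipe that into mysql.
    decoder = subprocess.Popen(
        ["mysqlbinlog",
         f"--start-position={start}",
         f"--stop-position={stop}",
         BINLOG],
        stdout=subprocess.PIPE,
    )
    subprocess.run(["mysql"], stdin=decoder.stdout, check=True)
    decoder.stdout.close()
    decoder.wait()
```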
joshu over 14 years ago
on delicious, we had a thing that would serialize a user to disk for every day they were active. inactive users were not re-serialized.

this let us have day-to-day backups of individual users. this was necessary when broken clients would delete all the user's items. so we could easily restore an individual user (or do a historical recovery.)
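A minimal sketch of that per-user scheme; the schema, paths, and helper names here are invented for illustration, not delicious's actual code:

```python
# Sketch: snapshot each user who was active today to a dated file.
# Inactive users keep their last snapshot; nothing is rewritten for
# them. Paths and the record layout are hypothetical.
import json
import pathlib
from datetime import date

BACKUP_ROOT = pathlib.Path("/backups/users")

def snapshot_active_users(active_users):
    """active_users: iterable of dicts like {"id": 42, "bookmarks": [...]}"""
    today = date.today().isoformat()
    for user in active_users:
        user_dir = BACKUP_ROOT / str(user["id"])
        user_dir.mkdir(parents=True, exist_ok=True)
        # One file per user per active day means single-user restores
        # are trivial: load the newest file predating the bad client.
        (user_dir / f"{today}.json").write_text(json.dumps(user))
```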
mseebach over 14 years ago
It seems unnecessarily exposed to an event affecting Marco's home - fire, burglary, natural disaster etc. It would appear more prudent to back up to a cloud location. Either, as he mentions, S3, or a VPS somewhere.
bl4k over 14 years ago
I don't think backing up the entire db to a laptop is a good idea, since laptops can get both lost and stolen. As somebody who uses the service, I am not super-comfortable with knowing that a full copy of my account and everything I save is sitting on a laptop somewhere.

It would be much better if these dumps were made to S3, or somewhere else that is actually in a secure datacenter (and a step that includes the word 'encryption').
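For what that step could look like, here is a hedged sketch using the cryptography and boto3 libraries; the bucket name, paths, and key handling are placeholders, and a real setup would need proper key management:

```python
# Sketch: encrypt a database dump locally, then push it to S3, so the
# datacenter copy is useless without the key. Bucket and paths are
# placeholders; the Fernet key must itself be stored somewhere safe,
# and a multi-GB dump would want a streaming cipher rather than
# reading the whole file into memory.
import boto3
from cryptography.fernet import Fernet

def encrypt_and_upload(dump_path, bucket="example-backup-bucket"):
    key = Fernet.generate_key()                    # persist securely!
    with open(dump_path, "rb") as f:
        token = Fernet(key).encrypt(f.read())

    enc_path = dump_path + ".enc"
    with open(enc_path, "wb") as f:
        f.write(token)

    # Only the ciphertext ever leaves the machine.
    boto3.client("s3").upload_file(enc_path, bucket, enc_path.lstrip("/"))
    return key
```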
ludwigvan over 14 years ago
[Disclaimer: Instapaper fan here, so my opinions might be biased. It is probably the application I love the most on my iPad and iPod Touch. Thanks Marco!]

Marco has recently left his position as the CEO of Tumblr, and I think he concentrates on Instapaper much more than ever (I assume it was mostly a weekend project before, requiring simple fixes); therefore I have no doubt he will be making the service more reliable and better in the future (switch to S3 or similar).

Also, don't forget that the Instapaper web service is currently free, although the iOS applications are not (there is a free lite version too). There is a recently added subscription option (which AFAIK currently doesn't offer anything additional), and I hope it will only make the service even better.

About security, I do not consider my Instapaper reading list too confidential, so I don't have much trouble with the thought of the backup computer being stolen. Of course, your mileage may vary. As far as I know, some Instapaper accounts don't even have passwords; you just log in with your email address.
rarrrrrr over 14 years ago
FYI, you could run either tarsnap or SpiderOak directly on the server for a prompt offsite backup. Both have excellent support for archiving many versions of a file, with de-duplication of the version stream and no limits on how many historical versions are kept.

Also, "gzip --rsyncable" increases the compressed size by only about 1%, but makes deduplication between successive compressed dump files possible.

(I cofounded SpiderOak.)
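As a sketch of the dump pipeline that tip implies (the database name and output directory are placeholders, and --rsyncable is present in most Linux gzip builds but not all):

```python
# Sketch: nightly `mysqldump | gzip --rsyncable` to a dated file.
# --rsyncable periodically resets gzip's compression state, so a small
# change in the dump perturbs only a small region of the output and
# successive dumps deduplicate well under rsync/tarsnap/SpiderOak.
import subprocess
from datetime import date

def nightly_dump(db="example_db", out_dir="/backups"):
    out_path = f"{out_dir}/{db}-{date.today().isoformat()}.sql.gz"
    dump = subprocess.Popen(["mysqldump", db], stdout=subprocess.PIPE)
    with open(out_path, "wb") as out:
        subprocess.run(["gzip", "--rsyncable"], stdin=dump.stdout,
                       stdout=out, check=True)
    dump.stdout.close()
    dump.wait()
    return out_path
```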
dcreemer over 14 years ago
Are the primary and backup DBs in the same data center? If so, how would you restore from an "unplanned event" there? I ask because I faced that situation once years ago, and very quickly learned that downloading tens of GB of data from an offsite backup will keep your site offline for hours.

In the end I ended up _driving_ a copy of the DB over to a data center. Adding a slaved replica in another location is pretty easy these days.
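The arithmetic backs that up. A quick back-of-the-envelope for Instapaper's stated 22 GB, with link speeds chosen purely for illustration:

```python
# Back-of-the-envelope restore times for a 22 GB dump over various
# links. The speeds are illustrative, not measurements.
SIZE_GB = 22
for name, mbps in [("home cable upload", 5),
                   ("100 Mbit office line", 100),
                   ("gigabit datacenter link", 1000)]:
    hours = SIZE_GB * 8 * 1000 / mbps / 3600  # GB -> megabits -> hours
    print(f"{name:>24}: {hours:5.2f} h")
```

At home-upload speeds the transfer alone approaches ten hours, before any import time.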
rbarooah over 14 years ago
Would the people who are upset that Marco is using his 'home' computer feel the same if he instead said it was at his office? Offices get broken into or have equipment stolen too - I'm not sure why people think this is so irresponsible given that he works from home now.
zbanks over 14 years ago
That's really an amazing system. Super redundant.

A relatively easy boost, which he briefly mentioned, would be to also store the data in S3. That should be easy enough to automate, and it would provide a somewhat-reliable off-site backup.

However, Instapaper has the benefit of a (relatively) small DB. 22 GB isn't too bad. I don't know how well this would scale to a 222 GB DB with proportionally higher usage rates. It'd be possible, but it would have to be simplified, no?
philfreo over 14 years ago
I upvoted this not because I think personal laptops and Time Machine are a good process for db backups, but because making backups is still a huge pain and problematic area, so the more attention it gets, the better.
hugoahlberg over 14 years ago
Marco has now updated his system with automatic S3 backup: http://www.marco.org/1630412230
japherwocky over 14 years ago
are those binlogs timestamped? what wonderful graphs you could make!
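They are: every event header in mysqlbinlog's decoded text output carries a #YYMMDD HH:MM:SS timestamp. A toy sketch of the graph idea, assuming a binlog already decoded to text and matplotlib available; the file name is a placeholder:

```python
# Toy sketch: bucket binlog events by hour of day and plot them.
# Assumes `mysqlbinlog mysql-bin.000042 > binlog.txt` was run first;
# event headers in that output look like "#101121 18:30:00 server id 1 ...".
import re
from collections import Counter

import matplotlib.pyplot as plt

stamp = re.compile(r"^#\d{6} (\d{2}):\d{2}:\d{2}")
hours = Counter()
with open("binlog.txt") as f:
    for line in f:
        m = stamp.match(line)
        if m:
            hours[int(m.group(1))] += 1

xs = sorted(hours)
plt.bar(xs, [hours[h] for h in xs])
plt.xlabel("hour of day")
plt.ylabel("binlog events")
plt.title("Write activity by hour")
plt.show()
```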
konad over 14 years ago
I just dump data into Venti, dump my 4 GB Venti slices encrypted to DVD, and keep an encrypted copy of my vac scores distributed around my systems.

If you're doing full dumps every few days, you're doing it wrong.