Instapaper's backup method

166 points by hugoahlberg, over 14 years ago

14 comments

lockesh, over 14 years ago
Anyone else find this scheme completely atrocious?

1. Relying on a home computer on the critical path for data backup and persistence for a business.
2. Relying on a high-latency, low-quality network path between the slave DB and the 'home mac' rather than a more reliable link between two machines in a datacenter.
3. A poor persistence model for long-lived backups.
4. No easy way to programmatically recover old backups.

What's even more disturbing is that this isn't a new problem. It's not like we don't know how to back up databases. This solution seems very poorly thought out.
ams6110, over 14 years ago
The scenario he presents of being able to recover from an unintentionally broad delete or update query would seem to only work in the simplest of databases. He says:

"- Instantiate the backup (at its binlog position 259)
- Replay the binlog from position 260 through 999
- Replay the binlog from position 1001 through 1200
And you'll have a copy of the complete database as if that destructive query had never happened."

This only works if the changes in positions 1001-1200 were unaffected by the undesired change at position 1000. That seems rather unlikely to me, but maybe in the case of his particular schema it works out.
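For readers unfamiliar with the tooling being discussed, here is a minimal sketch of that point-in-time replay using the stock mysqlbinlog and mysql clients. The binlog file name, target database, and numeric positions are illustrative placeholders taken from the quoted post (real positions are byte offsets), not Marco's actual script.

```python
# Sketch: replay a binlog range while skipping one destructive event.
import subprocess

BINLOG = "mysql-bin.000042"        # placeholder binlog file name
TARGET_DB = "instapaper_restore"   # placeholder database restored from the snapshot

def replay(start, stop):
    """Replay binlog events from `start` up to (but not including) `stop`."""
    dump = subprocess.Popen(
        ["mysqlbinlog", f"--start-position={start}",
         f"--stop-position={stop}", BINLOG],
        stdout=subprocess.PIPE,
    )
    subprocess.run(["mysql", TARGET_DB], stdin=dump.stdout, check=True)
    dump.stdout.close()

# After restoring the snapshot taken at position 259:
replay(260, 1000)    # everything before the destructive statement at 1000
replay(1001, 1201)   # everything after it
```

As the comment above notes, the result is only consistent if nothing in the later range depended on the skipped statement.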
joshu, over 14 years ago
On Delicious, we had a thing that would serialize a user to disk for every day they were active; inactive users were not re-serialized.

This let us have day-to-day backups of individual users. This was necessary when broken clients would delete all of a user's items, so we could easily restore an individual user (or do a historical recovery).
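Here is a minimal sketch of that per-user, per-day serialization, assuming one JSON file per active user per day; the directory layout, data shape, and function names are hypothetical, not the actual Delicious implementation.

```python
# Sketch: snapshot each active user's account to its own file, so a single
# account can be restored without touching a full-database backup.
import json
import datetime
import pathlib

BACKUP_ROOT = pathlib.Path("/backups/users")   # assumed backup location

def snapshot_active_users(day: datetime.date, active_users) -> None:
    """active_users: iterable of (user_id, account_data) pairs for users who
    were active on `day`; inactive users are simply not re-serialized."""
    day_dir = BACKUP_ROOT / day.isoformat()
    day_dir.mkdir(parents=True, exist_ok=True)
    for user_id, account_data in active_users:
        (day_dir / f"{user_id}.json").write_text(json.dumps(account_data))

def restore_user(user_id, day: datetime.date):
    """Read back one user's snapshot from a given day."""
    path = BACKUP_ROOT / day.isoformat() / f"{user_id}.json"
    return json.loads(path.read_text())
```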
mseebach, over 14 years ago
It seems unnecessarily exposed to an event affecting Marco's home - fire, burglary, natural disaster etc. It would appear more prudent to back up to a cloud location. Either, as he mentions, S3, or a VPS somewhere.
bl4k, over 14 years ago
I don't think backing up the entire db to a laptop is a good idea, since laptops can get both lost and stolen. As somebody who uses the service, I am not super-comfortable with knowing that a full copy of my account and everything I save is sitting on a laptop somewhere.

It would be much better if these dumps were made to S3, or somewhere else that is actually in a secure datacenter (and a step that includes the word 'encryption').
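A hedged sketch of that "encrypt on the server, then ship to S3" idea, assuming an existing dump file, GnuPG 2.x, and configured boto3 credentials; the bucket name, object key, and passphrase file path are placeholders, not anything Instapaper actually uses.

```python
# Sketch: symmetrically encrypt a dump, then upload only the ciphertext to S3.
import subprocess
import boto3

DUMP = "instapaper-dump.sql.gz"    # assumed pre-existing dump file
ENCRYPTED = DUMP + ".gpg"

# Encrypt before anything leaves the machine (flags assume GnuPG 2.x).
subprocess.run(
    ["gpg", "--batch", "--yes", "--pinentry-mode", "loopback",
     "--passphrase-file", "/etc/backup.passphrase",
     "--symmetric", "--cipher-algo", "AES256",
     "--output", ENCRYPTED, DUMP],
    check=True,
)

# Upload the ciphertext only; the plaintext dump never reaches S3.
boto3.client("s3").upload_file(ENCRYPTED, "example-backup-bucket",
                               f"mysql/{ENCRYPTED}")
```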
ludwigvan, over 14 years ago
[Disclaimer: Instapaper fan here, so my opinions might be biased. It is probably the application I love the most on my iPad and iPod Touch. Thanks Marco!]

Marco has recently left his position as the CEO of Tumblr, and I think he is concentrating on Instapaper much more than ever (I assume it was mostly a weekend project before, requiring simple fixes); therefore I have no doubt he will be making the service more reliable and better in the future (a switch to S3 or similar).

Also, don't forget that the Instapaper web service is currently free, although the iOS applications are not (there is a free lite version too). There is a recently added subscription option (which AFAIK currently doesn't offer anything additional), and I hope it will only make the service even better.

About security, I do not consider my Instapaper reading list too confidential, so I don't have much trouble with the thought of the backup computer being stolen. Of course, your mileage may vary. As far as I know, some Instapaper accounts don't even have passwords; you just log in with your email address.
rarrrrrr, over 14 years ago
FYI, you could run either tarsnap or SpiderOak directly on the server for a prompt offsite backup. Both have excellent support for archiving many versions of a file, with de-duplication of the version stream and no limits on how many historical versions are kept.

Also, "gzip --rsyncable" increases the compressed size by only about 1%, but makes deduplication between successive compressed dump files possible.

(I cofounded SpiderOak.)
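For concreteness, a sketch of the kind of dump pipeline that plays well with de-duplicating backup tools; the database name and mysqldump options are assumptions, and --rsyncable requires a gzip build that supports it.

```python
# Sketch: pipe a logical dump through "gzip --rsyncable" so successive dump
# files share long runs of identical bytes and deduplicate well.
import subprocess
import datetime

outfile = f"dump-{datetime.date.today().isoformat()}.sql.gz"

dump = subprocess.Popen(
    ["mysqldump", "--single-transaction", "instapaper"],  # placeholder db name
    stdout=subprocess.PIPE,
)
with open(outfile, "wb") as out:
    subprocess.run(["gzip", "--rsyncable"], stdin=dump.stdout, stdout=out, check=True)
dump.stdout.close()
```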
dcreemer, over 14 years ago
Are the primary and backup DBs in the same data center? If so, how would you restore from an "unplanned event" there? I ask because I faced that situation once years ago, and very quickly learned that transferring tens of GB of data from an offsite backup will keep your site offline for hours.

In the end I ended up _driving_ a copy of the DB over to a data center. Adding a slaved replica in another location is pretty easy these days.
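As a rough sanity check on the "hours" claim, the back-of-the-envelope arithmetic below uses the 22 GB figure mentioned elsewhere in the thread and a few assumed effective link speeds; the numbers are illustrative only.

```python
# Rough transfer-time estimate: GB -> gigabits -> megabits -> seconds -> hours.
SIZE_GB = 22                    # dump size mentioned elsewhere in the thread
for mbps in (10, 50, 100):      # assumed effective transfer speeds
    hours = SIZE_GB * 8 * 1000 / mbps / 3600
    print(f"{mbps:>3} Mbit/s: ~{hours:.1f} h")
# ~4.9 h at 10 Mbit/s, ~0.5 h at 100 Mbit/s -- hence driving a disk over.
```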
rbarooah, over 14 years ago
Would the people who are upset that Marco is using his 'home' computer feel the same if he instead said it was at his office? Offices get broken into or have equipment stolen too - I'm not sure why people think this is so irresponsible given that he works from home now.
zbanks, over 14 years ago
That's really an amazing system. Super redundant.

A relatively easy boost, which he briefly mentioned, would be to also store the data in S3. That should be easy enough to automate, and it would provide a somewhat reliable off-site backup.

However, Instapaper has the benefit of a (relatively) small DB; 22 GB isn't too bad. I don't know how well this would scale to a 222 GB DB with proportionally higher usage rates. It'd be possible, but it would have to be simplified, no?
philfreo, over 14 years ago
I upvoted this not because I think personal laptops and Time Machine are a good process for db backups, but because making backups is still a huge pain and problematic area, so the more attention it gets, the better.
hugoahlberg, over 14 years ago
Marco has now updated his system with automatic S3 backup: http://www.marco.org/1630412230
japherwocky, over 14 years ago
Are those binlogs timestamped? What wonderful graphs you could make!
konad, over 14 years ago
I just dump data into Venti, dump my 4 GB Venti slices encrypted to DVD, and keep an encrypted copy of my vac scores distributed around my systems.

If you're doing full dumps every few days, you're doing it wrong.