Scaling lessons learned at Dropbox, part 1

418 points by eranki almost 13 years ago

22 comments

jgannonjr almost 13 years ago

Great post, but this part scares me a bit...

> I think a lot of services (even banks!) have serious security problems and seem to be able to weather a small PR storm. So figure it out if it really is important to you (are you worth hacking? do you actually care if you’re hacked? is it worth the engineering or product cost?) before you go and lock down everything.

Just because you can "afford" to be hacked, doesn't mean you shouldn't take all the steps necessary to proactively protect your data. In the end, security is not about you, it is about your users. This is exactly the type of attitude that leads to all the massive breaches we have been seeing recently. Sure your company is "hurt" with bad PR, but really your users are the ones who are the real victims. You should consider their risk (especially with something as sensitive as people's files!) before you consider your own company's well being.

Edit: formatting

brc almost 13 years ago

The idea of running extra load - it sounds good in theory but I can't help thinking that it's a bit like setting your watch forwards to try and stop being late for things. Eventually you know your watch is 5 minutes fast so start compensating for it. I wonder if this strategy starts to have the same effect - putting fixes off because you know you can pull the extra load before it becomes critical. In the same way you leave for the train a couple of minutes later because you know your watch is actually running fast.

nl almost 13 years ago

I wish he'd left the security advice out.

The whole post was excellent, but all the useful points will now be overshadowed by the armchair quarterbacking about security by people who mostly don't understand that ALL security is a compromise, and it is as important to understand and make deliberate decisions about your security as it is to try to make a secure system in the first place.

dools almost 13 years ago

> but I really hate ORM’s and this was just a giant nuisance to deal with

I like object relational mapping as a theory (i.e. I have an object of type Author which has 1 or more books I can loop over), but I hate ActiveRecord implementations. Eventually, they just end up implementing almost all of SQL but in some arcane bullshit syntax or sequence of method calls that you have to spend a bunch of time learning.

I also seriously doubt that anyone has ever written a production system of any reasonable complexity and been able to use the exact same ORM code with absolutely any backend (if you have an example please correct me on this). This barely even works with something like PDO in PHP, which is a bare-bones abstraction across multiple SQL backends.

When it comes down to it, the benefits of ActiveRecord are all but dead on about the third day of development. The data mapper pattern adopted by SQLAlchemy (et al.) takes all of the shitness of ActiveRecord and adds mind-bending complexity to it.

SQL is easy to learn and very expressive. Why try and abstract it?

I spent years working with an ActiveRecord ORM I wrote myself in my feckless youth and thought that it was the answer to the world's problems. I didn't really understand why it was so terrible until I did a large project in Django and had to use someone else's ORM.

When I really analysed it, there were only three things that I really wanted out of an ORM:

1) Make the task of writing complex join statements a bit less tedious

2) Make the task of writing a subset of very basic where clauses slightly less tedious

3) Obviate the need for me to detect primary key changes when iterating over a joined result set to detect changes in an object (for example, looping over a list of Authors and their Books)

To that end, I wrote this:

https://github.com/iaindooley/PluSQL

It's written in PHP because I like and use PHP, but it's a very simple pattern that I would like to see elaborated upon/taken to other languages, as I think it provides just the bare minimum amount of functionality to give some real productivity gains without creating a steep learning curve, performance trade-off or any barrier to just writing out SQL statements if that's the fastest way to solve the problem at hand.
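
For readers unfamiliar with point 3, here is a minimal Python sketch of the chore it describes: collapsing a joined Author/Book result set into one entry per author by watching for primary key changes. It is an illustration only and is not taken from PluSQL; the table layout, the sample rows, and the function name `group_books_by_author` are all assumptions.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows, as if returned by:
#   SELECT a.id, a.name, b.title
#   FROM author a JOIN book b ON b.author_id = a.id
#   ORDER BY a.id
rows = [
    (1, "Ann", "First Book"),
    (1, "Ann", "Second Book"),
    (2, "Bob", "Only Book"),
]

def group_books_by_author(rows):
    """Collapse joined (author_id, author_name, book_title) rows into one dict per author.

    Rows must already be sorted by author_id, because groupby only merges
    adjacent rows that share the same key.
    """
    authors = []
    for author_id, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        authors.append({
            "id": author_id,
            "name": group[0][1],
            "books": [title for _, _, title in group],
        })
    return authors

for author in group_books_by_author(rows):
    print(author["name"], author["books"])
```

Without a helper like this, the loop has to remember the previous row's primary key and start a new object whenever it changes, which is the tedium the comment refers to.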

misiti3780 almost 13 years ago

Great advice:

> pick lightweight things that are known to work and see a lot of use outside your company, or else be prepared to become the “primary contributor” to the project.

prayag almost 13 years ago

Fabulous post. Thanks for writing.

One point it misses, though, is to test your backup strategy often. When you scale fast, things break very often, and it's good to be in the practice of restoring from backups every now and then.

akent almost 13 years ago

> I noticed that a particular “FUUUCCKKKKKasdjkfnff” wasn’t getting printed where it should have

Why not take the extra half a second to make those random strings meaningful and hidden behind a DEBUG log level?
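
A minimal sketch of that suggestion, using Python's standard `logging` module. The logger name `sync`, the function `process_batch`, and its arguments are hypothetical and exist only for illustration.

```python
import logging

# Default to INFO so the debug markers stay hidden in normal operation;
# switch to logging.DEBUG when chasing a problem.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("sync")  # hypothetical logger name

def process_batch(batch_id, items):
    # Meaningful, searchable markers instead of a random string.
    logger.debug("process_batch: entered, batch_id=%s, items=%d", batch_id, len(items))
    processed = [item.upper() for item in items]  # stand-in for real work
    logger.debug("process_batch: finished, batch_id=%s", batch_id)
    return processed

print(process_batch(42, ["a", "b"]))
```

The markers stay in the code permanently and cost almost nothing when the level is above DEBUG, so there is no cleanup step to forget.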

elefont2 almost 13 years ago

> Even memcached, which is the conceptually simplest of these technologies and used by so many other companies, had some REALLY nasty memory corruption bugs we had to deal with, so I shudder to think about using stuff that’s newer and more complicated

Does anyone know what memory corruption bugs they are referring to?

acslater00 almost 13 years ago

For the record, I use sqlalchemy 0.6.6 regularly under fairly heavy load, and have never had a problem with it. Any 'sqlalchemy bugs' are inevitably coding mistakes on my part.

ivankirigin almost 13 years ago

Rajiv is awesome, you should listen to him

JohnGB almost 13 years ago

I believe that the section on "The security-convenience tradeoff" is fundamentally flawed.

A username and password represent a pair. Neither one has meaning in terms of authentication without the other.

Take the example where I have forgotten my username (JohnGB), but try with what I think it is (say JohnB), and enter the correct password for my actual username. The system would then tell me that my username is fine, but that my password isn't. From then on, I would be trying to reset the password for a different user as the system has already told me that my username was correct.

Please, for the sake of sane UX, don't do this!

opminion almost 13 years ago

A topic usually left out in scaling discussions is: how much can one predict? Or is it mostly trial and error? Is it mostly about good "reactive" engineering, or would it have benefited from good mathematical modeling?

crazygringo almost 13 years ago

> I noticed that a particular “FUUUCCKKKKKasdjkfnff” wasn’t getting printed where it should have

:)

I've never seen a shorter description of real-world software development. That's it in a nutshell!

wulczer almost 13 years ago

Great article! Small nitpick from someone who just tried this on his server logs :)

 * on my machine xargs -I implies -L1, so you can drop that
 * use gnuplot -p or the graphic will disappear immediately after rendering

anamax almost 13 years ago

There's a talk about Dropbox scaling at http://www.stanford.edu/class/ee380/winter-schedule-20112012.html .

gallerytungsten almost 13 years ago

Great article. Rajiv made it easy to understand the conceptual framework. The lesson is: always strive to be robust. Test your failure points deliberately. Applicable to more than just server scaling.

matt almost 13 years ago

Nice, love the idea of running with extra load to predict breaking points.

lobster_johnson almost 13 years ago

I'm surprised that Dropbox actually uses S3 internally to store data. All along I had assumed, wrongly, that Dropbox had built their own distributed storage cluster.

philfreo almost 13 years ago

Can you explain the nginx/HAproxy config a little more?

kevinburke almost 13 years ago

> MySQL has a huge network of support and we were pretty sure if we had a problem, Google, Yahoo, or Facebook would have to deal with it and patch it before we did. :)

I am fairly certain Google is running its own (patched) version that's fairly different than the off-the-shelf MySQL.

mistercow almost 13 years ago

Running with extra load seems inefficient in terms of energy consumption. Would it be possible to achieve the same thing by inserting delays or something that can be turned off?
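
One way to picture the delay idea is the Python sketch below: a per-request sleep that can be switched off at runtime to recover latency headroom. The class `ArtificialDelay` and the function `handle_request` are hypothetical names, and nothing here reflects how Dropbox actually implemented its extra load.

```python
import time

class ArtificialDelay:
    """Adds a fixed sleep to each request while enabled.

    Setting `enabled` to False removes the delay immediately, which is the
    "can be turned off" part of the idea.
    """

    def __init__(self, seconds, enabled=True):
        self.seconds = seconds
        self.enabled = enabled

    def apply(self):
        if self.enabled:
            time.sleep(self.seconds)

def handle_request(payload, delay):
    delay.apply()
    return payload[::-1]  # stand-in for real work

delay = ArtificialDelay(seconds=0.05)
print(handle_request("abc", delay))  # served with the artificial delay
delay.enabled = False                # emergency switch: drop the delay
print(handle_request("abc", delay))  # served at full speed
```

A caveat on the design: a sleeping request uses almost no CPU, disk, or network, so a delay like this reserves latency headroom but does not exercise the same resources that real extra load would.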

stratos2 almost 13 years ago

All security is a balancing act, which is the point he is making. There is always a tradeoff.