Distributed is not necessarily more scalable than centralized

69 points by mad44 over 10 years ago

13 comments

MCRed over 10 years ago

Dropbox is distributed. According to the article, it uses AWS, which is a Dynamo-based system. Among its other features, Dynamo allows you to distribute data across many servers, using a hash of the data's key in order to look it up (each server gets some of the keyspace).

Riak is a similar type of system.

Dropbox is "centralized" in the sense that it is one service, but it's not the opposite of distributed, which would mean "running all on one computer."

Edit: I said "hash of the data's key" but really it's a hash of the key plus the bucket.
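A minimal sketch of the lookup described above, assuming a consistent-hashing ring in the general style of Dynamo and Riak. The names `HashRing`, `ring_point`, the `vnodes` parameter, and the `node-a`/`node-b`/`node-c` servers are hypothetical, MD5 is an arbitrary choice of hash, and the replication of each key to several successor nodes that real systems perform is left out:

```python
import bisect
import hashlib


def ring_point(value: str) -> int:
    """Map a string to a position on a 2**128 ring (MD5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest(), "big")


class HashRing:
    """Toy consistent-hash ring: each server owns the arcs ending at its points."""

    def __init__(self, servers, vnodes=16):
        # Each server is placed at several "virtual node" points to even out load.
        entries = sorted(
            (ring_point(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._points = [point for point, _ in entries]
        self._owners = [server for _, server in entries]

    def lookup(self, bucket: str, key: str) -> str:
        # Hash bucket and key together, then walk clockwise to the next server point.
        point = ring_point(f"{bucket}/{key}")
        idx = bisect.bisect_right(self._points, point) % len(self._points)
        return self._owners[idx]


ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.lookup("photos", "cat.jpg"))
```

The point of the ring (rather than `hash % number_of_servers`) is that adding a server only moves the keys that land on the new server's arcs; every other key keeps its owner.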


match over 10 years ago

It seems the author is conflating centralized/decentralized with distributed/monolithic. Dropbox is obviously a distributed system.

menzoic over 10 years ago

> The student persisted and kept repeating that "Dropbox has a bottleneck because it is a centralized storage solution, and the distributed solution doesn't have that bottleneck". I couldn't believe my ears.

The student is correct. Let's ignore the fact that Dropbox is actually distributed and say it is centralized because all nodes of the system belong to one provider. The only way Dropbox could have scaled to 200m users was tons of cash. In a distributed solution where each node is a provider itself, each additional user could potentially increase the performance of the system. The distributed alternative scales much more gracefully, without running into the bottleneck of needing more cash to buy more machines/storage/bandwidth. In this particular frame, distributed is most definitely always more scalable than centralized unless you have unlimited cash.


rco8786 over 10 years ago

1) Dropbox is distributed.
2) This article doesn't actually make any argument about why a centralized system can scale as well as a distributed one.


rdtsc over 10 years ago

> You can employ Paxos to replicate the centralized server. In contrast, it is often much harder to design and add fault-tolerance to a distributed system.

OK, am I missing something? So we are employing Paxos to replicate the centralized server. Are we replicating it to itself? Because if we are not, we've got ourselves a "distributed" system.
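To make the point concrete: Paxos only decides a value once a majority of acceptors agree, so replicating a server with it necessarily means running several machines. A small arithmetic sketch of majority quorums, assuming crash (not Byzantine) failures; the function names are hypothetical and this is not an implementation of Paxos itself:

```python
def majority(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def tolerated_crash_failures(n: int) -> int:
    """How many replicas can crash while a majority can still be reached."""
    return (n - 1) // 2


def is_committed(acks: int, n: int) -> bool:
    """A proposal counts as chosen once a majority of the n replicas accept it."""
    return acks >= majority(n)


for n in (1, 3, 5):
    print(n, majority(n), tolerated_crash_failures(n))
```

With a single replica nothing can fail; tolerating one crash already takes three machines, which is exactly the "distributed" system the comment is pointing at.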

cbhl over 10 years ago

My hunch is that the student is frustrated because Dropbox sync speeds are sometimes lower than the network line speed (maybe due to the agent having to scan the filesystem to look for changes, or because the agent is syncing many small files, or because Dropbox or the ISP or anyone in the middle is throttling the connection). This is particularly noticeable if you sync a new computer on a different network from the rest of your Dropbox machines (say, an EC2 VPS, or on a university network away from home), because when you're on the same network, LAN sync is often used for a large portion of the initial sync.

I suspect the student thinks that distributing his/her files among his/her friends and/or multiple services (bittorrent-style) will allow him/her to increase throughput; however, I suspect it will merely increase complexity (and possibly also cost) without actually making syncing/back-up faster.

contingencies over 10 years ago

Dropbox is centralized at the organizational, jurisdictional and other levels, whilst technically it may employ distributed resources. It's not incorrect to point at this centralization as a risk, both in terms of availability and scalability.

This is really an industry-wide problem begging for a neat solution. Software eats middle management! (Devops => Devmangops? Mmm... mangoes...) Perhaps the world needs an open source tool in the organizational management/risk space that models business-level risk based upon commercial as well as technical infrastructure.

Perhaps the best model for developing such a capacity is a generic exchange protocol with plugins for risk management? My start brainstorming @ http://www.ifex-project.org/our-proposals/ifex

aba_sababa over 10 years ago

I think the actual confusion is about a centralized distributed system vs a peer-to-peer distributed system, which is probably what (still totally wrong) PhD student meant.

Illniyar over 10 years ago

It's not really clear to me what part of Dropbox isn't distributed (in the sense that it's hosted on multiple computers). The data is distributed and the processing is distributed. Do they mean it has a central controller/router or something of that kind?

setori88 over 10 years ago

Chatty software able to synchronize state over the open internet using declarative concurrency is a distributed system. A high-performance cluster running something like Erlang's distributed message-passing concurrency is a distributed system. A single program written with the complexity of shared-state concurrency executing over multiple cores is a distributed system. The concept of concurrency is vital here, particularly what type of concurrency is used. When this person talks about distribution, what kind of concurrency is he referring to? I'd like to see this professor reimplement Dropbox for sequential execution on a single CPU to serve the world (you can only use shared state, or any other form of concurrency, if you do it on the same CPU). This centralized system should then be fault tolerant. Which it absolutely will not be, as you need at least two machines for fault tolerance. This article was a waste of time.

zmanian over 10 years ago

Distributed is often much more difficult to scale than centralized, especially because you need n^2 messages for the system to reach consensus.

Distributed tends to produce higher availability than centralized systems, and often that is worth the cost.
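Where the n^2 comes from can be shown with a simplified count for a single round of agreement, under the assumption that messages are point-to-point. If every node sends its vote to every other node, the round costs n(n-1) messages; if one leader sends a proposal and each follower replies, it costs 2(n-1). Real protocols differ: leader-based ones such as Paxos or Raft are roughly linear per decision in steady state, while all-to-all voting phases (as in PBFT) are quadratic. The function names are hypothetical:

```python
def all_to_all_messages(n: int) -> int:
    """Every node sends its vote to every other node: n * (n - 1) messages."""
    return n * (n - 1)


def leader_based_messages(n: int) -> int:
    """A leader sends a proposal to n - 1 followers and each replies: 2 * (n - 1)."""
    return 2 * (n - 1)


for n in (3, 5, 10, 100):
    print(n, all_to_all_messages(n), leader_based_messages(n))
```

At 100 nodes the all-to-all round already needs 9900 messages against 198 for the leader-based one, which is why which kind of "distributed" you mean matters for scalability.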

alexnewman over 10 years ago

Yeah, AWS is not Dynamo-based. Dropbox uses a bunch of MySQL and S3. It is hugely distributed, and they have to spend a lot of human resources keeping it up.


tzakrajs over 10 years ago

I sense much confusion in the force...