Manhattan: Real-time, multi-tenant distributed database for Twitter scale

96 points by jaboutboul about 11 years ago

11 comments

ffk about 11 years ago
From what I understand, Manhattan is based on the ideas from ElephantDB. Unfortunately, development has pretty much stopped on ElephantDB despite the fact that Nathan Marz is writing a book about big data that depends on it. http://www.manning.com/marz/

Summingbird (bear with me, I'll tie this in) is also Twitter's answer for writing code once and seeing it run on a variety of execution platforms such as Hadoop, Storm, Spark, Akka, etc. Not all of these have been built out, but the platform was designed to be a generic framework to support write once, execute everywhere.

Summingbird is written to support Manhattan's model as well. The high-level idea is to use versioning to determine whether a request is precomputed (batch), computed (realtime) or a hybrid (precomputed + computed). These are expressed as monads with basic functionality present in Algebird. One way to bring support for this model to the open source world would be to implement Storehaus bindings for ElephantDB and to resurrect ElephantDB or build a similar service to provide storage similar to Manhattan.

Overall, very early yet promising work in the open source community.

[edit: the book is not about ElephantDB, but ElephantDB is a critical component. Modified wording. Also added link.]
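
The hybrid (precomputed + computed) read described above can be sketched roughly as follows. This is only an illustration under stated assumptions: values are counts that combine by addition, versions are plain integers, and every name here (`hybrid_read`, `batch_view`, `realtime_deltas`) is hypothetical rather than part of Summingbird, Storehaus, or Manhattan.

```python
def hybrid_read(key, batch_view, batch_version, realtime_deltas):
    """Combine a precomputed batch value with newer realtime deltas.

    batch_view:      {key: value} precomputed through batch_version (assumed layout)
    realtime_deltas: {(key, version): value} produced by the realtime path (assumed layout)
    """
    # Start from the precomputed (batch) value, or 0 if the key was never batched.
    total = batch_view.get(key, 0)
    # Add only the realtime deltas the batch has not yet absorbed.
    for (delta_key, version), delta in realtime_deltas.items():
        if delta_key == key and version > batch_version:
            total += delta
    return total


batch_view = {"tweets:alice": 10}   # precomputed through version 3
batch_version = 3
realtime_deltas = {
    ("tweets:alice", 3): 5,  # already included in the batch, so skipped
    ("tweets:alice", 4): 2,
    ("tweets:alice", 5): 1,
    ("tweets:bob", 4): 7,    # no batch value yet: a purely realtime answer
}

print(hybrid_read("tweets:alice", batch_view, batch_version, realtime_deltas))  # 13
print(hybrid_read("tweets:bob", batch_view, batch_version, realtime_deltas))    # 7
```

The version comparison is what keeps a delta from being counted twice once a newer batch run has caught up to it.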
caniszczyk about 11 years ago
Corresponding Twitter Engineering blog post: https://blog.twitter.com/2014/manhattan-our-real-time-multi-tenant-distributed-database-for-twitter-scale
teodimoff about 11 years ago
I think it's not an exaggeration at all. Twitter literally pulls data from thousands of servers, but we are missing the point of Manhattan. As some of us know, Twitter services scale dynamically according to the load they are serving, and some engineers at Twitter decided to give this idea a whole truckload of steroids. Here is the trick: we make container agents (storage services) "clients" of the Manhattan database (the core). They are Mesos processes which scale dynamically to the needs of the service (i.e. 1 container = 10000 reads/writes per second, 2 containers = 20000 reads/writes per second and so on), which allows dynamic scaling of reads and writes per second. The core handles finding the actual machines which have the data, replicating it and so on. There might be realtime storage service containers which need fast data access, the batch importer and timeseries services they mention, and so on. This requires a lot of gymnastics but offers a lot of nice features. The Manhattan database acts as a virtual layer over thousands of machines, and the storage services allow for customized data manipulation. Cool, huh? Given the importance and scale (multi-DC) this operates on, I think it is almost impossible to open source this. But who knows, miracles happen now and then.
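
The linear scaling in that example (1 container = 10000 reads/writes per second) amounts to a ceiling division. A minimal sketch, assuming the 10000 figure from the comment and a hypothetical helper name; it says nothing about how Manhattan or Mesos actually schedule containers:

```python
import math

# Per-container capacity taken from the comment's example, not from Twitter.
OPS_PER_CONTAINER = 10_000


def containers_needed(ops_per_second, ops_per_container=OPS_PER_CONTAINER):
    """Return how many containers cover the given load, rounding up."""
    if ops_per_second <= 0:
        return 0
    return math.ceil(ops_per_second / ops_per_container)


print(containers_needed(10_000))  # 1
print(containers_needed(20_000))  # 2
print(containers_needed(25_000))  # 3
```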
iLoch about 11 years ago
Interesting. The database sounds almost too good to be true. I wonder if they'll open source this. They've done so in the past with projects like Storm, so I'm hopeful.
RcouF1uZ4gsC about 11 years ago
Can someone enlighten me as to why 6000 tweets a second is something to make a big deal about? At 140 characters per message that comes out to 840,000 bytes/s, which is less than 1 megabyte per second. In 2014, is a service that can handle 1 megabyte/s impressive?
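
The arithmetic above, written out as a quick check. It assumes one byte per character and counts only the raw tweet text on the write path; metadata, replication, and reads are not included.

```python
TWEETS_PER_SECOND = 6000   # figure quoted in the comment
BYTES_PER_TWEET = 140      # assumes 1 byte per character, text only


def raw_write_bytes_per_second(tweets_per_second, bytes_per_tweet):
    """Raw text volume written per second, ignoring all overhead."""
    return tweets_per_second * bytes_per_tweet


rate = raw_write_bytes_per_second(TWEETS_PER_SECOND, BYTES_PER_TWEET)
print(rate)              # 840000
print(rate / 1_000_000)  # 0.84
```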
swah about 11 years ago
I don't even look at databases before Aphyr verifies that they keep their promises...
swang about 11 years ago
The article calls Gizzard "strongly consistent", but Gizzard's GitHub page says "eventually consistent" [1]. What?

[1] https://github.com/twitter/gizzard
dfcarney about 11 years ago
"Real-time" is a bit of a misnomer as far as databases are concerned, especially if you're talking about a system that defaults to eventual consistency. I think they would have been better off saying "high-availability".
feelstupid about 11 years ago
Does anyone else find the opening statement a little misleading? Yes, they originally come from more than one place, but they are sent from Twitter to the app of your choosing via JSON or similar. Sure, there's going to be more than one request for the icon sprite and user avatars, but all from Twitter.

"When you open the Twitter app on your smartphone and all those tweets, links, icons, photos, and videos materialize in front of you, they’re not coming from one place. They’re coming from thousands of places."
hoodoof about 11 years ago
So this is an internal system, right? What is the point of telling the world if it's not available for anyone to look at or use? Perhaps a recruiting exercise?
haddr about 11 years ago
Although I think the article is interesting, I'm missing some details: more of the engineering specifics rather than the high-level overview.