From what I understand, Manhattan is based on the ideas from ElephantDB. Unfortunately, development on ElephantDB has pretty much stopped, even though it is a critical component of the big data book Nathan Marz is writing: <a href="http://www.manning.com/marz/" rel="nofollow">http://www.manning.com/marz/</a><p>Summingbird (bear with me, I'll tie this in) is also Twitter's answer to writing code once and running it on a variety of execution platforms such as Hadoop, Storm, Spark, Akka, etc. Not all of these have been built out, but the platform was designed as a generic framework to support write-once, execute-everywhere.<p>Summingbird is written to support Manhattan's model as well. The high-level idea is to use versioning to determine whether a request is precomputed (batch), computed (realtime), or a hybrid (precomputed + computed). These are expressed as monoids, with the basic algebraic machinery provided by Algebird. One way to bring this model to the open source world would be to implement Storehaus bindings for ElephantDB, and then either resurrect ElephantDB or build a similar service that provides Manhattan-like storage.<p>Overall, very early yet promising work in the open source community.<p>[edit: book is not about elephantdb, but is a critical component. modified wording. Also added link]
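To make the hybrid idea concrete, here is a minimal sketch in plain Scala (not the actual Summingbird/Algebird API; the Monoid trait and the map-of-counts example are my own illustration). The point is that if batch and realtime results live in the same monoid, serving a key is just merging the precomputed batch view with the realtime delta:

    // Sketch of the batch + realtime "hybrid" read: both sides produce values in the
    // same monoid, so a read is just batchView + realtimeDelta.
    trait Monoid[T] {
      def zero: T
      def plus(a: T, b: T): T
    }

    object HybridRead {
      // Counts per key form a monoid under element-wise addition.
      implicit val countMonoid: Monoid[Map[String, Long]] = new Monoid[Map[String, Long]] {
        def zero = Map.empty
        def plus(a: Map[String, Long], b: Map[String, Long]) =
          (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))).toMap
      }

      // batchView: last completed batch run; realtimeDelta: events since that batch version.
      def serve[T](batchView: T, realtimeDelta: T)(implicit m: Monoid[T]): T =
        m.plus(batchView, realtimeDelta)

      def main(args: Array[String]): Unit = {
        val batch    = Map("#scala" -> 1000L, "#hadoop" -> 50L) // precomputed (batch)
        val realtime = Map("#scala" -> 7L, "#storm" -> 3L)      // computed since the batch version
        println(serve(batch, realtime)) // Map(#scala -> 1007, #hadoop -> 50, #storm -> 3)
      }
    }

The versioning mentioned above would decide how much of the keyspace comes from the batch side versus the realtime side; the merge itself stays the same.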
I think it's not an exaggeration at all. Twitter literally pulls data from thousands of servers, but we are missing the point of Manhattan. As some of us know, Twitter's services scale dynamically according to the load they are serving, and some engineers at Twitter decided to put that idea on a whole truckload of steroids.<p>Here is the trick: the container agents (storage services) are made "clients" of the Manhattan database (the core). They are Mesos processes that scale dynamically with the needs of the service (i.e. 1 container = 10,000 reads/writes per second, 2 containers = 20,000 reads/writes per second, and so on), which allows dynamic scaling of reads and writes per second. The core handles finding the actual machines that hold the data, replicating it, and so on. There might be realtime storage service containers that need fast data access, the batch importer and time series services they mention, and so on. This requires a lot of gymnastics but offers a lot of nice features. The Manhattan database acts as a virtual layer over thousands of machines, and the storage services allow for customized data manipulation. Cool...huh?
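As a rough back-of-the-envelope sketch of that scaling idea (the 10,000 ops/sec per container figure is just my example number from above, not a real Manhattan figure):

    // Toy sketch of "add containers until we cover the offered load".
    object ContainerScaling {
      val opsPerContainer = 10000L // assumed per-container capacity, illustration only

      // Number of Mesos storage-service containers needed to absorb a target ops/sec.
      def containersFor(targetOpsPerSec: Long): Long =
        math.ceil(targetOpsPerSec.toDouble / opsPerContainer).toLong

      def main(args: Array[String]): Unit = {
        println(containersFor(6000))    // 1  -- roughly the tweets/sec mentioned elsewhere in the thread
        println(containersFor(25000))   // 3
        println(containersFor(1000000)) // 100
      }
    }

The hard part, of course, is not this arithmetic but the core's job underneath: partitioning, replication, and routing across the actual machines.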
Given the importance and the scale (multi-DC) this operates at, I think it's almost impossible to open source this. But who knows, miracles happen now and then.
Interesting. The database sounds almost too good to be true. I wonder if they'll open source this. They've done so in the past with projects like Storm, so I'm hopeful.
Can someone enlighten me as to why 6,000 tweets a second is something to make a big deal about? At 140 characters per message, that comes out to 840,000 bytes/s, which is less than 1 megabyte per second. In 2014, is a service that can handle 1 MB/s impressive?
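For what it's worth, that arithmetic only covers the raw tweet text, ignoring metadata, indexes, fanout, and replication, which is presumably where the real cost is:

    // Back-of-the-envelope: raw tweet text only.
    object TweetThroughput {
      def main(args: Array[String]): Unit = {
        val tweetsPerSec  = 6000
        val bytesPerTweet = 140  // 140 chars, assuming 1 byte each
        val bytesPerSec   = tweetsPerSec * bytesPerTweet
        println(s"$bytesPerSec bytes/s = ${bytesPerSec / 1e6} MB/s") // 840000 bytes/s = 0.84 MB/s
      }
    }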
"Real-time" is a bit of a misnomer as far as databases are concerned, especially if you're talking about a system that defaults to eventual consistency. I think they would have been better off saying "high-availability".
Does anyone else find the opening statement a little misleading? Yes, internally the data may come from thousands of places, but it's all sent from Twitter to the app of your choosing via JSON or similar. Sure, there's going to be more than one request for the icon sprite and user avatars, but all from Twitter.<p>"When you open the Twitter app on your smartphone and all those tweets, links, icons, photos, and videos materialize in front of you, they’re not coming from one place. They’re coming from thousands of places."
So this is an internal system, right? What is the point of telling the world if it's not available for anyone to look at or use? Perhaps a recruiting exercise?