We reduced the cost of building Mastodon at Twitter-scale by 100x

954 points by tekacs over 1 year ago

82 comments

> ...10k lines of code. This is 100x less code than the ~1M lines Twitter

I wish I didn't see this comparison, which is not fair at all. Everyone in their right mind understands that the number of features is much smaller; that's why you have 10k lines.

Add large-scale distributed live video support on top of that, and you won't get anywhere close to 10k lines. It's only one of many, many examples. I really wish you would compare Mastodon to Twitter 0.1 and not do false advertising.

> 100M bots posting 3,500 times per second... to demonstrate its scale

I'm wondering why 100M bots post only 3,500 times per second? Is it 3,500 per second for each bot? It seems not, since HTTPS termination would consume most of the resources in that case. So I'm afraid it's just not enough.

When I worked at Statuspage, we supported 50-100k requests per second, because this is how it works: you have spikes, and traffic is not evenly distributed. TBH, if it's only 3,500 per second total, then I have to admit it is not enough.
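
A rough sketch of the arithmetic behind that question, assuming the quoted 3,500 posts per second is the total across all 100M bots and is spread evenly (the thread does not confirm either assumption; the function name is made up for illustration):

```python
# Figures quoted in the comment above; nothing here is measured.
BOTS = 100_000_000
POSTS_PER_SECOND = 3_500
SECONDS_PER_DAY = 24 * 60 * 60


def posts_per_bot_per_day(bots: int, posts_per_second: int) -> float:
    """Average daily posts per account if the total rate is spread evenly."""
    return posts_per_second * SECONDS_PER_DAY / bots


per_bot = posts_per_bot_per_day(BOTS, POSTS_PER_SECOND)
print(f"{per_bot:.3f} posts per bot per day")
```

Under those assumptions each bot posts only a few times a day, which is consistent with reading 3,500 per second as an aggregate rate rather than a per-bot rate.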

dataangel over 1 year ago

I do C++ backend work in a non-web industry and this entire post is Greek to me. Even though this is targeted at developers, you need a better pitch. I get "we did this 100x faster", but the obvious follow-up question is "how", and the answer seems to be a ton of flow diagrams with way too many nodes that tell me approximately nothing, plus some handwaving about something called P-States that are basically defined to be entirely nebulous because they are any kind of data structure.

I'm not saying there's nothing here, but I am adjacent to your core audience and I have no idea whether there is after reading your post. I think you are strongly assuming a shared basis where everybody has worked on the same kind of large-scale web app before; I would find it much more useful to have an overview of, "This is what you would usually do, here are the problems with it, here is what we do instead" with a side-by-side code comparison of Rama vs what a newbie is likely to hack together with single-instance Postgres.

buro9 over 1 year ago

Measuring "Twitter Scale" by tweets per second is not how I would measure it.

Updates per second to the end users who follow those 7K tweets per second seems more realistic. It's the timelines and notifications that hurt, not the top-of-ingest tweets per second prior to the fan-out... and then of course it's whether you can do that continuously so as not to back up on it.

mping over 1 year ago

Congrats on the (kinda) launch. I was curious to see what you guys were up to. The blog post is pretty detailed, and with good insights. Reducing modern app development complexity to mixing data structures sounds like a good abstraction. I'm sure you thought really hard about the building blocks of Rama and you know your problems better than most of the HN crowd.

Now, the really hard part becomes selling. If companies start using your product to get ahead, that will be the real proof; otherwise it's "just" tech that is good on paper.

On a side note, did you guys get any inspiration from Clojure? I see lots of interesting projects popping up from Clojure people...

Best of luck!

Pxtl over 1 year ago

I've seen many people describe frameworks like this. You know: first you have the slow back-end event-driven master database that you don't query live against, then you've got eventual-consistency flows against the various data warehouses and data stores and partitioned, sharded databases in useful query-friendly layouts that you actually read live from... and I never see it clearly explained: how do you read a change back to the user literally just after they made the change? How do you say "for other views eventual consistency is fine, but for this view of this bit of info we need it updated now"?

This write-up is very detailed but I couldn't find that explanation.
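
One common way to get this "read your own writes" behavior can be sketched as follows. This is a generic technique, not something the write-up or this thread attributes to Rama, and `Log`, `View`, and `min_offset` are hypothetical names. The write returns its position in the log, and a read that must be fresh makes sure the view has applied that position before answering:

```python
class Log:
    """Append-only event log; append() returns the offset of the new event."""

    def __init__(self):
        self.events = []

    def append(self, event) -> int:
        self.events.append(event)
        return len(self.events) - 1


class View:
    """Materialized view that applies log events in order and may lag behind."""

    def __init__(self, log: Log):
        self.log = log
        self.applied = 0   # number of log events applied so far
        self.names = {}    # user -> display name

    def catch_up(self, upto=None) -> None:
        end = len(self.log.events) if upto is None else upto
        while self.applied < end:
            user, name = self.log.events[self.applied]
            self.names[user] = name
            self.applied += 1

    def read(self, user, min_offset=None):
        # Read-your-writes: apply the caller's own write before answering.
        if min_offset is not None and self.applied <= min_offset:
            self.catch_up(min_offset + 1)
        return self.names.get(user)


log = Log()
view = View(log)
offset = log.append(("user1", "New Name"))
stale = view.read("user1")                     # view has not applied the write
fresh = view.read("user1", min_offset=offset)  # forces the write to be visible
```

In a real distributed system the fresh read would wait for replication rather than apply events itself; the sketch only shows the offset bookkeeping that lets one view be strict while others stay eventually consistent.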

softwaredoug over 1 year ago

It's a massive ask, even if the platform were 100x better, for all developers to give up every programming language and database they've ever used to depend on a startup's closed-source platform for all functionality.

It's hard enough trusting that Google's or Amazon's cloud offerings won't change.

It seems that's what they're proposing, right? What am I missing?

afro88 over 1 year ago

Looks amazing and incredibly smart. But I found the LOC and implementation time comparisons to Twitter and Threads very disingenuous. It makes me wonder what other wool will be pulled over our eyes with Rama in future (or important real-world details missed / future footguns).

Still super impressive. Reminds me of when I discovered Elixir while building a social-ish music discovery app. Switching the backend from Rails to Elixir felt like putting on clothes that actually fit after wearing old sweats. Rama looks like a similar jump, but another layer up, encompassing system architecture.

sharms over 1 year ago

The example Mastodon instance is very responsive: almost anywhere I clicked loaded nearly instantly. I created an account and the only thing I found missing was that it doesn't implement full-text search unless my user was tagged, but that might be a Mastodon-specific item.

I think they have thought a lot about typical hard problems, such as having the timeline processing happen alongside the pipeline, taking network / storage etc. out of the picture. Nice work!

jitl over 1 year ago

This architecture seems very similar to existing offerings in the "in-memory data grid" category, like Apache Ignite and Hazelcast. I'm more familiar with Ignite (I built a toy Notion backend with it over a few afternoons in 2020).

The way Ignite works overall is similar. You make a cluster of JVM processes, your data is partitioned and replicated across the cluster, and you upload some JARs of business logic to the cluster to do things. Your business logic can specify locality so it runs on the same nodes as the relevant data, which ideally makes things a lot faster compared to systems where you need to pull all your data across the wire from a DB. Like Rama, Ignite uses a Java API for everything, including serializing and storing plain ol' Java objects.

Ignite's architecture isn't focused on "ETL" into "PStates". Instead it's more about distributed "caches" of data. It does have streaming for ingestion (https://ignite.apache.org/docs/latest/data-streaming), but you can transactionally update the datastore directly (https://ignite.apache.org/docs/latest/key-value-api/transactions). It also has a "continuous query" feature for those reactive queries to retrieve data (https://ignite.apache.org/docs/latest/key-value-api/continuous-queries).

Rama's data-structure-oriented PState index seems easier to work with than building indexes yourself on top of Ignite's KV cache, but Ignite also offers an SQL language, so you can insert your data into the KV cache however you like, add some custom SQL functions, and then accept more flexible SQL querying of your data compared to the very purpose-built PCache things, but still be able to do lower-level or more performance-oriented logic with data locality.

Anyways, if you like some of this stuff but want to use an existing, already battle-tested open source project, you can look for these "in-memory data grid", "distributed cache" kinds of projects. There are a few more out there that have similar JVM cluster computing models.
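
The data-locality idea in the comment above (run the logic where the data lives instead of pulling the data across the wire) can be sketched in a few lines. This is a single-process toy, not Ignite's or Rama's API; `partition_for`, `put`, and `run_local` are hypothetical names, and the "nodes" are just dictionaries:

```python
import zlib

NUM_PARTITIONS = 4
# Each dict stands in for the storage owned by one node in the cluster.
partitions = [dict() for _ in range(NUM_PARTITIONS)]


def partition_for(key: str) -> int:
    # crc32 is stable across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def put(key: str, value) -> None:
    partitions[partition_for(key)][key] = value


def run_local(key: str, fn):
    """Run fn against the partition that owns key; only the result is returned."""
    local = partitions[partition_for(key)]
    return fn(local.get(key))


put("doc:1", ["a", "b", "c"])
count = run_local("doc:1", len)  # the list itself never leaves its partition
```

The design point is that `run_local` ships a small function and returns a small result, whereas a conventional app server would fetch the whole value from the database before computing on it.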

clusterhacks over 1 year ago

I'm excited to see the docs for Rama. But I am also a little scared of the comment "I came to suspect a new programming paradigm was needed" from Nathan.

It's not so much that I think the comment is wrong or anything, but rather that it seems so similar to what I have heard in the past from power-Lisp (or Clojure in this case) super-smart engineers.

I feel like we have reached a point in software development where "better" paradigms don't necessarily gain much adoption. But if Rama wins in the marketplace, that will be interesting. And I am quite excited to see what a smart tech leader and good team have been able to grind out given a years-long timeframe in this programming platform space . . .

ThinkBeat over 1 year ago

I am confused.

This is meant to be hyped to sell your Rama platform/product/framework? That you have spent 10 years building in secret? During that time you have built a datastore and a Kafka competitor and ?

Should not those 10 years be factored into the time it took to develop this technical demo?

Is it 100x less code including every LOC in all of Rama?

I mean, I am sure you picked a use case that is well suited to creating a Twitterish architecture implementation.

If I went off and wrote a ThinkBeat platform for creating Twitterish systems and then created a Twitterish implementation on top of it, it's real easy to reach low LOCs.

skybrian over 1 year ago

It sounds like interesting technology for someone, but I wonder more about scaling down. What does a developer instance running on a laptop look like?

failuser over 1 year ago

Is there a breakdown of effort Twitter spent doing the mastodon-level service (serving a feed of the accounts you are subscribed to) vs everything else like ads, algorithmic feed, moderation, fighting spam, copyright claims, localization, GR, PR, safety, etc?

miki123211 over 1 year ago

Is this just me, or does the code in the post feel like they've implemented what should have been a new programming language on top of Java?

Their "variables" have names that you have to keep as Java strings and pass to random functions. If you want composable code, you don't declare a function, you call .macro(). For control flow and loops, you don't use if and for, but a weird abstraction of theirs.

I feel like this code could have been a lot simpler if it was written in a specialized language (or a mainstream language with a specialized transpiler and/or macro capabilities).

I'd quote the old adage about every big program containing a slow and buggy implementation of Common Lisp, but considering that this thing is written in Clojure, the authors have probably heard it before.

kyle-rb over 1 year ago

Kinda disappointed by the simulation; where are all the viral posts?

I've been digging around for a while and haven't found any posts with more than 20 faves. The accounts I've found with ~1 million followers have little to no engagement. I want to see how a post with a million faves holds up to the promises of "fast constant time".

I'm especially curious about these queries (fave-count and has-user-faved), since a couple of years ago Twitter stopped checking has-user-faved when rendering posts more than a month or so old, so I imagine it was expensive at scale.

NoraCodes over 1 year ago

I would argue that this is not "a Mastodon instance", since it is not running Mastodon - other than that, very very neat work! I'm excited for that "Source Code" link to be live :)

gfodor over 1 year ago

Something I'm immediately thinking about with this is change management and inertia at the early stages of a new, underdefined project. Less code is great, the big question is how such a system compares to the usual hack-and-slash method of getting a v1 up and running as you search for PMF from the perspectives of ops, cost, data migrations, rapid deployments, and so on. Presumably, the idea here is to start from the beginning with Rama, skipping over the usual "monolith fetches from RDBMS" happy paths, even for your basic prototype, this way you don't slip into a situation like Twitter did where that grew slowly into an unscalable monstrosity requiring a rewrite. So an article focused on the "easy" part that's required in the beginning of rapid change, as much as it's not as important as the "simple" part that shines later at scale, seems useful.

yayitswei over 1 year ago

For context, nathanmarz created what is now Apache Storm, which is used for stream processing at some of the world's largest companies, so he knows a thing or two about scale.

duped over 1 year ago

This is what they've been hyping on Twitter for a week?

FWIW, why hype at all? Why "We'll more in a week. Then more in two weeks." Show the code today!

jvans over 1 year ago

I have often thought along similar lines, that the effort involved in building software seems to indicate a level of abstraction that is missing. The general theme of the comments seems roughly what you should expect from a very bold, paradigm-shifting proposal. Good luck with your efforts and don't let this discourage you!

I will make one minor suggestion that I hope is constructive. I found the post difficult to read, largely because you rapid-fire introduce a bunch of completely new concepts and propose a solution to many problems at once. You make a passing comparison to "just event sourcing and materialized views", although this was the easiest way for me to understand what you are doing. Starting from event sourcing and materialized views puts the reader on ground they already understand, and moving on from there to why Rama is better and what it adds on top would be an easier transition.

ltr_ over 1 year ago

i always had this question: how realistic is it to have a standard spec and interoperable protocols for the toxic apps of big international tech companies that provide """services""", so that instances of implementations can be maintained by municipalities or local tech businesses and talent with 100x fewer employees and less money? what policies should be in place to achieve that? what would be the challenges? would it be better/healthier? is someone researching such things, like a transition to sustainable digital services? (sustainable in terms of local labor, privacy, economy, accountability, etc...)

i mean if you think about this as public services, not as a business, profit is secondary, and first is just to make the thing better and better for the users: no need for spying, no advertisement, no need for a rich piece of shit somewhere getting a piece of the money paid in your city for every taxi drive or food delivery, or to give up privacy to a soulless/faceless entity just because you want to say something publicly or keep in touch with people. there is no disruption on their part, it's just an old thing put on the internet; they are just in the middle of everyone's life, just sucking up everything they can. is the actual state of affairs "efficient"?

there must be engineers and tech people everywhere who are fed up with the sad state of the IT industry.

endisneigh over 1 year ago

I don't really see the point of the comparison. They should show something you could only make with Rama, or show how much faster it is to iterate with Rama.

Saying this is 100 or even a million times cheaper is like saying that taking a picture of the Sistine Chapel and printing out copies is a trillion times cheaper than making it originally.

Many of us on this site could make a number of products very efficiently and cheaply given a static and fixed set of requirements as well as an existing implementation for reference.

That being said, it was a very detailed post, so kudos for that, but it's far too vague to be actionable. Why not just release the code and post simultaneously instead of just bragging about how little code was required?

sixo over 1 year ago

The comparisons to Twitter are completely goofy, but the architecture is nothing short of enlightened. Nice work.

beefnugs over 1 year ago

I think the marketing idea of this is amazing: I would probably never even consider learning and reading about such a framework if I heard of it straight up. But if you are really releasing a usable open source implementation of something performant that actually federates properly, that is a huge selling point that buys you a ton of respect up front.

rubiquity over 1 year ago

Noticeably missing are any details about concurrency control and replication or recovery protocols. A Twitter clone is one thing but any sort of application needing ACID Transactions is a whole other beast.

elisbce over 1 year ago

The real reason why we can't easily replicate Twitter/Facebook/Google is because we don't have the distributed storage/caching/logging/data processing/serving/job scheduling/... infrastructures that they have built internally, which are designed to provide some level of guaranteed SLAs for the desired scale, performance, reliability and flexibility. It is not because it is hard to replicate the application logic like posting to timelines. That's also why Threads was built by a small team rather quickly: they already have the battle-tested infras that can scale.

Any attempt to build a simplified version of the ecosystem will face the same fundamental distributed system tradeoffs like consistency/reliability/flexibility/... For example, one of the simplifications may be mixing storage/serving/ETL workloads on the same node. And the consequence is that without a certain level of performance isolation, it could impact the serving latency during an expensive ETL workload.

For Rama to be adopted successfully, I think it is important to identify areas where it has the most strengths, and low LOCs might not be the only thing that matters. For example, demonstrating why it is much better/easier than setting up Kafka/Spark and a database and building a Twitter clone on top of that, while providing similar/better performance/reliability/extensibility/maintainability/..., is a much stronger argument.

jauntywundrkind over 1 year ago

> The instance has 100M bots posting 3,500 times per second at 403 average fanout to demonstrate its scale.

Mastodon has to send messages to each instance with a recipient. That server can then fan out to all its subscribers. The way this point is worded makes me think all the bots are on just a single instance, meaning all the fan-out can be dealt with internally without having to do any server-to-server at all.

That is a fair comparison to Twitter, which is single instance. But it sounds like a much reduced ambition versus the task Mastodon has to do.

cduzz over 1 year ago

I would hazard the guess that Twitter's "show tweets to other people" is 1/40th of the functionality piled into Twitter; some other large functions would be things like "track ad sales" or "improve engagement" or "allow random law enforcement organizations to engage in whatever access is needed for any particular part of the world". Each of those is going to be a huge pile of code, and all of it working together is going to N! your complexity.

raverbashing over 1 year ago

They deserve congrats for that, since they built the load test to prove this.

Of course, for actual production use there are probably a lot of things still missing, but this is very nice work nonetheless.

gexla over 1 year ago

This is fascinating, and I'm glad to have come across it. And your story is inspiring. Thank you!

One question: why Google Groups rather than something like Discord? Not sure I would trust Google Groups to be around long.

primitivesuave over 1 year ago

The "N bots posting X times/second" isn't a very meaningful statistic. A system's reliability is mostly characterized by its performance under stress.

NoahTheDuke over 1 year ago

Congrats! This looks super cool.

Are there any plans for exposing a Clojure API? Given that it's implemented in Clojure, it seems like it would be a natural fit. Interop with Java is nice but can be cumbersome compared to the more natural calling conventions and idioms (threading macros instead of `..` builder patterns, etc).

donavanm over 1 year ago

I appreciate the inversion/melding of the data model and compute. I'm curious to know your perspective on two parts.

First, how would multitenancy fit into Rama? Using your Mastodon example: providing "hosted Mastodon instances as a service", where you _also_ allow for data governance, per-customer encryption at rest, user IDP support, etc. Is it multiple single-tenant Rama deployments, running independent customers? Multitenant Rama clusters with shared depots, where each PState includes a "tenant id"? Something else?

Second, what's the product/business angle on customer confidence, technical novelty, and your business core competency? A dated example, but I'm thinking of somewhere like Basho with Riak. Super cool tool, takes some mental adjustment to "get", challenges selling hosting vs software vs pro services.

nevi-me over 1 year ago

It sounds like the authors spent 10 years building a Twitter factory, and now they can produce a Twitter faster than Twitter could produce it "by hand".

zubairq over 1 year ago

As someone who has worked in both startups and enterprise IT for over 30 years (including large Java-based systems), I see a use case for Rama in large companies who have a lot of difficulty gluing many different systems together to achieve scalability. So I think that Red Planet Labs could get several contracts in the 100k USD and over range in large enterprises. This is for enterprises who have the problem of integrating many systems to achieve scalability and who are already large Java shops.

However, I do not see Rama's initial market being startups, since they just want the simplest way possible to build UI + backend and want to iterate super fast, with tech that their developers already know, in the initial stages.

doublepg23 over 1 year ago

HN seems to be putting you through the wringer. I for one am excited you guys made this and plan to open source it; it looks like a fantastic project.

runeks over 1 year ago

> To demonstrate the scale of our instance, we’re also running 100M bot accounts which continuously post statuses (Mastodon’s analogue of a “tweet”), replies, boosts (“retweet”), and favorites. 3,500 statuses are posted per second, the average number of followers for each post is 403, and the largest account has over 22M followers. As a comparison, Twitter serves 7,000 tweets per second at 700 average fanout (according to the numbers I could find).

Is Twitter's 7k tweets per second the average? If so, what's the peak rate, and have you tested your system under this load?
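
For a sense of what those quoted figures imply, here is a small sketch that multiplies post rate by average fanout to estimate timeline deliveries per second. It assumes the quoted numbers are steady-state averages (the question above is exactly whether they are), and the function name is made up for illustration:

```python
def timeline_deliveries_per_second(posts_per_second: int, average_fanout: int) -> int:
    """Estimated timeline writes per second: each post goes to every follower."""
    return posts_per_second * average_fanout


# Figures as quoted from the post; none of them are measured here.
demo = timeline_deliveries_per_second(3_500, 403)
twitter = timeline_deliveries_per_second(7_000, 700)
ratio = demo / twitter

print(f"demo: {demo:,}/s, Twitter: {twitter:,}/s, ratio: {ratio:.3f}")
```

By this estimate the demo sustains a bit under a third of Twitter's average delivery rate, which says nothing about behavior at peak.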

wink over 1 year ago

Look, I don't want to defend Twitter, but ignoring 15 years of changes and the whole journey of scaling, and then using the cost of just building a snapshot of the 15-year-old version, is pretty disingenuous.

That's a bit like starting an Oracle clone now and summing up what they spent on developer salaries in the last 40 years. You basically can't not "reduce costs".

And no, "the original consumer product" is not a real cop-out; you probably still have tons of people building iterations.

beders over 1 year ago

This is a lovely and very detailed showcase of how to combine streaming + ETL + materialized view + query!

That said: you need better advisors. Your investors and/or the board gave you bad advice on how to publish these accomplishments and talk about them.

I hope your go-to-market strategy works out a little better. Hyperbole is fine, but at least on Hacker News, the audience is a bit careful with regard to grandiose statements.

What might work well in an investor presentation might backfire when you target engineers as the audience.

debadyutirc over 1 year ago

This is very interesting.

I saw the Twitter post first and the blog next. The premise is compelling, but it's been a promise made to the data and software world for decades.

The architecture and the core primitives are something that we agree with a lot. Use cases and business value are a whole different ballgame.

We have invested the past 5 years at InfinyOn building Fluvio, our open source Rust implementation of core event streaming primitives, which implements this architecture to orchestrate data as efficiently as computationally possible today. I am happy to see this project as an effort in the same direction.

_dwt over 1 year ago

Hmmm, "Rama is programmed entirely with a Java API – no custom languages or DSLs" according to the landing page, but this sure looks like an embedded DSL for dataflow graphs to me - Expr and Ops everywhere. Odd angle to take.

prepor over 1 year ago

Based on what I read, it's very similar to Kafka Streams + batteries ([semi]automatic workload orchestration, reactive queries, higher-level/slicker/"smarter" API (?)).

Could you please compare Rama with Kafka Streams, especially from this point of view: if I tried to reimplement the Rama API on top of Kafka Streams, what fundamental difficulties would I face?

dustingetz over 1 year ago

Summarizing, now edited down with some editorializing for clarity:

What is it? Build web-scale reactive backends with an expressive Java dataflow API. Instead of a database you develop your own custom app-specific indexes which are reactive, distributed and durable. It's like event sourcing and materialized views but integrated in a linearly scalable way.

> I cannot emphasize enough how much interacting with indexes as regular data structures instead of magical “data models” liberates backend programming

> It allows for true incremental reactivity from the backend up through the frontend. ... enable UI frameworks to be fully incremental instead of doing expensive diffs to find out what changed.

Ok, so in my mind I am positioning this against Materialize / differential dataflow, whose key primitive is an efficient streaming incremental join that works across very large relational tables. Materialize makes SQL reactive; Rama gives you a Java dataflow DSL for developing purpose-built reactive database indexes.

How does it work? 4 concepts: Depot, ETLs, PState, Query.

Depots: "distributed, durable, and replicated logs of data." [Event streams?] "like Kafka except integrated" "All data coming into Rama comes in through depot appends."

ETLs: data arrives via depots, and is ETLed to PStates via "a Java dataflow API for coding topologies that is extremely expressive". "Most of the time spent programming Rama is spent making ETLs."

PStates seem like reactive data structures that are also durable/replicated. These are meant to supersede your database and indexes, letting you build custom purpose-built indexes that contain 100M elements:

> “partitioned states” are how data is indexed in Rama ... Unlike existing databases, which have rigid indexing models (e.g. “key-value”, “relational”, “column-oriented”, “document”, “graph”, etc.), PStates have a flexible indexing model. In fact, they have an indexing model already familiar to every programmer: data structures. A PState is an arbitrary combination of data structures. ... nested data structures can efficiently contain hundreds of millions of elements. For example, a “map of maps” is equivalent to a “document database”, and a “map of subindexed sorted maps” is equivalent to a “column-oriented database”. Any [composition] is valid – e.g. you can have a “map of lists of subindexed maps of lists of subindexed sets”.

Query: once you develop PStates to aggregate relevant data into a custom index of the right ... shape?, query seems sorta like GraphQL selectors over your custom index:

> Queries in Rama take advantage of the data structure orientation of PStates with a “path-based” API that allows you to concisely fetch and aggregate data from a single partition

> “query topologies” ... real-time distributed querying and aggregation over an arbitrary collection of PStates. These are the analogue of “predefined queries” in traditional databases, except programmed via the same Java API as used to program ETLs and far more capable.
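
The four concepts summarized above can be mimicked with ordinary in-memory data structures. The sketch below is only an analogy in plain Python: it does not use Rama's Java API, it is neither distributed nor durable, and `depot`, `append`, `build_pstate`, and `select` are hypothetical names. An append-only list plays the depot, a fold over it plays the ETL, a map of maps plays the PState, and a key-path lookup plays the path-based query:

```python
from collections import defaultdict

# Hypothetical stand-in for a depot: an append-only log of events.
depot = []


def append(event: dict) -> None:
    depot.append(event)


def build_pstate(events):
    """ETL stand-in: fold the log into a map of maps (user -> post id -> text)."""
    posts_by_user = defaultdict(dict)
    for e in events:
        if e["type"] == "post":
            posts_by_user[e["user"]][e["id"]] = e["text"]
        elif e["type"] == "delete":
            posts_by_user[e["user"]].pop(e["id"], None)
    return posts_by_user


def select(state, *path, default=None):
    """Query stand-in: walk nested maps along a path of keys."""
    node = state
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


append({"type": "post", "user": "alice", "id": 1, "text": "hello"})
append({"type": "post", "user": "alice", "id": 2, "text": "second"})
append({"type": "delete", "user": "alice", "id": 1})

state = build_pstate(depot)
kept = select(state, "alice", 2)     # text of a post that still exists
deleted = select(state, "alice", 1)  # default, because the post was deleted
```

Because the log is the source of truth, a differently shaped index (say, post id to author) could be built by writing another fold over the same `depot`, which is the event-sourcing-plus-materialized-views reading of the design.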

mariusor over 1 year ago

But this is not a Mastodon instance.

It's something else that maybe speaks the Mastodon API and/or ActivityPub, but we don't know, since it doesn't really federate with anyone.

I commend the effort to try to make a non-open fediverse service happen, but appropriating the Mastodon name is just wrong. You should know better.

alexcpn over 1 year ago

Why choose Java, of all languages? Why not something more modern and less verbose like Go or Rust? Just asking, as I have worked enough in Java and then spent a lot of time on GC tuning. Granted, the code was not that great and came from a diverse team with different skill levels, causing all the leaks... But still.

Huhuhn over 1 year ago

Would this framework also be useful for building a Lemmy instance?

samsquire over 1 year ago

Thank you for sharing this!

I am actually really impressed. Well done! Good work!

There are lots of interesting lessons and knowledge in the design of this platform.

I also like how you've decided to use Java as your API rather than Clojure.

I hope you're not discouraged by HN's reaction to your hard work. Don't be discouraged!

whateverman23 over 1 year ago

ctrl+f "ads"
ctrl+f "monetization"
ctrl+f "moderation"
ctrl+f "existing infrastructure"
ctrl+f "personalization"
etc etc

Yeah, about what I expect from a "we rebuilt Twitter for cheap" post. There's no point to the comparisons with the Twitter codebase size/cost. It completely distracts from what is probably a perfectly fine project.

FridgeSeal over 1 year ago

Semi-related: their homepage (https://redplanetlabs.com/) has to be one of the best-looking websites I've seen in a while, buttery smooth as well. I love it.

jonstewart over 1 year ago

How does Rama differ from Flink and timely-/differential-dataflow/Materialize?

I see "microbatching" in the diagram and, maybe this isn't a fair take, but it feels more 2013 than 2023.

Ryan_HD over 1 year ago

Another try at Event Sourcing + CQRS. I thought it was great, but after so many years it's still out of the mainstream. The lack of an integrated platform may contribute, but can't explain everything.

I guess most people can't accept things which are fundamentally harder in such an architecture than in normal ones.

j45 over 1 year ago

Just on system design alone this was enjoyable to read.

Clever architecture can help as much as, if not more than, clever coding, especially when keeping it simple but scalable is needed.

DigitalSea over 1 year ago

One of these posts. Dig into the numbers and claims, and you'll see that they're not building something anywhere near Twitter scale.

RHSman2 over 1 year ago

Easy to copy something that has been done before and knowing what you want, need and responding to expected market traffic.

romgrk over 1 year ago

Really nice ideas here. The crucial advantage is having the storage and computation run as close together as possible, which is a big advantage over a regular DB + app backend.

But I won't ever consider investing in it unless it's some form of open source. It's too much of a risk to have a closed-source core.

elwell over 1 year ago

No Clojure?

EDIT: Oh, I see in comments: "The customer API in Java, and the implementation of that API is in Clojure"

runeks over 1 year ago

Sounds like Rama is also useful for small-scale applications (where high scalability isn't needed), since it simplifies how they're implemented.

Is this the case, i.e. would a TodoMVC app implemented in Rama also be much simpler than a traditional frontend/backend/database CRUD implementation?

chiefalchemist, over 1 year ago

"We stood on the shoulders of giants..."

X years from now: "We reduced the cost of building _____ at Mastodon-scale by 1000x".

It's certainly interesting, certainly an accomplishment, but it's also the nature of the game. The present eating the past, to be eaten by the future. Rinse. Repeat.

stuaxo, over 1 year ago

Lovely, I could see this paradigm spreading to other languages, something was definitely needed.

crenwick, over 1 year ago

Congrats on the launch. Great read.

Is there any rough infrastructure cost comparison?

Excluding the cost of engineering effort, which I understand is the major pitch.

2Gkashmiri, over 1 year ago

what are the server specs this demo is running on? is it bare metal? a vps?

how about doing a comparison on a consumer-grade vps, like a 1 vcpu/4GB ram setup, between your product and mastodon or pleroma for example?

i mean sure, you can build a twitter-scale product, but federation means people can do that on their own, and with your tech they don't have to worry about scaling issues.

boredumb, over 1 year ago

neat read but I was expecting to read about twitter migrating and literally 100x savings being had.

kennydude, over 1 year ago

You didn't build Mastodon at Twitter-scale.

You built a Mastodon-compatible clone in Spring/Reactor.

freecodyx, over 1 year ago

Less code excluding dependencies

ketang, over 1 year ago

It's like Apache Camel if camels had 5 humps and 11 legs.

rugina, over 1 year ago

Can you implement a web shop like Amazon using Rama?

lionkor, over 1 year ago

Very interesting, looking forward to reading the docs once they come out.

Why Java?

throwaway892238, over 1 year ago

Are people still thinking social media is a good thing? Hasn't every study shown that it's terrible for everybody?

mlindner, over 1 year ago

But it’s not at Twitter’s scale?

say_it_as_it_is, over 1 year ago

need to port this to Go...

itissid, over 1 year ago

TL;DR: ChatGPT summary of 5 "pages" of the thing: https://chat.openai.com/share/bd6eac38-5bac-4c6f-b405-7ca7d8a9213e

ceejayoz, over 1 year ago

Headline: "building Twitter at Twitter-scale"

Article: "building Mastodon at sub-Twitter-scale"

phillipcarter, over 1 year ago

In the year of our lord 2023, people are still launching immature products with "we built a clone of a tiny subset of Twitter" as their use case? Come on. Twitter is huge because they have to support a huge number of use cases. Using this proprietary framework won't magically make complex use cases go away.

polishdude20, over 1 year ago

"We spent nine person-months building our scalable Mastodon instance."

No no, you can't say that when later on you say it's built on top of Rama. You literally spent 10 years building the framework to even make this.

And yes, you built this in 10k lines of code, but how many lines of code is Rama? This seems disingenuous.

sandGorgon, over 1 year ago

nice! is this a cloudflare worker & block storage built in Java?

reset2023, over 1 year ago

Don't worry, chatGPT will do the same thing in 1000 lines

sourcecodeplz, over 1 year ago

Who cares. Mastodon was/is destined to fail. Trigger happy mods ban you from a server, then you're banned from a bunch.

LeifCarrotson, over 1 year ago

> How is it possible that we’ve reduced the cost of building scalable applications by multiple orders of magnitude?

> You can begin to understand this by starting with a simple observation: you can describe Mastodon (or Twitter, Reddit, Slack, Gmail, Uber, etc.) in total detail in a matter of hours. It has profiles, follows, timelines, statuses, replies, boosts, hashtags, search, follow suggestions, and so on. It doesn’t take that long to describe all the actions you can take on Mastodon and what those actions do. So the real question you should be asking is: given that software is entirely abstraction and automation, why does it take so long to build something you can describe in hours?

> At its core Rama is a coherent set of abstractions...

This conclusion is alarming to read from a company that's trying to sell a new platform. The vast majority of the work in building Twitter or Reddit is not about building a coherent set of abstractions, it's working with an often incoherent reality, dealing with a myriad of laws that describe, as if your web app were a human clerk at a post office, how to handle PII and credit cards and CSAM filters and audits and copyright claims and on and on...

I'm honestly shocked that the technical implementation of a simplified, coherent platform took a full 9 person-months. That shouldn't be the hard part. What I'd want to know as a prospective customer is how you handle exceptions to your beautiful, idealized architecture, when some foreign country requires that you only store comments posted by their citizens within their borders or something like that.

riffic, over 1 year ago

the group involved here may want to be mindful of the Mastodon gGmbH trademarks. Using the Mastodon logo on redplanetlabs.com to pitch a reimplementation of ActivityPub might be seen as infringing.

https://joinmastodon.org/trademark

removed part about the mastodon subreddit since this is clearly not about the Mastodon software per se.

throwaway7382, over 1 year ago

Their big reveal after 10 years is "keep waiting".

Move along, nothing to see here.

trollied, over 1 year ago

> We spent nine person-months building our scalable Mastodon instance.

+ the time spent creating Rama, the platform that enables it.

Very dishonest leaving that out.

MisterBastahrd, over 1 year ago

What one finds useful from a web application and what the web application actually is are usually two entirely different things.

I work in marketing automation, and I guess I have in one way or another my entire career. The clients who need to use the platform to communicate with their own clients over social networking may never touch our print delivery system, but that doesn't mean that print delivery doesn't exist or isn't important.

If you are unwilling to recreate the totality of the application in terms of functionality, then you are lying if you say that you have recreated it.

skybrian, over 1 year ago

It sounds like interesting technology for someone, but I wonder more about scaling down. What does a developer instance running on a laptop look like?

NoraCodes, over 1 year ago

I would argue that this is not "a Mastodon instance", since it is not running Mastodon - other than that, very very neat work! I'm excited for that "Source Code" link to be live :)

yayitswei, over 1 year ago

For context, nathanmarz created what is now Apache Storm, which is used for stream processing at some of the world's largest companies, so he knows a thing or two about scale.

duped, over 1 year ago

This is what they've been hyping on Twitter for a week?

FWIW, why hype at all? Why "We'll share more in a week. Then more in two weeks."? Show the code today!

sixo, over 1 year ago

The comparisons to Twitter are completely goofy, but the architecture is nothing short of enlightened. Nice work.

raverbashing, over 1 year ago

They deserve congrats for that, since they built the load test to prove this.

Of course, for actual production use there's probably still a lot to do, but this is very nice work nonetheless.

primitivesuave, over 1 year ago

The "N bots posting X times/second" isn't a very meaningful statistic. A system's reliability is mostly characterized by its performance under stress.

nevi-me, over 1 year ago

It sounds like the authors spent 10 years building a Twitter factory, and now they can produce a Twitter faster than Twitter could produce it "by hand".

doublepg23, over 1 year ago

HN seems to be putting you through the wringer. I for one am excited you guys made this and plan to open source it. It looks like a fantastic project.

Huhuhn, over 1 year ago

Would this framework also be useful for building a Lemmy instance?
