The Architecture of Serverless Data Systems

147 points by orangechairs, over 1 year ago

7 comments

Dave3of5, over 1 year ago

A bit too waffling for me to read all of it, but I would like to make a small comment.

Why are more and more devs trying to use S3 as a general-purpose DB?

I'm working on a system right now where the architects have made this mistake: it has insanely poor performance (high latency) and lacks any proper ACID compliance. I've now been asked to "make it faster", and the answer is to switch back to an actual DBMS.

> Top tier SaaS services like S3 are able to deliver amazing simplicity, reliability, durability, scalability, and low price because their technologies are structurally oriented to deliver those things. Serving customers over large resource pools provides unparalleled efficiency and reliability at scale

In terms of simplicity, using S3 is anything but simple. Sure, the CRUD API is simple, but there are a bunch of gotchas. What about transactionality, partial updates, running multi-document queries, consistency of the whole set of documents? You have to rewrite a whole DBMS on top of S3 itself or use Redshift to get these things.

In terms of scalability, there are limits: 3500 rps per key prefix.

It's actually not lower in price than a DBMS when you have a lot of data.
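
To make the per-prefix limit concrete, here is a minimal sketch of one common workaround: spreading object keys over several hashed prefixes. It only takes the 3500 rps figure from the comment above as given; the function names and the `NNNN/` prefix layout are hypothetical, and nothing here calls S3 itself.

```python
import hashlib
import math

# Per-prefix request rate quoted in the comment above (taken as given).
PER_PREFIX_RPS = 3500


def prefixes_needed(target_rps, per_prefix_rps=PER_PREFIX_RPS):
    """Smallest number of prefixes whose combined limit covers target_rps."""
    return math.ceil(target_rps / per_prefix_rps)


def sharded_key(key, num_prefixes):
    """Prepend a stable, hash-derived shard prefix to an object key.

    The same key always maps to the same prefix, so readers can recompute
    the full key without a lookup table.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:4], "big") % num_prefixes
    return f"{shard:04d}/{key}"


if __name__ == "__main__":
    n = prefixes_needed(10_000)  # 3 prefixes for 10,000 rps
    print(n, sharded_key("orders/2023/11/15/abc.json", n))
```

The trade-off is that hashed prefixes destroy natural key ordering, so listing "all orders for a day" now means listing every shard, which is one more of the gotchas the comment is pointing at.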

bob1029, over 1 year ago

See also: SQL Server Hyperscale. We've been using this for about 18 months now and it feels like it has saved us a lot of hassle.

https://learn.microsoft.com/en-us/azure/azure-sql/database/hyperscale-architecture

The only downside we can spot so far is the presence of a 100 MB/s throttle on txn log writes in order to satisfy replication requirements. Beyond this, it is indistinguishable from an Express instance on a local dev machine. You lose some of the other on-prem features when you go managed, but most new applications don't need that stuff anymore. The message broker pieces are the only ones I miss, but there are a lot of other managed options for that, and you can still DIY with a simple Messages table and 3-4 stored procedures.

On the read and reporting side, I see no downsides. You mostly get OLAP+OLTP in the same abstraction without much headache. If someone really wanted to go absolutely bananas with reporting queries, data mining, AI crap, whatever, you could give them their own geo replica in a completely different region of the planet. Just make sure they aren't doing any non-queries and everything should be fine.

For large binary data, we rely on external blob storage and URLs. The txn log write limit shouldn't feel like much of a restriction if you are using the right tools for each part of the job. Think about how many blob URLs you could fit within 100 megabytes. If you make assumptions about URL structure, you can increase this by a factor of 2-3.
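
As a rough back-of-the-envelope version of that last point, the sketch below counts how many blob URLs fit in 100 MB. The two URL sizes (100 bytes for a full URL, 40 bytes for a compact identifier once the fixed URL structure is assumed) are illustrative guesses rather than measurements, and per-row transaction log overhead is ignored.

```python
# Log write budget from the comment above: 100 megabytes (per second).
LOG_BUDGET_BYTES = 100 * 1000 * 1000

# Hypothetical sizes, chosen only for illustration.
FULL_URL_BYTES = 100    # e.g. scheme + host + container + path
COMPACT_ID_BYTES = 40   # only the variable part, if the URL shape is fixed


def urls_per_budget(bytes_per_url, budget=LOG_BUDGET_BYTES):
    """How many URL-sized writes fit in the budget, ignoring row overhead."""
    return budget // bytes_per_url


full = urls_per_budget(FULL_URL_BYTES)        # 1,000,000
compact = urls_per_budget(COMPACT_ID_BYTES)   # 2,500,000

if __name__ == "__main__":
    print(f"full URLs:   {full:,}")
    print(f"compact IDs: {compact:,} ({compact / full:.1f}x)")
```

Under these assumed sizes the gain is 2.5x, which sits inside the 2-3x range the comment mentions.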

nathants, over 1 year ago

serverless compute and storage are a great thing. the chronic silverbullitus that plagues industry since the dawn of time doesn’t change that. nonsensically bad systems will be built.

lambda is very reliable, more so than ec2. for serious systems, use it to manage servers.

s3 and dynamo are the same thing with different settings. yes dynamo also adds a kitchen sink, but the only feature you should use is CAS.

s3 is x10 cheaper for storage, x10 more expensive per request, x10 slower per request. dynamo is the opposite.

many great system designs can run properly serverless, ie without any ec2 or ec2-spot. they are simpler. serious systems require you to understand what lambda/s3/dynamo give you and what they do not.

more systems can be designed by adding ec2 and/or ec2-spot. the same understanding is required.

s3/dynamo are equidistant from every point within that region. there is no cross az bandwidth cost. there is no bottleneck. there is no contention. a lot of cool designs fall out of this.

lambda can burst to thousands of cpus in a second, for a second.

ec2-spot boots in 30s, and often has very large nvme physically attached.

there’s nothing fundamentally wrong with misusing all these tools and building inefficient systems. the builders will probably do better on their next system. if the owners wanted it done better initially, they could have hired more expensive builders.
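
For readers unfamiliar with the CAS (compare-and-swap) pattern mentioned above, here is a minimal sketch of the idea using an in-memory stand-in. `VersionedStore` and `increment` are hypothetical names; this is not the DynamoDB API, only an illustration of a conditional write plus an optimistic retry loop.

```python
import threading


class VersionedStore:
    """In-memory stand-in for a key-value store with conditional writes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def get(self, key):
        """Return (version, value); a missing key reads as (0, None)."""
        with self._lock:
            return self._data.get(key, (0, None))

    def compare_and_swap(self, key, expected_version, new_value):
        """Write only if the stored version still equals expected_version."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False  # someone else wrote first; caller must retry
            self._data[key] = (current_version + 1, new_value)
            return True


def increment(store, key, retries=100):
    """Optimistic read-modify-write: re-read and retry when the CAS fails."""
    for _ in range(retries):
        version, value = store.get(key)
        new_value = (value or 0) + 1
        if store.compare_and_swap(key, version, new_value):
            return new_value
    raise RuntimeError("too much contention")
```

A writer that holds a stale version gets `False` back instead of silently overwriting a newer value, which is the property that lets coordination be built on top of an otherwise plain key-value store.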

fergie, over 1 year ago

Two observations:

- "serverless" is a really bad name for these systems. As is often commented, some variation of "somebody-else's-server" would be better.
- Cost wasn't mentioned in the article, but the cost of renting databases and search indices is still really high, even though these technologies are no longer the new hotness.

threeseed, over 1 year ago

I wasn't aware of 800G/1TB networking before.

It's a very strange world where transferring data between servers is 2x as fast as reading from a PCI bus.

socketcluster, over 1 year ago

I'm currently working on a server-less, no-code multi-tenant platform. I'm still unsure if I should aim for full no-code or go for low-code. So far it's possible to build complex apps with it using only HTML tags (web components). Although it also exposes a CRUD interface, I haven't promoted this aspect as I feel it detracts from the huge time-saving and maintenance benefits which come with building apps using only declarative HTML + CSS.

The other thing I've been careful about is to ensure that the backend is fully no-code. As soon as you allow the user to execute custom code on your backend, it opens up security risks with multi-tenancy. The risk doesn't fully go away when you containerize, as vulnerabilities have been encountered in the past in Docker which allow escaping the sandbox.

In my case, although the user can customize back-end behavior, they can only do so in a highly constrained way using well-defined parameters, not custom code. It saves a lot of effort not having to write a VM or restrict each container to a single host.
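
One way to picture "well-defined parameters, not custom code" is an allowlist that every tenant setting must pass before the backend acts on it. The sketch below is purely illustrative: the parameter names and ranges are hypothetical and are not taken from the platform described above.

```python
# Hypothetical allowlist: parameter name -> (required type, range check).
ALLOWED_PARAMS = {
    "page_size": (int, lambda v: 1 <= v <= 100),
    "sort_order": (str, lambda v: v in {"asc", "desc"}),
    "cache_seconds": (int, lambda v: 0 <= v <= 3600),
}


def validate_settings(settings):
    """Return a cleaned copy of settings, rejecting anything off the allowlist.

    Tenants can only pick values for known parameters; nothing they submit
    is ever evaluated or executed.
    """
    clean = {}
    for name, value in settings.items():
        if name not in ALLOWED_PARAMS:
            raise ValueError(f"unknown parameter: {name}")
        expected_type, is_valid = ALLOWED_PARAMS[name]
        # Exact type match, so True/False cannot pass as an int.
        if type(value) is not expected_type or not is_valid(value):
            raise ValueError(f"invalid value for {name}: {value!r}")
        clean[name] = value
    return clean
```

Because the backend only ever branches on validated data, there is no tenant code to sandbox in the first place.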

irq-1, over 1 year ago

> There is also V8 isolates where tenants can share the same V8 process but in separate lightweight contexts, though I haven’t yet seen this in data systems.

Cloudflare does this: https://developers.cloudflare.com/durable-objects/api/transactional-storage-api/