Something that turned me off of GPU lambda services is that they don't offer a way to run everything locally. I have an instance with a GPU where I do my dev. I'm running Postgres, the front end, the back end (Node), and my GPU worker (Python) on that box. Replicate and other offerings don't let me run the full hosted environment on my machine (you can run the service container, but it behaves differently because there's no included router/scheduler). It feels wrong to use a magically instantiated box for my GPU worker.

Really, all I want is for Render.com to have GPU instances. (I saw fly.io now has GPUs, which is great, but I've heard bad things about their stability and don't care for their rethinking of server architecture.) Please, will someone give me PaaS web hosting with GPU instances?

I'm a simple man. If there's a fundamental shift in hosting philosophy, I will resist that change. I have loved Docker and PaaS as revolutions in development and hosting experience because the interface is, at some level, still just running Linux processes like I do on my computer. You can tell me that my code is now hosted in a serverless runtime, but you need to give me that runtime so that I can spin it up on my own computer, on EKS, or wherever, if need be.
Lambda is fundamentally a request/response architecture and is meant to be tied together with several other AWS services. As such, I don't think Modal's offering is really comparable, nor is "Lambda on hard mode" a particularly good description of what they've made.

Perhaps "EC2 on easy mode" is more like it.
I've been transitioning some compute-heavy workloads from Lambda/AWS Batch to Modal recently and have nothing but good things to say about it. It's one of those technologies where you're shipping the same afternoon you first check it out. "Wow, that just *works*?" Highly, highly recommended; feels like the future, IMO.
As the other commenter points out, this offering isn't directly comparable to Lambda, so the post ends up comparing apples to oranges here and there, but overall I was able to get a good idea of the choices made and the tradeoffs involved. Nice work! I do have one complaint about the comparison table, which shows "convert HTTP to function calls" as an alternative to load balancers and reverse proxies: as we see later in the article, there is still a load balancer involved, and the table creates the false impression that there isn't.
I don't get it. A lot of this post describes how they translate HTTP requests into function calls and deal with issues around the HTTP protocol.

If you're invoking a heavy-duty, long-running function, then you're likely doing something bespoke. Why use HTTP at all? Wouldn't gRPC be a better fit? That seems to be what is being reinvented here.

The selling point of HTTP is that it's ubiquitous and simple, but they're coupling it with an offering that is specific and complex. Is using a gRPC library such a burden that it makes this effort worthwhile?
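To make that concrete, here's a minimal sketch of what a long-running function call over gRPC could look like from the client side (the service/method path and payload are hypothetical; real code would use stubs generated from a .proto):

    # Minimal sketch, assuming a hypothetical /functions.Runner/Run method.
    # gRPC rides on HTTP/2 and holds the stream open for the whole call,
    # so a minutes-long invocation needs no long-polling or keep-alive tricks.
    import grpc

    channel = grpc.insecure_channel("worker.internal:50051")
    run = channel.unary_unary(
        "/functions.Runner/Run",            # hypothetical service/method
        request_serializer=lambda b: b,     # pass raw bytes through
        response_deserializer=lambda b: b,
    )
    result = run(b'{"fn": "transcode", "args": ["in.mp4"]}', timeout=1800)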
Their quoted limits on AWS seem quite off.

> As of 2024, they can only use 3 CPUs (6 threads) and 10 GB of memory

Actually, you get 1 vCPU (i.e., one hyperthread) per 1,769 MB of memory [1].

> Response bandwidth is 2 Mbps

This is shockingly low, and I wouldn't believe it without data. 16 Mbps (2 MB/s) would be more believable. In my experience you can reliably get 25 MB/s (~200 Mbps) in the network layer of things in AWS.

[1] https://docs.aws.amazon.com/lambda/latest/dg/configuration-function-common.html#configuration-memory-console
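For what it's worth, a quick check of how the documented 1-vCPU-per-1,769-MB ratio scales up to Lambda's memory cap:

    # vCPUs scale linearly with configured memory, up to Lambda's 10,240 MB cap.
    MB_PER_VCPU = 1769
    for mem_mb in (1769, 3538, 10240):
        print(f"{mem_mb} MB -> {mem_mb / MB_PER_VCPU:.2f} vCPU")
    # 10240 MB -> 5.79 vCPU, i.e. ~6 threads (3 physical cores) at max memory,
    # which is presumably where the article's "3 CPUs (6 threads)" comes from.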
The services compared aren't really equivalents. Cloudflare Workers is more like Lambda@Edge; in particular, I can't imagine a reason you'd need it for the NN training/video processing/background job type tasks the article mentions. I'm less familiar with Google Cloud Run, but isn't it more like App Runner or Fargate?
FYI, the hyperlink on the text "traditionally understood" is giving a 404 (https://www2.eecs.berkeley.edu/Pubs/TechRpts/2019/EECS-2019-3.pdf).
For workloads that take minutes to run, it's very easy to hit the kernel's max socket connection limit (somaxconn). How do you handle that with just a request/response model?
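(For anyone unfamiliar: somaxconn caps the listen backlog, and a server whose workers are all busy for minutes will see new connections pile up there. A minimal sketch of the knobs involved, assuming a Linux box:)

    # The effective backlog is min(listen() backlog, net.core.somaxconn),
    # so the kernel setting has to be raised alongside the application's.
    import socket

    with open("/proc/sys/net/core/somaxconn") as f:
        print("kernel cap:", f.read().strip())  # often 128 or 4096 by default

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("0.0.0.0", 8080))
    srv.listen(65535)  # silently clamped to somaxconn by the kernel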