Perhaps I misread, but it sounds like the author never managed to get Fargate to run with anything less than 3% failures?

That to me sounds like an astoundingly high number, especially for the amount of resources dedicated to the test (400 GB of memory and 200 CPUs). Hell, if you're monitoring uptime, that's barely one nine (97% success).

I can't believe that this is a well-configured setup. Running 50 instances with four CPUs each at a total of ~100 rps means that half of the CPUs are probably doing exactly nothing (and if my understanding is correct and they're using Flask in a single-threaded way, 150 of the 200 CPUs are going to be idle).

Triggering SNS is an API call. Assuming that's all the test application is doing, you hardly need one server to do this. I'd bet that you could make 100 simultaneous API calls from a stock MacBook Pro with a small handful of Node or Go processes without even making your CPU spin up.

If Fargate can't handle 100 rps (making a single API call per request) with <10 instances, it's a useless product. But I find it hard to believe that Amazon would put something so absolutely incapable into the wild. With the specs the author put up, that's the equivalent of ~$10/hr (if I'm reading their pricing page correctly). You could run 380 A1 instances, 58 t3.xl instances, or two m5a.24xlarge instances for that price.
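To put that in perspective, something like the following (a rough sketch with boto3 and a made-up topic ARN, not the author's code) is all it takes to push 100 SNS publishes in parallel from a single process:

```python
# Rough sketch: 100 concurrent SNS publishes from one machine.
# boto3 clients are thread-safe, so a simple thread pool is enough.
# The topic ARN is a placeholder.
import time
from concurrent.futures import ThreadPoolExecutor

import boto3

TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:example-topic"  # hypothetical
sns = boto3.client("sns")

def publish_one(i: int) -> None:
    sns.publish(TopicArn=TOPIC_ARN, Message=f"request {i}")

start = time.time()
with ThreadPoolExecutor(max_workers=100) as pool:
    list(pool.map(publish_one, range(100)))
print(f"100 publishes in {time.time() - start:.2f}s")
```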
I tend to think of these services as fulfilling different use cases.

For Fargate, the ideal scenario in my workloads is async background processing. Add tasks to SQS, Fargate pulls tasks off and does the job. Elastically scales up or down and lots of flexibility on machine specs. OK with some failure rate.

For AWS Lambda, recently I like the combo of adding a Cloudflare Worker in front and using it as an API gateway. More flexibility on routing, faster performance, and a reverse proxy for free, which can be good for SEO. And you get all the goodies of Cloudflare like DDoS protection and CDN.
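To make the Fargate/SQS pattern above concrete, here's a rough sketch (boto3, placeholder queue URL) of the worker loop a task would run:

```python
# Sketch of an SQS-driven background worker running in a Fargate task.
# The queue URL is a placeholder; handle() stands in for the real work.
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"  # hypothetical
sqs = boto3.client("sqs")

def handle(body: str) -> None:
    ...  # do the actual background job here

while True:
    resp = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,  # long polling keeps the loop cheap when idle
    )
    for msg in resp.get("Messages", []):
        handle(msg["Body"])
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```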
This benchmark is all over the place; there are way too many variables changing to draw any sort of conclusion. You have a flask-gunicorn-meinheld stack on the Fargate side, all bits that might impact performance. Then there are the failing requests, which alone would disqualify the comparison. I would also assume API Gateway to be slower than a plain ELB, although I don't know for sure, and this comparison certainly didn't give any information about that. There's no comparison of different Lambda/Fargate instance sizes, nor any mention of costs. Obviously just throwing more money at it should get better perf, so perf/$ is sort of important.
Unless I misread the article, the writer claims to need about 50 instances to support 100 req/sec: "For Fargate, this meant deploying 50 instances of my container with pretty beefy settings — 8 GB of memory and 4 full CPU units per container instance." If the API service was just returning some data after a few manipulations, that is quite inefficient. At NodeChef ( https://www.nodechef.com ), there are users running over 1,000 req/sec with just around 12 containers, each with 512 MB of RAM and 2 CPUs.
As usual with such benchmarks, I miss performance comparisons over time. Did Fargate outperform the other two solutions because it's generally faster, or because API Gateway was just slow during the few minutes the benchmark took?

What'd also be interesting here would be a price comparison. Without having done the math, I'd expect Fargate to be significantly more expensive than the other solutions, which'd make a nice trade-off of cost vs. performance: if performance matters choose Fargate, if cost matters choose API Gateway as a service proxy.
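For a very rough sense of the Fargate side of that math (the per-vCPU and per-GB rates below are ballpark assumptions, not exact list prices):

```python
# Back-of-the-envelope cost for the article's Fargate setup.
# Rates are rough assumptions in the ballpark of Fargate's pricing.
VCPU_HOUR = 0.0405  # assumed $/vCPU-hour
GB_HOUR = 0.0045    # assumed $/GB-hour

instances, vcpus, gb = 50, 4, 8
hourly = instances * (vcpus * VCPU_HOUR + gb * GB_HOUR)
print(f"~${hourly:.2f}/hour, ~${hourly * 24 * 30:,.0f}/month")
# -> ~$9.90/hour, ~$7,128/month at these assumed rates
```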
I'm confused how the author thinks it's OK that 10% of requests to Fargate are failing because they don't know how to configure Docker/Flask?
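For what it's worth, the usual fix is just sizing gunicorn workers to the CPUs the container actually has. A sketch of a gunicorn.conf.py (my assumptions, not the author's actual config):

```python
# gunicorn.conf.py -- sketch only, not the author's configuration.
# Without something like this, a 4-vCPU task can end up serving all
# traffic from a single process.
import multiprocessing

bind = "0.0.0.0:8000"
workers = multiprocessing.cpu_count() * 2 + 1  # common rule of thumb
worker_class = "gthread"  # or meinheld's worker class if using meinheld
threads = 4
```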
Helpful analysis, thanks. I was planning to do something like this soon. I'm interested in the performance difference between the same Docker image running on Fargate vs. ECS with a c5.large EC2 backing instance. My initial tests (a few months ago, before they started using Firecracker for Fargate) showed at least a 200% performance increase. I also noticed Fargate performance really depends on the processor AWS allocates to the Fargate task, and the processor is not always the same between identical tasks. It would be worth checking if there are significant differences between Fargate tasks with different underlying CPU types.
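One quick way to do that check from inside the task (a sketch; assumes a Linux container where /proc/cpuinfo is readable) is to log the CPU model at startup and correlate it with the benchmark numbers:

```python
# Sketch: report which CPU model this Fargate task landed on.
def cpu_model() -> str:
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("model name"):
                return line.split(":", 1)[1].strip()
    return "unknown"

print(cpu_model())
```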
This was so bad, I'm not even sure where to start.
As general advice: when you see 4 cores/8 GB doing ~100 rps, there is a serious problem somewhere. Every dev should know that this kind of instance on this kind of benchmark (simple REST calls) should be in the ballpark of 4-5 digits, not 3.
I think people are overly concerned with 'performance'.

So long as the approaches meet real-world thresholds for specific things, particularly the speed of certain time-sensitive requests, then the technology is 'viable'.

(And let's also assume 'reliability' as a key component of 'viability'.)

Once the tech is 'viable', it's really a whole host of other concerns that we want to look at.

#1 I think would be the ability of the tech to support the dynamic and changing needs of product development.

Something that is easy, has fewer moving parts, a smaller API, and requires less interference and support from DevOps - this is worth a lot. Strategically, it may be the most valuable thing for most growing companies.

'Complexity' in all its various forms represents a kind of constant barrier, a force that the company is going to have to fight against to make customers happy. This is the thing we want to minimize.

Obviously, issues such as switching costs and the 'proprietary trap' are a concern, and of course 'total cost of operations', i.e. the cost of the services, is an important basis of comparison, but even the latter only becomes an issue later on, once the company reaches maturity. (I.e. a 'Dropbox'-type company should definitely start in the cloud, and not until they have the kind of scale that warrants unit-cost scrutiny would they consider building their own infra.)

In the big picture of 'total cost of ownership', it's the ability of the system to meet the needs of product development, not 'how fast or cheap' it is, that's really the point.

Something that is 2x the cost and 10% 'less performant', but is very easy to use, requires minimal DevOps focus, and can enable feature iteration and easy scale - this is what most growing companies need.

Unless performance or cost are key attributes and differentiators of the product or service, 'take the easy path'.
> First, I ran a small sample of 2000 requests to check the performance of new deploys. This was running at around 40 requests per second.

...

> When I ran my initial Fargate warmup, I got the following results. Around 10% of my requests were failing altogether!

...

> To remedy this, I decided to bump the specs on my deployments.

> The general goal for this bakeoff is to get a best-case outcome for each of these architectures, rather than an apples-to-apples comparison of cost vs performance.

> For Fargate, this meant deploying 50 instances of my container with pretty beefy settings — 8 GB of memory and 4 full CPU units per container instance.