My experience doing perf optimization in real-world systems, with many people writing code against the same app, is that most inefficiencies come from over-fetching data, from naive use of the ORM without understanding the underlying cost of the queries it generates, and from a lack of actual profiling to find where the real bottlenecks are (usually people writing expensive code without realizing it).

Sure, the framework matters at very large scale: the benefits of optimizing it become large when you're doing millions of requests a second across many thousands of servers, because it reduces the baseline cost of running the service.

But I agree with the author's main point, which seems to be that framework performance is pretty meaningless when comparing frameworks at the start of a new project. Focus on making a product people actually want to use first. If you're lucky enough to get to scale, you can worry about optimizing it then.
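To make the over-fetching point concrete, here's a minimal sketch of the classic N+1 pattern that naive ORM use produces, in Django ORM syntax (the Customer/Order models are made up for illustration, not from the comment above):

```python
from django.db import models

class Customer(models.Model):
    name = models.CharField(max_length=100)

class Order(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)

# Naive: one query for the orders, then one *additional* query per
# order to load its customer -- the classic N+1 pattern.
for order in Order.objects.all():
    print(order.customer.name)

# Same output from a single JOINed query, fetching only the needed columns.
for order in Order.objects.select_related("customer").only("id", "customer__name"):
    print(order.customer.name)
```

A profiler (or just logging the executed SQL) surfaces this kind of thing immediately; reading the code usually doesn't.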
A humble request to folks making benchmark or other graphs: please understand that thin coloured lines are not easy to visually parse, even for folks like me who aren't totally colour blind but have partial red-green colour blindness. At the very least, make the lines thicker so it's easier to make out the colours. Even better, label each line with an arrow and what it represents.
Related to ORMs/queries/performance, I have found the following combination really good:

* aiosql [0] to write raw SQL queries and have them available as Python functions (discussed in [1])

* asyncpg [2] if you are using Postgres

* Map asyncpg/aiosql results to Pydantic [3] models

* FastAPI [4]

Pydantic models become the "source of truth" inside the app: they are designed as a copy of the DB schema, and functions receive and return Pydantic models in most cases. (A minimal sketch of this stack follows the links below.)

This stack also makes me think harder about my queries and the DB design. I try to make sure each endpoint makes only a couple of queries. Each query may have multiple CTEs, but it's still only a single round trip. That also makes you think about what to prefetch or not; maybe I want to also fetch the data to return when the request is OK and avoid another query.

[0] https://github.com/nackjicholson/aiosql
[1] https://news.ycombinator.com/item?id=24130712

[2] https://github.com/MagicStack/asyncpg

[3] https://pydantic-docs.helpmanual.io/

[4] https://fastapi.tiangolo.com/
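Here is the promised sketch of how those pieces fit together; the query name, model fields, and connection string are illustrative assumptions, not from the comment:

```python
import aiosql
import asyncpg
from fastapi import FastAPI
from pydantic import BaseModel

# queries.sql contains, e.g.:
#   -- name: get_user^
#   SELECT id, email FROM users WHERE id = :user_id;
# (the trailing ^ tells aiosql this query returns a single row)
queries = aiosql.from_path("queries.sql", "asyncpg")
app = FastAPI()

class User(BaseModel):
    # Mirrors the DB schema; the "source of truth" inside the app.
    id: int
    email: str

@app.on_event("startup")
async def startup():
    app.state.pool = await asyncpg.create_pool("postgresql://localhost/mydb")

@app.get("/users/{user_id}", response_model=User)
async def get_user(user_id: int):
    async with app.state.pool.acquire() as conn:
        row = await queries.get_user(conn, user_id=user_id)
    # asyncpg returns a Record; map it onto the Pydantic model.
    return User(**dict(row))
```

The nice property is that the SQL stays plain SQL in one file, while everything downstream of the driver only ever sees typed Pydantic objects.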
Don't forget that you're paying a huge price for using the SQLAlchemy ORM: https://docs.sqlalchemy.org/en/13/faq/performance.html

If I know an endpoint is going to be hit hard, I forgo the ORM (except maybe to get the table name from the model object, so some soul can trace its usage here in the future) and directly do an engine.execute(<raw query>). Makes a huge difference. The next optimization I do is create stored procedures on the database. Only then do I start thinking about changing the framework itself.

For folks like me who want to get prototypes off the ground in hours, Flask and FastAPI are godsends, and if that means I have to worry about serving thousands of requests a second soon, that's a happy problem for sure.
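For what it's worth, the modern (1.4+/2.0) SQLAlchemy spelling of the same "skip the ORM on hot paths" idea looks like this; the table and column names are made up for illustration:

```python
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://localhost/mydb")

# No ORM objects are built or hydrated here: SQLAlchemy just sends the
# SQL and hands back lightweight row tuples.
with engine.connect() as conn:
    rows = conn.execute(
        text("SELECT id, total FROM orders WHERE status = :s"),
        {"s": "open"},
    ).fetchall()
```

The linked FAQ makes the same point: most of the ORM's overhead is in constructing full model instances, which a raw query simply never pays for.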
Use of ORMs is often a performance choke point. Raw DB queries are often much, much faster.
Almost always, the more you abstract, the worse you perform. It's great as a developer but not so great as a user.
Good article, but I can't help noticing a gaping hole in the benchmark: why was there no attempt to run gunicorn in multi-threaded mode?

The article links to https://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/, but fails to mention that article's key takeaway:

> threaded code got the job done much faster than asyncio in every case
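For reference, threaded gunicorn is a config change, not a rewrite. A minimal sketch of a gunicorn.conf.py (the worker and thread counts are illustrative, not from the article):

```python
# gunicorn.conf.py -- threaded workers via gunicorn's built-in gthread class.
import multiprocessing

workers = multiprocessing.cpu_count() * 2 + 1  # process-level parallelism
threads = 4                  # threads per worker process
worker_class = "gthread"     # gunicorn's threaded worker type
```

Started with something like `gunicorn -c gunicorn.conf.py app:app`, that would have been a fair third contender alongside the sync and asyncio setups.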
In my benchmark testing, SSL appears to be the bottleneck; e.g., Apache vs. Nginx does not really matter. I assume the benchmarks above 10,000 RPS are using plain HTTP rather than SSL? How are people doing benchmarks at 10k-100k RPS?
As a Django shop, we've always hoped PyPy would one day be suitable for our production deployments, but in the end, with various issues, we were never able to make the switch.

And then Pyston was re-released... and changed everything. It was drop-in compatible for us, and we saw a 50% drop in latencies.

Source availability aside, I suggest anyone running CPython in prod take a look.
When you start hitting bottlenecks in your Python web framework, it's probably time to switch to a faster language, not another framework in Python.

You're probably done with rapid prototyping by that point anyway.
Why Python at all? About 10 years ago I liked Python a lot (and still like it in principle) and felt very productive compared to, say, Java. Java was full of inconvenience, XML, bloated frameworks and all that. But today you can use Kotlin, which is in my opinion even nicer than Python, with performant frameworks (e.g. Quarkus or Ktor) on the super-fast JVM.

I don't want to start a language war, but maybe Python is not the first choice for their requirements.
Might as well refer to the TechEmpower benchmarks:

https://www.techempower.com/benchmarks/
We did an evaluation for our API. The API accepts an image upload, passes it on to the backend for processing, and returns a ~2k JSON lump in return.

Long story short, FastAPI was much, much faster than anything else for us. It also felt a bit like Flask. The integration with pydantic for validating dataclasses on the fly was also great.
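Roughly the shape of such an endpoint, for anyone curious; `process_image` is a hypothetical stand-in for the (unnamed) backend call, and the response fields are illustrative:

```python
from fastapi import FastAPI, UploadFile
from pydantic import BaseModel

app = FastAPI()  # file uploads also need the python-multipart package

class Result(BaseModel):
    # Stand-in for the ~2k JSON lump the backend returns.
    label: str
    confidence: float

@app.post("/analyze", response_model=Result)
async def analyze(file: UploadFile):
    data = await file.read()
    # Hypothetical call into the image-processing backend.
    label, confidence = process_image(data)
    return Result(label=label, confidence=confidence)
```

The pydantic `response_model` gives you validation and the OpenAPI schema for free, which is a big part of the Flask-like ergonomics.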
I would question choosing Python for large server projects because the performance ceiling is so low. At least with the "middle tier" performance languages such as Java / C# you are unlikely to require a complete language switch as the project scales.
I inherited a Flask queue worker, and it suffers from some major problems (like 12 req/second when it's not discarding items from the queue). I am primarily a JavaScript programmer, so I'm a little out of my element.

I am tempted to refactor the worker to use async features, which would require factoring out uWSGI; that's fine, I only added it last week. The article states that Vibora is a drop-in replacement for Flask, but I'm a bit skeptical, as I can't find much information beyond Vibora having a similar API. For a web service with basically one endpoint, I could refactor to another implementation fairly easily; I'm just looking for the right direction.

I thought maybe I should refactor the arch to either batch requests to the worker, or to use async. Anyone have a feeling where I should go? I am just getting started researching this, but any advice would be appreciated.

Edit: at least Quart has a migration page... probably will just try it out, what can I lose? https://pgjones.gitlab.io/quart/how_to_guides/flask_migration.html

Second edit: might also try out polyrand's stack in the comments.
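For a one-endpoint service, the Quart migration really is mostly mechanical; a sketch of what the guide boils down to (the route and payload are illustrative, not from the comment):

```python
# Before (Flask):
#   from flask import Flask, request
#   app = Flask(__name__)

from quart import Quart, request

app = Quart(__name__)

@app.route("/enqueue", methods=["POST"])
async def enqueue():
    # Request-body access is awaited in Quart, unlike Flask.
    payload = await request.get_json()
    # ... hand payload off to the queue here ...
    return {"queued": True}
```

You'd then serve it with an ASGI server such as hypercorn instead of uWSGI.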
The fact that you are using an offset of 50,000 and complaining that it slows everything down says a lot about the benchmarks. Top it off with an ORM query that prefetches everything, the GIL, and a shared CPU (I am guessing) that you ran the benchmark on. You see where this is headed?
I have great experiences with Falcon for backend REST APIs, and it is supposed to be great in terms of requests per second.

How does it compare to Sanic?
C#/ASP.NET is the fastest web framework now:

https://www.techempower.com/benchmarks/#section=test&runid=8ca46892-e46c-4088-9443-05722ad6f7fb&hw=ph&test=plaintext

7,000,000 requests per second.

Even Go only achieves about 4,500,000 requests per second, despite being a lower-level language than high-level C#.
The important thing to remember is that unless you're running a massive service, *requests per second* is less important than *seconds per request*.

Getting an API hit from 300ms to 70ms, plus proper frontend caching, is far more valuable than concurrency (if you can afford to throw servers at it) because it actually affects user-perceived performance.
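The caching half of that is often a one-line change at the framework level. A minimal sketch, shown with Flask (the route, payload, and 60-second TTL are illustrative assumptions):

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/products")
def products():
    resp = jsonify([{"id": 1, "name": "widget"}])
    # Let browsers and any shared cache (CDN, reverse proxy) serve this
    # for 60 seconds, so repeat hits never reach the app server at all.
    resp.headers["Cache-Control"] = "public, max-age=60"
    return resp
```

A cached response costs zero seconds per request from the app's point of view, which no amount of framework concurrency can match.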
Since I've been a developer, two changes have given major performance improvements and made backend framework improvements much less significant (at least in the apps I develop): CDNs and client-side rendering (which means more, smaller requests for data that are better suited to being served via a CDN).

Using (for example) AWS CloudFront was a game changer in how I design web apps and think about performance. Being able to 'slice and dice' which requests get SSL-terminated at the CDN, cached fairly locally, served from an Amazon-managed web server, or sent to our app server increased our performance tenfold.

That approach isn't always practical, but I find it's now much easier to choose the backend for developer productivity; doubling the server CPU/memory is quicker and cheaper when needed.
Not that it matters any more, but a colleague mentions Flask was originally a joke about what not to do:

https://lucumr.pocoo.org/2010/4/3/april-1st-post-mortem/

The Flask author reflects on that here:

http://mitsuhiko.pocoo.org/flask-pycon-2011.pdf

Quite relevant to the conclusion in the article.
And here I was living under the assumption that psycopg2 was the only option, which was probably the biggest reason I was not using PyPy. Gotta take a look at pg8000.

In general, I've always liked the idea of PyPy, so I'll try to use it more, and not just for performance. Will also donate when I can.
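For anyone else in the same boat: pg8000 is a pure-Python Postgres driver, so it runs (and gets JIT-compiled) under PyPy with no C extension involved. A tiny sketch of its DB-API usage, with placeholder connection details and a made-up table:

```python
import pg8000

# Pure-Python driver: no libpq, no C extension, PyPy-friendly.
conn = pg8000.connect(user="app", password="secret", database="mydb")
cur = conn.cursor()
cur.execute("SELECT id, name FROM users WHERE id = %s", (1,))  # format paramstyle
print(cur.fetchone())
conn.close()
```

Being plain Python is exactly why it pairs well with PyPy, where psycopg2's C extension is the usual sticking point.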
I always assumed Python could scale because of Reddit: https://github.com/reddit-archive/reddit

Not quite sure if their current site's code is open source... anyone know?
TL;DR: PyPy is awesome. Don't use frameworks. Use PyPy.

Please donate. PyPy needs funds: https://opencollective.com/pypy

PyPy doesn't get a fraction of the funding that Python does.
This benchmark was run on a laptop, which has a very small number of cores compared to the servers that usually run such apps. The author doesn’t mention any attempt to tweak the number of workers, which would make sense in this case. Given that they did notice at some point that CPU usage is lower than expected, I am surprised that they did not try it.
Why is that so? I've gotten 100K requests per second with PHP easily [1].

[1] https://github.com/gotzmann/comet
The TL;DR you should be looking for:

> all of this emphasises the fact that unless you have some super-niche use-case in mind, it's actually a better idea to choose your framework based upon ergonomics and features, rather than speed.
I think it's been bog-standard practice to run Flask via uWSGI or gunicorn with async workers, and to use multiple process-based workers per deployed server unit (e.g., per pod in Kubernetes).

What matters is that the cumulative latency and throughput solve your problem, not how fast you can make one singular async worker thread.

I figure most people running complex web services in production would just roll their eyes at this post. Nobody's going to switch to PyPy for any of this.

My team at work runs several complex ML workloads, and we use the exact same container pattern for every service: gunicorn spawning X async workers per pod, then scaling pods per service to meet throughput requirements. Sometimes we also just post complex image-processing workloads to a queue and batch them to GPU workers. In all these use cases, the super-low-effort "just toss it in gunicorn running Flask" approach has worked without issue for services supporting peak loads of thousands to hundreds of thousands of requests per second.
It's a bit of a step back in time reading things like this.

This is stateless HTTP requests hitting a relational database. How is this dead horse still being beaten? The patterns for load balancing, horizontal scalability, and caching in this space are well documented.

What are we gaining by still profiling Django, Flask, and Ruby on Rails in 2021?