I'm curious what back-pocket optimizations you have from your past experience that you pull out when a workload costs too much in the cloud. For me it's:

* disable lower C-states
* enable adaptive interrupt coalescing in the network driver
* mount volumes with noatime
Reduce how often something runs and the amount of work it does when it does run. Sometimes things run too often, or process unnecessary data.

I once had a project where an SAP system was crawling and causing company-wide stoppages. We found a job that literally ran every minute of every day, processing a table that contained a few thousand tasks. This was something that could be done once per hour, and only during business hours. Furthermore, it was re-processing thousands of records each time it ran; in reality, once a record had been processed it could be deleted from the table.

We emptied out the table and scheduled the job to run hourly. The whole company noticed an immediate improvement.

This pattern happens a lot. Someone builds a polling system that hits the server once a second to see whether a task finished. A cron job runs every 5 minutes when hourly would do. A daily job reprocesses all historical data instead of only what arrived in the last 24 hours.

The world is filled with computers doing useless work. Mostly, no one notices.
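A minimal sketch of that "process new work once, then forget it" shape; the tasks table, its columns, and handle() are hypothetical stand-ins, not the original system:

    import sqlite3

    def handle(payload):
        ...  # whatever the job actually does with one record

    def process_pending(conn):
        # Fetch only rows that haven't been handled yet, instead of
        # re-reading the whole table on every run.
        rows = conn.execute(
            "SELECT id, payload FROM tasks WHERE processed = 0"
        ).fetchall()
        for task_id, payload in rows:
            handle(payload)
            # Once handled, a record never needs to be looked at again,
            # so delete it (or flag it) rather than re-processing it next run.
            conn.execute("DELETE FROM tasks WHERE id = ?", (task_id,))
        conn.commit()

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, payload TEXT, processed INTEGER DEFAULT 0)"
    )
    process_pending(conn)  # schedule hourly during business hours, not every minute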
Push less data through wires.

The memory hierarchy is so stark on modern hardware that the 30-year-old adage that "the fastest code is the code you don't run" is maybe less important than "the fastest code is the code that doesn't spend much time talking to the memory controller."

And it's even worse once we start talking about accessing memory that's on an entirely different computer. Serialization/deserialization, IPC, network calls, and all the other things we do with reckless abandon in modern service-oriented and distributed applications are just unbelievably expensive.

Last year I took a slow, heavily parallelized batch job and improved its throughput by 60% by getting rid of both scale-out and multithreading and taking it all down to a single thread. Everyone expected it to be slower because we were using a small fraction of the CPU cores, but in truth it was faster, because the time saved on memory fences, data copying, and network I/O was just that great. And then the performance gains kept coming because, having simplified things to that extent, I was in a much better position to judiciously re-introduce parallelism in ways that weren't so wasteful.
I've spent the last couple of years identifying and resolving N+1 problems in a Django codebase.

https://planetscale.com/blog/what-is-n-1-query-problem-and-how-to-solve-it

Aside from the performance gains, it's very satisfying to go from 1,000+ inefficient DB queries to 1-2 optimized queries.
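A typical before/after, sketched with hypothetical Book/Author models; the usual Django tools here are select_related for foreign keys and prefetch_related for reverse or many-to-many relations:

    from django.db import models

    class Author(models.Model):
        name = models.CharField(max_length=100)

    class Book(models.Model):
        title = models.CharField(max_length=200)
        author = models.ForeignKey(Author, on_delete=models.CASCADE)

    # N+1: one query for the books, then one extra query per book for its author.
    for book in Book.objects.all():
        print(book.author.name)

    # Fixed: a single query with a JOIN pulls the authors in as well.
    for book in Book.objects.select_related("author"):
        print(book.author.name)

    # For reverse or many-to-many relations, prefetch_related does the same job
    # with one additional query instead of a JOIN.
    authors = Author.objects.prefetch_related("book_set")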
I realize you're asking about performance optimizations, but since you put it in the context of a workload's cloud bill being too large, I'll chip in and say that by far the largest impact on cost I've seen over the years comes from simply rightsizing the infrastructure the workloads run on. What I see more than anything is applications reserving 10x more CPU or memory than they actually use. In some cases the fix means amortizing resource usage over time by asynchronously consuming some kind of queue, where the extreme reservation exists only to absorb a temporary usage spike (a downstream client doing some batch processing, for example).
Easy trick to making joins 50x faster: don't use Postgres, and give your tables a primary key that groups related items together.

A lot of people don't know that an ordinary database index doesn't order the actual rows on disk. It's just a B-tree of pointers.

If you use a clustered index that matches the table's query pattern, the rows really are ordered on disk.

Most DBs load data in 8 KiB pages. So if you query 100 rows of 100 bytes each and they're not stored together, you may need to load nearly 1 MiB of data even though the query result is only 10 KiB.

That speeds up joins and range queries 50x or more, causes fewer cache evictions, etc.

You can do this in any database except Postgres; Postgres doesn't have the ability to keep rows sorted on disk.
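A self-contained way to see the idea is SQLite's WITHOUT ROWID tables, which store rows directly inside the primary-key B-tree (InnoDB in MySQL clusters on the primary key the same way); the schema here is made up:

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Rows sharing a customer_id end up physically adjacent on disk, because the
    # table itself is the primary-key B-tree rather than a heap plus an index.
    conn.execute("""
        CREATE TABLE order_items (
            customer_id INTEGER NOT NULL,
            order_id    INTEGER NOT NULL,
            item_id     INTEGER NOT NULL,
            price_cents INTEGER NOT NULL,
            PRIMARY KEY (customer_id, order_id, item_id)
        ) WITHOUT ROWID
    """)

    # A range query for one customer now touches a handful of contiguous pages
    # instead of one page per row scattered across the table.
    rows = conn.execute(
        "SELECT order_id, item_id, price_cents FROM order_items WHERE customer_id = ?",
        (42,),
    ).fetchall()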
I've seen memoization improve performance by enormous amounts, even for simple functions that do a few simple calculations before returning a result (a minimal sketch follows at the end of this comment).

Another go-to of mine is to hoist conditionals out of loops when they really only need to be checked once. For example:

    for foo in whatever:
        for bar in foo:
            if len(foo) > some_value:
                do_something(bar)

Can become:

    for foo in whatever:
        if len(foo) > some_value:
            for bar in foo:
                do_something(bar)

This example is trivial and wouldn't gain much, but imagine that `len(foo)` was a more computationally expensive call: you'd only need to make it once per `foo` instead of once for every `bar` inside every `foo`.
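And the memoization sketch mentioned above, using the standard library; the function and its numbers are hypothetical:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def shipping_cost(weight_kg, zone):
        # Imagine a few lookups and calculations here; even cheap ones add up
        # when the same arguments come in thousands of times.
        base = {"domestic": 4.0, "international": 12.0}[zone]
        return base + 1.5 * weight_kg

    shipping_cost(2.0, "domestic")  # computed
    shipping_cost(2.0, "domestic")  # returned straight from the cache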
Eliminate complicated stacks/frameworks (dockers, npm, reacts, multi-stores, clouds, etc.) and use the simplest alternative available (solid, single `exes`, htmx+tailwind, only Postgres, normal hosting, etc.).

Improve the data (and its structure) first if possible.
On the back-end, application-level programming side:

* Look for O(N^x) behavior, that is, nested loops, even when they are not necessarily expressed as loops at the language level.

* If possible, get rid of ORMs in favor of raw SQL. Not because ORMs are inherently bad, but because almost nobody bothers to learn them, so they often start causing issues under any non-trivial amount of load.

* Study data access patterns and figure out where composite indexes might help (a sketch follows below). I say composite indexes because I assume regular indexes are more or less always there, often even too many of them.

Especially with the last one I have achieved impressive results without any kind of impressive effort, just by setting aside some time to understand the code.
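One possible shape of that composite index, sketched as a hypothetical Django model (the access pattern assumed is "recent orders for one customer"):

    from django.db import models

    class Customer(models.Model):
        name = models.CharField(max_length=100)

    class Order(models.Model):
        customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
        created_at = models.DateTimeField(auto_now_add=True)

        class Meta:
            indexes = [
                # One composite index serves WHERE customer_id = ? AND created_at >= ?
                # (and the ORDER BY created_at that usually comes with it) far better
                # than separate single-column indexes on each field.
                models.Index(fields=["customer", "created_at"]),
            ]

    # Query this index supports:
    # Order.objects.filter(customer=c, created_at__gte=cutoff).order_by("-created_at")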
In my case, it was database optimizations that reduced the overall costs significantly:

* Instead of writing big complex queries with nested SELECTs, I split them into smaller bite-sized chunks that could be cached.

* Better caching strategies, reducing how many caches were flushed when a change was made.

* Tweaking the indexes on database tables to better serve the WHERE clauses.

* Storing intermediate calculations in the database; for example, the number of posts a user has can be stored on the user table instead of being counted each time (a small sketch follows below).

Once the database was optimized, I could reduce the size of the DB and the server, as they no longer needed to work (or wait) as much.
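A hedged sketch of that last point, using a hypothetical Django User/Post pair and an F() expression so the increment happens atomically in the database:

    from django.db import models
    from django.db.models import F

    class User(models.Model):
        name = models.CharField(max_length=100)
        post_count = models.PositiveIntegerField(default=0)  # denormalized counter

    class Post(models.Model):
        author = models.ForeignKey(User, on_delete=models.CASCADE)
        body = models.TextField()

    def create_post(user, body):
        post = Post.objects.create(author=user, body=body)
        # Bump the stored counter instead of running COUNT(*) on every read.
        User.objects.filter(pk=user.pk).update(post_count=F("post_count") + 1)
        return post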
This is a surprisingly uncommon technique, but:

Think deeply about what you're making the computer do, and ask it to do fewer things by being smarter about what you ask for.

I'd say 95% of the time, most of your order-of-magnitude gains will come from the above.
Assuming you gate things like merges on test results . . .

Remove end-to-end tests. Replace them with contract tests and service-level functional tests.

Much faster feedback! At the same time, better coverage. The only serious problem with the approach is that it upsets the magical thinkers in your org. Often those folks are managers.
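One way to read "contract test", sketched here with a shared JSON schema validated via the jsonschema library; the endpoint, payload, and schema are made up, and real consumer-driven contract tooling (e.g. Pact) goes further than this:

    import jsonschema

    # Contract the consumer and provider both test against, in isolation,
    # instead of spinning up the whole system end to end.
    ORDER_CONTRACT = {
        "type": "object",
        "required": ["id", "status", "total_cents"],
        "properties": {
            "id": {"type": "integer"},
            "status": {"type": "string", "enum": ["pending", "paid", "shipped"]},
            "total_cents": {"type": "integer", "minimum": 0},
        },
    }

    def test_order_endpoint_honours_contract():
        # In the provider's suite this would call the real view/handler;
        # faked here so the sketch stays self-contained.
        response = {"id": 1, "status": "paid", "total_cents": 1999}
        jsonschema.validate(instance=response, schema=ORDER_CONTRACT)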
Making sane DB indices and constraints. It's amazing how often people just don't add indices even when the access pattern is clear from the outset. "Premature optimization is the root of all evil!" ok, so when are we actually going to add that index? (Answer: never)
Usually, just checking access patterns and data structures (like a list being scanned where a set should be used).

Also, avoiding code that scatters pointers to many small pieces all over memory and leads to lots of cache misses.

Finally, just good old profiling.
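A toy illustration of the set-versus-list point, with made-up sizes, just to show the shape of the win when membership tests dominate:

    import timeit

    ids = list(range(100_000))
    wanted_list = list(range(0, 100_000, 7))  # ~14k entries
    wanted_set = set(wanted_list)

    # Linear scan per lookup vs. constant-time hash lookup.
    slow = timeit.timeit(lambda: [i for i in ids[:1000] if i in wanted_list], number=10)
    fast = timeit.timeit(lambda: [i for i in ids[:1000] if i in wanted_set], number=10)
    print(f"list membership: {slow:.3f}s   set membership: {fast:.3f}s")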
Meta rules that are often ignored:

1. Establish what is good enough
2. Measure, don't guess
3. Fix the biggest bottleneck first
4. Measure after fixing

And some general things:

5. Avoid micro-benchmarks (i.e. anything not measured at the whole-system level)
6. Be careful with synthetic data
7. Know your general estimates (e.g. cache, memory, disk, and network speeds)
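For point 2, the standard library is enough to get a first measurement before touching anything; the workload() function here is a stand-in for the real entry point:

    import cProfile
    import pstats

    def workload():
        # Stand-in for whatever you actually want to measure.
        total = 0
        for _ in range(200_000):
            total += sum(range(50))
        return total

    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()

    # Print the ten most expensive call sites by cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)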
Profiling (e.g. Pyroscope) for better understanding, plus load testing.

More performant libraries behind the same interface (a small sketch follows below).

DB optimizations (e.g. indexing, denormalization, tuning, connection pooling).
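One common shape of the "same interface, faster library" swap, sketched with ujson as a stand-in; its dumps/loads cover the common call sites, though not every stdlib keyword argument:

    # Faster C implementation if available, stdlib otherwise; the call sites
    # below don't change either way.
    try:
        import ujson as json
    except ImportError:
        import json

    payload = {"user": 42, "items": list(range(100))}
    blob = json.dumps(payload)
    assert json.loads(blob)["user"] == 42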