I'm curious what back-pocket optimizations you have from your past experience that you pull out when a workload costs too much in the cloud. For me it's:

* disable lower C-states
* enable adaptive interrupt coalescing in the network driver
* mount volumes with noatime
Reduce how often something runs and the amount of work it does when it does run. Sometimes things run too often, or process unnecessary data.

I once had a project where an SAP system was crawling and causing company-wide stoppages. We found a job that literally ran every minute of every day, processing a table that contained a few thousand tasks. This was something that could be done once per hour, and only during business hours. Furthermore, it was re-processing thousands of records each time it ran; in reality, once a record had been processed it could be deleted from the table.

We emptied out the table and scheduled the job to run hourly. The whole company noticed an immediate improvement.

This pattern happens a lot. Someone builds a polling system that hits the server once a second to see whether a task finished. A cron job runs every 5 minutes when hourly would do. A daily job reprocesses all historical data instead of only what arrived in the last 24 hours.

The world is filled with computers doing useless work. Mostly, no one notices.
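A minimal sketch of that "process new work once, then forget it" shape; the tasks table, its columns, and handle() are hypothetical stand-ins, not the original system:

    import sqlite3

    def handle(payload):
        ...  # whatever the job actually does with one record

    def process_pending(conn):
        # Fetch only rows that haven't been handled yet, instead of
        # re-reading the whole table on every run.
        rows = conn.execute(
            "SELECT id, payload FROM tasks WHERE processed = 0"
        ).fetchall()
        for task_id, payload in rows:
            handle(payload)
            # Once handled, a record never needs to be looked at again,
            # so delete it (or flag it) rather than re-processing it next run.
            conn.execute("DELETE FROM tasks WHERE id = ?", (task_id,))
        conn.commit()

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, payload TEXT, processed INTEGER DEFAULT 0)"
    )
    process_pending(conn)  # schedule hourly during business hours, not every minute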
Push less data through wires.

The memory hierarchy is so stark on modern hardware that the 30-year-old adage that "the fastest code is the code you don't run" is maybe less important than "the fastest code is the code that doesn't spend much time talking to the memory controller."

And it's even worse once we start talking about accessing memory that's on an entirely different computer. Serialization/deserialization, IPC, network calls, and all the other things we do with reckless abandon in modern service-oriented and distributed applications are just unbelievably expensive.

Last year I took a slow, heavily parallelized batch job and improved its throughput by 60% by getting rid of both scale-out and multithreading and taking it all down to a single thread. Everyone expected it to be slower because we were using a small fraction of the CPU cores, but in truth it was faster, because the time saved on memory fences, data copying, and network I/O was just that great. And then the performance gains kept coming because, having simplified things to that extent, I was in a much better position to judiciously re-introduce parallelism in ways that weren't so wasteful.
I've spent the last couple of years identifying and resolving N+1 problems in a Django codebase.

https://planetscale.com/blog/what-is-n-1-query-problem-and-how-to-solve-it

Aside from the performance gains, it's very satisfying to go from 1,000+ inefficient DB queries to 1-2 optimized queries.
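A typical before/after, sketched with hypothetical Book/Author models; the usual Django tools here are select_related for foreign keys and prefetch_related for reverse or many-to-many relations:

    from django.db import models

    class Author(models.Model):
        name = models.CharField(max_length=100)

    class Book(models.Model):
        title = models.CharField(max_length=200)
        author = models.ForeignKey(Author, on_delete=models.CASCADE)

    # N+1: one query for the books, then one extra query per book for its author.
    for book in Book.objects.all():
        print(book.author.name)

    # Fixed: a single query with a JOIN pulls the authors in as well.
    for book in Book.objects.select_related("author"):
        print(book.author.name)

    # For reverse or many-to-many relations, prefetch_related does the same job
    # with one additional query instead of a JOIN.
    authors = Author.objects.prefetch_related("book_set")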
I realize you're asking about performance optimizations, but since you put it in the context of a workload's cloud bill being too large, I'll chip in and say that by far the largest impact on cost I've seen over the years comes from simply rightsizing the infrastructure the workloads run on. What I see more than anything is applications reserving 10x more CPU or memory than they actually use. In some cases the fix means amortizing resource usage over time by asynchronously consuming some kind of queue, where the extreme reservation exists only to absorb a temporary usage spike (a downstream client doing some batch processing, for example).
Easy trick to making joins 50x faster: don't use Postgres, and give your tables a primary key that groups related items together.

A lot of people don't know that an ordinary database index doesn't order the actual rows on disk. It's just a B-tree of pointers.

If you use a clustered index that matches the table's query pattern, the rows really are ordered on disk.

Most DBs load data in 8 KiB pages. So if you query 100 rows of 100 bytes each and they're not stored together, you may need to load nearly 1 MiB of data even though the query result is only 10 KiB.

That speeds up joins and range queries 50x or more, causes fewer cache evictions, etc.

You can do this in any database except Postgres; Postgres doesn't have the ability to keep rows sorted on disk.
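A self-contained way to see the idea is SQLite's WITHOUT ROWID tables, which store rows directly inside the primary-key B-tree (InnoDB in MySQL clusters on the primary key the same way); the schema here is made up:

    import sqlite3

    conn = sqlite3.connect(":memory:")

    # Rows sharing a customer_id end up physically adjacent on disk, because the
    # table itself is the primary-key B-tree rather than a heap plus an index.
    conn.execute("""
        CREATE TABLE order_items (
            customer_id INTEGER NOT NULL,
            order_id    INTEGER NOT NULL,
            item_id     INTEGER NOT NULL,
            price_cents INTEGER NOT NULL,
            PRIMARY KEY (customer_id, order_id, item_id)
        ) WITHOUT ROWID
    """)

    # A range query for one customer now touches a handful of contiguous pages
    # instead of one page per row scattered across the table.
    rows = conn.execute(
        "SELECT order_id, item_id, price_cents FROM order_items WHERE customer_id = ?",
        (42,),
    ).fetchall()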
I've seen memoization improve performance by enormous amounts, even for simple functions that do a few simple calculations before returning a result (a minimal sketch follows at the end of this comment).

Another go-to of mine is to hoist conditionals out of loops when they really only need to be checked once. For example:

    for foo in whatever:
        for bar in foo:
            if len(foo) > some_value:
                do_something(bar)

Can become:

    for foo in whatever:
        if len(foo) > some_value:
            for bar in foo:
                do_something(bar)

This example is trivial and wouldn't gain much, but imagine that `len(foo)` was a more computationally expensive call: you'd only need to make it once per `foo` instead of once for every `bar` inside every `foo`.
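And the memoization sketch mentioned above, using the standard library; the function and its numbers are hypothetical:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def shipping_cost(weight_kg, zone):
        # Imagine a few lookups and calculations here; even cheap ones add up
        # when the same arguments come in thousands of times.
        base = {"domestic": 4.0, "international": 12.0}[zone]
        return base + 1.5 * weight_kg

    shipping_cost(2.0, "domestic")  # computed
    shipping_cost(2.0, "domestic")  # returned straight from the cache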
Eliminate complicated stacks/frameworks (dockers, npm, reacts, multi-stores, clouds, etc.) and use the simplest alternative available (solid, single `exes`, htmx+tailwind, only Postgres, normal hosting, etc.).

Improve the data (and its structure) first if possible.
On the back-end, application-level programming side:

* Look for O(N^x) behavior, that is, nested loops, even when they are not necessarily expressed as loops at the language level.

* If possible, get rid of ORMs in favor of raw SQL. Not because ORMs are inherently bad, but because almost nobody bothers to learn them, so they often start causing issues under any non-trivial amount of load.

* Study data access patterns and figure out where composite indexes might help (a sketch follows below). I say composite indexes because I assume regular indexes are more or less always there, often even too many of them.

Especially with the last one I have achieved impressive results without any kind of impressive effort, just by setting aside some time to understand the code.
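One possible shape of that composite index, sketched as a hypothetical Django model (the access pattern assumed is "recent orders for one customer"):

    from django.db import models

    class Customer(models.Model):
        name = models.CharField(max_length=100)

    class Order(models.Model):
        customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
        created_at = models.DateTimeField(auto_now_add=True)

        class Meta:
            indexes = [
                # One composite index serves WHERE customer_id = ? AND created_at >= ?
                # (and the ORDER BY created_at that usually comes with it) far better
                # than separate single-column indexes on each field.
                models.Index(fields=["customer", "created_at"]),
            ]

    # Query this index supports:
    # Order.objects.filter(customer=c, created_at__gte=cutoff).order_by("-created_at")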
In my case, it was database optimizations that reduced the overall costs significantly:

* Instead of writing big complex queries with nested SELECTs, I split them into smaller bite-sized chunks that could be cached.

* Better caching strategies, reducing how many caches were flushed when a change was made.

* Tweaking the indexes on database tables to better serve the WHERE clauses.

* Storing intermediate calculations in the database; for example, the number of posts a user has can be stored on the user table instead of being counted each time (a small sketch follows below).

Once the database was optimized, I could reduce the size of the DB and the server, as they no longer needed to work (or wait) as much.
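A hedged sketch of that last point, using a hypothetical Django User/Post pair and an F() expression so the increment happens atomically in the database:

    from django.db import models
    from django.db.models import F

    class User(models.Model):
        name = models.CharField(max_length=100)
        post_count = models.PositiveIntegerField(default=0)  # denormalized counter

    class Post(models.Model):
        author = models.ForeignKey(User, on_delete=models.CASCADE)
        body = models.TextField()

    def create_post(user, body):
        post = Post.objects.create(author=user, body=body)
        # Bump the stored counter instead of running COUNT(*) on every read.
        User.objects.filter(pk=user.pk).update(post_count=F("post_count") + 1)
        return post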
This is a surprisingly uncommon technique, but:

Think deeply about what you're making the computer do, and ask it to do fewer things by being smarter about what you ask for.

I'd say 95% of the time, most of your order-of-magnitude gains will come from the above.
Assuming you gate things like merges on test results . . .

Remove end-to-end tests. Replace them with contract tests and service-level functional tests.

Much faster feedback! At the same time, better coverage. The only serious problem with the approach is that it upsets the magical thinkers in your org. Often those folks are managers.
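One way to read "contract test", sketched here with a shared JSON schema validated via the jsonschema library; the endpoint, payload, and schema are made up, and real consumer-driven contract tooling (e.g. Pact) goes further than this:

    import jsonschema

    # Contract the consumer and provider both test against, in isolation,
    # instead of spinning up the whole system end to end.
    ORDER_CONTRACT = {
        "type": "object",
        "required": ["id", "status", "total_cents"],
        "properties": {
            "id": {"type": "integer"},
            "status": {"type": "string", "enum": ["pending", "paid", "shipped"]},
            "total_cents": {"type": "integer", "minimum": 0},
        },
    }

    def test_order_endpoint_honours_contract():
        # In the provider's suite this would call the real view/handler;
        # faked here so the sketch stays self-contained.
        response = {"id": 1, "status": "paid", "total_cents": 1999}
        jsonschema.validate(instance=response, schema=ORDER_CONTRACT)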
Making sane DB indices and constraints. It's amazing how often people just don't add indices even when the access pattern is clear from the outset. "Premature optimization is the root of all evil!" ok, so when are we actually going to add that index? (Answer: never)
Usually, just checking access patterns and data structures (like a list being scanned where a set should be used).

Also, avoiding code that scatters pointers to many small pieces all over memory and leads to lots of cache misses.

Finally, just good old profiling.
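A toy illustration of the set-versus-list point, with made-up sizes, just to show the shape of the win when membership tests dominate:

    import timeit

    ids = list(range(100_000))
    wanted_list = list(range(0, 100_000, 7))  # ~14k entries
    wanted_set = set(wanted_list)

    # Linear scan per lookup vs. constant-time hash lookup.
    slow = timeit.timeit(lambda: [i for i in ids[:1000] if i in wanted_list], number=10)
    fast = timeit.timeit(lambda: [i for i in ids[:1000] if i in wanted_set], number=10)
    print(f"list membership: {slow:.3f}s   set membership: {fast:.3f}s")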
Meta rules that are often ignored:

1. Establish what is good enough
2. Measure, don't guess
3. Fix the biggest bottleneck first
4. Measure after fixing

And some general things:

5. Avoid micro-benchmarks (i.e. anything not measured at the whole-system level)
6. Be careful with synthetic data
7. Know your general estimates (e.g. cache, memory, disk, and network speeds)
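For point 2, the standard library is enough to get a first measurement before touching anything; the workload() function here is a stand-in for the real entry point:

    import cProfile
    import pstats

    def workload():
        # Stand-in for whatever you actually want to measure.
        total = 0
        for _ in range(200_000):
            total += sum(range(50))
        return total

    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()

    # Print the ten most expensive call sites by cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)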
Profiling (e.g. Pyroscope) for better understanding, plus load testing.

More performant libraries behind the same interface (a small sketch follows below).

DB optimizations (e.g. indexing, denormalization, tuning, connection pooling).
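One common shape of the "same interface, faster library" swap, sketched with ujson as a stand-in; its dumps/loads cover the common call sites, though not every stdlib keyword argument:

    # Faster C implementation if available, stdlib otherwise; the call sites
    # below don't change either way.
    try:
        import ujson as json
    except ImportError:
        import json

    payload = {"user": 42, "items": list(range(100))}
    blob = json.dumps(payload)
    assert json.loads(blob)["user"] == 42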