Making a Go program faster with a one-character change

511 points by hcm, over 2 years ago

28 comments

ludiludi, over 2 years ago

> If you read the title and thought “well, you were probably just doing something silly beforehand”, you’re right!

Don't feel too silly. Russ Cox, one of the technical leads on the Go language, made the same mistake in the regexp package of the standard library.

https://go-review.googlesource.com/c/go/+/355789

sakras, over 2 years ago

A while ago at my company we switched from GCC to Clang, and noticed a couple of massive regressions (on the order of 50%?) in performance having to do with floating point.

After profiling for a bit, I discovered that suddenly a lot of time was spent in isinf on Clang and no time in GCC… Clang was emitting a function call where GCC wasn’t. I happened to randomly change isinf to std::isinf (it’s a random habit of mine to put std:: in front of these C functions). Suddenly the regression disappeared! I guess on Clang only std::isinf was a compiler intrinsic while GCC recognized both? Anyway, that’s my small-change optimization story.

asim, over 2 years ago

If you want to have a solid understanding and need to do it in just a few hours, here are a few things to review.

- The Go programming language spec: https://go.dev/ref/spec
- Effective Go: https://go.dev/doc/effective_go
- Advanced Go concurrency patterns: https://go.dev/talks/2013/advconc.slide#1
- Plus many more talks/slides: https://go.dev/talks/

karmakaze, over 2 years ago

> I did consider two other approaches:
>
> - Changing Ruleset from being []Rule to []*Rule, which would mean we no longer need to explicitly take a reference to the rule.
> - Returning a Rule rather than a *Rule. This would still copy the Rule, but it should stay on the stack instead of moving to the heap.
>
> However, both of these would have resulted in a breaking change as this method is part of the public API.

The problem with heap-allocated objects could be due to the incorrect public API.

The change that improves performance also gives out pointers to the actual elements of Ruleset itself, permitting the caller to change the contents of Ruleset, which wasn't possible before the speed-up. Perhaps you're already aware, since the change to []*Rule was being considered.
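
A minimal sketch of the aliasing point, for anyone who wants to see it run. The Rule type here is a hypothetical stand-in with a single Pattern field, and matchCopy / matchInPlace are made-up names that only mirror the "before" and "after" shapes discussed in this thread, not the post's actual code:

```go
package main

import "fmt"

// Rule and Ruleset are hypothetical stand-ins for the types in the post.
type Rule struct {
	Pattern string
}

type Ruleset []Rule

// matchCopy mirrors the "before" shape: the element is copied into a local
// variable, and the returned pointer refers to that copy.
func (r Ruleset) matchCopy(path string) *Rule {
	for i := range r {
		rule := r[i] // copies the element
		if rule.Pattern == path {
			return &rule // pointer to the copy, not to r[i]
		}
	}
	return nil
}

// matchInPlace mirrors the "after" shape: the returned pointer aliases the
// element stored in the slice's backing array.
func (r Ruleset) matchInPlace(path string) *Rule {
	for i := range r {
		rule := &r[i] // no copy
		if rule.Pattern == path {
			return rule
		}
	}
	return nil
}

func main() {
	rules := Ruleset{{Pattern: "/a"}, {Pattern: "/b"}}

	rules.matchCopy("/a").Pattern = "/changed"
	fmt.Println(rules[0].Pattern) // prints "/a": the Ruleset is untouched

	rules.matchInPlace("/a").Pattern = "/changed"
	fmt.Println(rules[0].Pattern) // prints "/changed": the caller modified the Ruleset
}
```

Whether that matters depends on whether callers ever write through the returned pointer.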

coder543, over 2 years ago

There is potentially another option: use the midstack inliner to move the allocation from the heap to the stack of the calling function: https://words.filippo.io/efficient-go-apis-with-the-inliner/

As long as the global slice is never mutated, the current approach is probably fine, but it is definitely a semantic change to the code.
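
A rough sketch of what the inliner approach could look like here. Everything in it is assumed for illustration: the Rule type, and the Find / find names. The idea is that the exported method is small enough to be inlined, so the result variable can end up in the caller's frame, while the outlined worker only writes through the pointer it is given. Whether the compiler actually inlines Find and keeps the value off the heap depends on the Go version and on what the caller does with the pointer:

```go
package main

import "fmt"

// Rule and Ruleset are hypothetical stand-ins for the types in the post.
type Rule struct {
	Pattern string
}

type Ruleset []Rule

// Find is kept tiny on purpose so the compiler may inline it into callers.
// Once inlined, out belongs to the caller's frame and can stay on the stack
// if the caller does not let the pointer escape.
func (r Ruleset) Find(path string) *Rule {
	var out Rule
	if !r.find(path, &out) {
		return nil
	}
	return &out
}

// find does the real work. It only writes through dst, so dst does not
// escape from this function.
func (r Ruleset) find(path string, dst *Rule) bool {
	for i := range r {
		if r[i].Pattern == path {
			*dst = r[i] // still a copy, so callers cannot mutate the Ruleset
			return true
		}
	}
	return false
}

func main() {
	rules := Ruleset{{Pattern: "/a"}, {Pattern: "/b"}}
	if rule := rules.Find("/b"); rule != nil {
		fmt.Println(rule.Pattern) // prints "/b"
	}
}
```

This keeps the copy, so it trades the copy cost for keeping the original semantics.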

tuetuopay, over 2 years ago

Aaaaaand that's why I love Rust's decision to make copies explicit with `.clone()`. Annoying as hell when you're not used to it but overall worth it.

assbuttbuttass, over 2 years ago

Returning a pointer to a local variable is convenient, but can be a source of hidden allocations.

It's best to treat each struct as a "value" or "pointer" type, and use one or the other consistently for each type. This mostly avoids the need to use & in the first place.
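
One way to see the hidden allocation is to count allocations with testing.AllocsPerRun. This is only a sketch: the Rule type and all function names are made up, and the results are stored in package-level variables so the pointer is guaranteed to escape. Real code may behave differently, since escape analysis decides case by case.

```go
package main

import (
	"fmt"
	"testing"
)

// Rule is a hypothetical struct used only for this illustration.
type Rule struct {
	Pattern string
	Weight  int
}

// Package-level sinks keep the compiler from optimizing the calls away.
var (
	ptrSink *Rule
	valSink Rule
)

// newRulePtr returns the address of a local. Because the pointer outlives
// the call here, the local is expected to be allocated on the heap.
func newRulePtr() *Rule {
	r := Rule{Pattern: "/a", Weight: 1}
	return &r
}

// newRuleValue returns the struct by value; the result is copied to the
// caller and needs no heap allocation.
func newRuleValue() Rule {
	return Rule{Pattern: "/a", Weight: 1}
}

// ptrAllocs reports the average heap allocations per newRulePtr call.
func ptrAllocs() float64 {
	return testing.AllocsPerRun(1000, func() { ptrSink = newRulePtr() })
}

// valueAllocs reports the average heap allocations per newRuleValue call.
func valueAllocs() float64 {
	return testing.AllocsPerRun(1000, func() { valSink = newRuleValue() })
}

func main() {
	fmt.Println("pointer-returning allocs per call:", ptrAllocs())
	fmt.Println("value-returning allocs per call:", valueAllocs())
}
```

Building with -gcflags=-m, as mentioned in the post, should also report the escape decision for r in newRulePtr.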

cbsmith, over 2 years ago

As an old C/C++ programmer, I'm always surprised by how often software developers are surprised by the performance costs of inopportune value semantics (C, and C++ even more so, punish you severely for using value semantics when you shouldn't). I increasingly see the wisdom of languages with implicit reference semantics.

It's not that value semantics can't be better (they most assuredly can be), or that reference semantics don't cause their own complexity problems, but rather that so often we thoughtlessly imply/impose value semantics through interfaces in ways that negatively impact performance; getting interfaces wrong is a much tougher bell to unring.

The vast majority of my mental energy when I define an interface in C++ is spent carefully thinking through a combination of ownership contracts and value vs. reference semantics that I can mostly ignore in languages with implicit reference semantics. While occasionally ignoring those contracts while developing in Java/Python/whatever comes back to bite me, the problem isn't nearly as common or problematic as when I unintentionally impose value semantics in a language that allows me to.

amluto, over 2 years ago

Somewhat off topic, but I find a different part of this to be quite ugly:

    if match || err != nil {
        return rule, err
    }

Translating this code to actual logic takes too much thought and is too fragile. Is that an error path or a success path? It’s both! The logic is “if we found a rule or if there was an error then return a tuple that hopefully indicates the outcome”. If any further code were to be added in this block, it would have to be validated for the success and the error case.

But this only makes any sense at all if one is okay with reading Go result returns in their full generality. A canonical Go function returns either Success(value) or Error(err not nil, meaningless auxiliary value). And this code has “meaningless auxiliary value” != nil! In fact, it’s a pointer that likely escapes further into unrelated error handling code and thus complicates any kind of lifetime or escape analysis.

I don’t use Go, but if I did, I think this part of the language would be my biggest peeve. Go has very little explicit error handling; fine, that’s a reasonable design decision. But Go’s error handling is incorrectly typed, and that is IMO not a reasonable design.
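
For comparison, a sketch of the same loop with the two paths split apart. The Rule type, its Prefix field, and the rule-level Match behavior are all invented for the example. Note that this version returns nil instead of the rule on error, which is a behavior change if any caller relies on the old return value:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Rule is a hypothetical rule that matches paths by prefix.
type Rule struct {
	Prefix string
}

// Match is a made-up matcher that reports an error for an empty rule.
func (r *Rule) Match(path string) (bool, error) {
	if r.Prefix == "" {
		return false, errors.New("empty rule")
	}
	return strings.HasPrefix(path, r.Prefix), nil
}

type Ruleset []Rule

// Match returns the first matching rule, or (nil, err) on the first error,
// or (nil, nil) when nothing matches.
func (rs Ruleset) Match(path string) (*Rule, error) {
	for i := range rs {
		rule := &rs[i]
		match, err := rule.Match(path)
		if err != nil {
			return nil, err // error path: no auxiliary value is returned
		}
		if match {
			return rule, nil // success path
		}
	}
	return nil, nil
}

func main() {
	rules := Ruleset{{Prefix: "/api"}, {Prefix: ""}}

	rule, err := rules.Match("/api/users")
	fmt.Println(rule.Prefix, err) // prints "/api <nil>"

	rule, err = rules.Match("/static")
	fmt.Println(rule == nil, err) // prints "true empty rule"
}
```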

enedil, over 2 years ago

Went from 4.139s to 2.413s. I fail to see how it is 70%. I think it is explained as 4.139/2.413 = 1.7 which of course doesn't make sense here.
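
The two readings of those numbers can be worked out directly. A small sketch (the function names are just for illustration): the run time drops by roughly 42%, while the ratio of old to new time is roughly 1.7, i.e. about 70% more work in the same time.

```go
package main

import "fmt"

// timeReduction returns the fraction of wall-clock time saved.
func timeReduction(before, after float64) float64 {
	return (before - after) / before
}

// speedup returns the ratio of the old run time to the new run time.
func speedup(before, after float64) float64 {
	return before / after
}

func main() {
	before, after := 4.139, 2.413 // seconds, from the comment above

	fmt.Printf("time reduction: %.1f%%\n", 100*timeReduction(before, after)) // prints "time reduction: 41.7%"
	fmt.Printf("speedup: %.2fx\n", speedup(before, after))                  // prints "speedup: 1.72x"
}
```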

gp, over 2 years ago

I was trying to debug and improve the performance of some parallelized C++ code over the weekend for parsing CSV files. What would happen was parsing each file (~24k lines, 8 columns) would take 100ms with one execution context, but when split across many threads, the execution time of each thread would slow down proportionally and the throughput of the whole program would strictly decrease as thread count increased.

I tried all of the obvious things, but the offender ended up being a call to allocate and fill a `struct tm` object from a string representation of a date. This doesn't have any obvious reasons (to me) that it would cause cache invalidation, etc, so I'm a little in the dark.

Still, replacing this four-line block improved single-threaded performance by 5x, and fixed the threaded behavior, so on the whole it is now ~70x faster and parses about 400 MB of CSV per second.

lanstin, over 2 years ago

That seems like a candidate for compiler optimization. It should already know that the rule value is only used one time, as the target of a &, and this must be somewhat common in managing return values.

jimsmart, over 2 years ago

From the headline alone, I guessed this was to do with pointers/references to values vs values themselves.

Yep, with values that take a lot of memory, it's faster to pass pointers/references around than it is to pass the values around, because there are fewer bytes to copy.

Of course there is more to such a decision than just performance, because if the code makes changes to the value which are not meant to be persisted, then one wants to be working with a copy of the value, not a pointer to the value. So one should take care if simply switching some code from values to pointers-to-values.

All of these things are things that coders with more experience of languages that use such semantics kinda know already, almost as second nature, since the first day they got caught out by them. But everyone is learning, to various degrees, and we all have to start somewhere (i.e. knowing little to nothing).

hoosieree, over 2 years ago

> You can see these decisions being made by passing -gcflags=-m to go build:

That's a very nice feature! I wonder if compilers for other languages have something similar.

Beltalowda, over 2 years ago

The deeper lesson here is "don't use pointers unless you're sure you need them". I've seen quite a few people use pointers for no reason in particular, or there's simply the assumption it's faster (and have done this myself, too), but it puts a lot more pressure on the GC than simple local stack variables.

Of course sometimes pointers are faster, or much more convenient. But as a rule of thumb: don't use pointers unless you've got a specific reason for them. This applies even more so if you're creating a lot of pointers (like in a loop, or a function that gets called very frequently).

chubot, over 2 years ago

FWIW, to prevent the bug where a = b is slow for big types, Google's C++ style guide used to mandate DISALLOW_COPY_AND_ASSIGN (which used to be DISALLOW_EVIL_CONSTRUCTORS I think) on all types (most types?)

Looks like that's been gone for a while in favor of C++11 stuff, which I don't really like:

https://google.github.io/styleguide/cppguide.html#Copyable_Movable_Types

A lot of good software was written in that style, but it has grown bureaucratic over time, and as the C++ language evolved.

sendfoods, over 2 years ago

1 character, in 2 places ;) I did not know profiling support for Go was so seamless, thank you!

May I ask, is that theme custom or available somewhere? I really enjoyed it.

is_taken, over 2 years ago

Would be interesting to see the performance difference if you undo that move-&-change and change the function signature from:

    func (r Ruleset) Match(path string) (*Rule, error)

to:

    func (r *Ruleset) Match(path string) (*Rule, error)
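
As a data point for that experiment, assuming Ruleset is a slice of Rule as quoted elsewhere in this thread: a value receiver on a slice type copies only the slice header (pointer, length, capacity), not the elements. A small sketch with a hypothetical, deliberately large Rule:

```go
package main

import (
	"fmt"
	"unsafe"
)

// Rule is a hypothetical, deliberately large struct.
type Rule struct {
	Pattern string
	Weights [64]int
}

type Ruleset []Rule

// headerWords returns the size of a Ruleset value in machine words.
func headerWords() uintptr {
	return unsafe.Sizeof(Ruleset(nil)) / unsafe.Sizeof(uintptr(0))
}

// ruleWords returns the size of a single Rule in machine words.
func ruleWords() uintptr {
	return unsafe.Sizeof(Rule{}) / unsafe.Sizeof(uintptr(0))
}

func main() {
	fmt.Println("words copied for a Ruleset receiver:", headerWords()) // slice header: 3 words
	fmt.Println("words copied for one Rule value:", ruleWords())
}
```

So the receiver change on its own would probably not move much; the per-element copy of Rule is the larger cost in this sketch.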

lxe, over 2 years ago

This is the kind of stuff that the compiler really needs to understand. If all this de-referencing and referencing magic is under the control of the user, it needs to have a meaningful effect on what the code does. Otherwise we might as well just write C.

tonymet, over 2 years ago

Overall good review of profiling tactics. But there’s nothing egregious about Golang here. Pass by value vs reference is a common performance issue.

infamousclyde, over 2 years ago

Thank you for sharing. I'm curious if you would recommend any good resources for profiling with Go. I enjoyed your code snippets and methodology.

stephen123, over 2 years ago

Great post. I always feel smart when I find these kinds of optimisations. Then I wonder why the compiler isn't smarter, so I don't have to be.

erdaniels, over 2 years ago

Is there any nice tooling / static analysis for golang that instruments the build process to add all the gcflags with verbose output and gives you hints as to what can be optimized?

renewiltord, over 2 years ago

Clear tutorial of how to go about identifying this. Good blog post. Since the problem was constrained and real, it helps someone know when to use these tools. Thank you for sharing.

cratermoon, over 2 years ago

This has made me go back to look at all the Go I've written recently and look at the & uses.

amtamt, over 2 years ago

It falls in those 3% of code lines one should think of while not optimizing prematurely.

notpushkin, over 2 years ago

Well, technically it's either a 2-character or 0-character change! :-)

AtNightWeCode, over 2 years ago

So, this is very basic Go design, and you could write something about how it works in C and Go and why an older lang like C doesn't have this prob, but then at the end of the day the Go fanclub will downvote the hell out of you no matter what.
