Error stack traces in Go with x/xerror

100 points by sicromoft over 3 years ago

11 comments

mappu over 3 years ago

At $DAYJOB we had a Go dependency on a package (maybe pkg/sftp?) that used github.com/pkg/errors to capture a stack trace with the error. These errors were ultimately used in a loop for flow control (maybe testing a lot of files/servers?), where collecting all the stack traces caused a lot of slowdown, heap garbage, and GC pressure.

I used go mod replace to strip the stack trace collection out of pkg/errors, with a fork that does a no-op for that function call, and it was a significant improvement for our use case.

justinclift over 3 years ago

Hmmm...

> and (2) a way to cut down on if err != nil { ... } boilerplate that pervades all Go code.

Am I weird in liking the explicit error handling? :/

ThePhysicist over 3 years ago

I think you can just use the `Callers()` function from the "runtime" package to get the call stack, though in general I think it's a bit overkill. I think what you should do instead is wrap errors with '%w' [1], manually adding context to them as you pass them up your call stack. That leads to more readable code and won't incur any performance overhead. It's tempting to have the call stack available for debugging, but IMHO it creates too much overhead when doing it indiscriminately for all errors you generate, which for me also goes a bit against the "spirit" of Golang.

[1]: https://go.dev/blog/go1.13-errors
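
A minimal sketch of the two approaches mentioned above, using only the standard library. The function names (`readSettings`, `startService`, `isMissingFile`, `callStack`) and the file path are hypothetical and chosen for illustration; the wrapping follows the `%w` pattern from the linked blog post, and the stack capture uses `runtime.Callers` with `runtime.CallersFrames`.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"runtime"
	"strings"
)

// readSettings is a hypothetical helper. It wraps the underlying error with
// %w so callers can still inspect it with errors.Is / errors.As.
func readSettings(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, fmt.Errorf("read settings: %w", err)
	}
	return data, nil
}

// startService adds its own layer of context as the error travels upward.
func startService(path string) error {
	if _, err := readSettings(path); err != nil {
		return fmt.Errorf("start service: %w", err)
	}
	return nil
}

// isMissingFile reports whether any error in the wrapped chain is
// fs.ErrNotExist.
func isMissingFile(err error) bool {
	return errors.Is(err, fs.ErrNotExist)
}

// callStack returns the function names on the current goroutine's stack,
// starting with the caller of callStack.
func callStack() []string {
	pcs := make([]uintptr, 32)
	// Skip 2 frames: runtime.Callers itself and callStack.
	n := runtime.Callers(2, pcs)
	if n == 0 {
		return nil
	}
	frames := runtime.CallersFrames(pcs[:n])
	var names []string
	for {
		frame, more := frames.Next()
		names = append(names, frame.Function)
		if !more {
			break
		}
	}
	return names
}

func main() {
	err := startService("/nonexistent/dir/settings.json")
	fmt.Println(err)                                 // message begins "start service: read settings: "
	fmt.Println("missing file:", isMissingFile(err)) // the chain still matches fs.ErrNotExist
	fmt.Println(strings.Join(callStack(), " <- "))
}
```

The wrapped message reads as a chain of context from the outermost call down to the OS error, while `errors.Is` still sees the original cause. `callStack` shows what the manual alternative looks like if a trace is wanted for a specific error rather than for all of them.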

siasia over 3 years ago

https://github.com/cockroachdb/errors

alecthomas over 3 years ago

One extra tip when wrapping error return values is to use the wrapcheck linter (https://golangci-lint.run/usage/linters/#wrapcheck). This will tell you when you're returning an error without wrapping it.

ollien over 3 years ago

Like the article mentions, they didn't bring over stack traces (namely the `Formatter` interface) from xerrors. I wrote a library[1] around it that would generate true stack traces. I don't use it as much as I used to, because I don't want to depend on a package like xerrors I don't trust to remain maintained, but it was a fun exercise at the time, and very useful while I used it. I wish that we wouldn't have to depend on a tool like Sentry for bringing this about, like the author suggests.

[1] https://github.com/ollien/xtrace

jrockway over 3 years ago

People seem to fixate on stack traces, because other languages present nearly every error in stack trace form. I think you should think about why you want them and make sure you have a good reason before mindlessly adding them. I do collect stack traces in some Go code, because Sentry requires it for categorization, but in general, you can do a much better job yourself, with very little sorcery involved.

A common problem is that when multiple producers produce failing work items and send them to a consumer -- a stack trace will just show "panic -> consumer.doWork() -> created by consumer.startWork()". Gee, thanks. You need to track the source, so that you have an actionable error message. If the consumer is broken, fine, you maybe have enough information. If a producer is producing invalid work items, you won't have enough information to find which one. You'll want that.

The idea of an error object is for the code to make a decision about how to handle that error, and if it fails, escalate it to a human for analysis. The application should be able to distinguish between classes of failures, and the human should be able to understand the state of the program that caused the failure, so they can immediately begin fixing the failure. It's up to you to capture that state, and make sure that you consistently capture the state.

Rather than leaving it to chance, I have an opinionated procedure:

1) Every error should be wrapped. This is where all the context for the operator of your software comes from, and you have to do it every time to capture the state of the application at the time of the error.

2) The error need not say "error" or "failure" or "problem". It's an error, you know it failed. As an example, prefer "upgrade foos: %w" over "problem upgrading foos: %w". (The reason is that in a long chain, if everyone does this, it's just redundant: "problem frobbing baz: problem fooing bars: problem quuxing glork: i/o timeout". Compare that to "frob baz: foo bars: quux glork: i/o timeout".)

But if you're logging an error, I pretty much always put some sort of error-sounding words in there. Makes it clear to operators that may not be as zen about failures as you that this is the line that identifies something not working. "2021-08-23T20:45:00.123 PANIC problem connecting to database postgres://1.2.3.4/: no route to host". I'm open to an argument that if you're logging at level >= WARNING that the reader knows it's a problem, though. (I also tend to phrase them as "problem x-ing y" instead of "error x-ing y" or "x-ing y failed". Not going to prescribe that to others though, use the wording that you like, or that you think causes the right level of panic.)

3) Error wrapping shouldn't duplicate any information that the caller already has. The caller knows the arguments passed to the function, and the name of the function. If its error wrapping needs those things to produce an actionable error message, it will add them. It doesn't know what sub-function failed, and it doesn't know what internally-generated state there is, and those are going to be the interesting parts for the person debugging the problem. So if you're incrementing a counter, you might do a transaction, inside of which is a read and a write -- return "commit txn: %w", "rollback txn: %w", "read: %w", "write: %w", etc. The caller can't know which part failed, or that you decided to commit vs. roll back, but it does know the record ID, that the function is "update view count", etc.

The standard library violates this rule, probably because people "return err" instead of wrapping the error, and this gives them a shred of hope. And, it's my made-up rule, not the Go team's actual rule! Investigate those cases and don't add redundant information. (For example, os.ReadFile will have the filename in the error message, because it returns an fs.PathError, which contains that. net.Dial is another culprit. Make a list of these and break the rule in these cases.)

4) Any error that the program is going to handle programmatically should have a sentinel (`var ErrFoo = errors.New("foo")`), so that you can unambiguously handle the error correctly. (People seem to handle io.EOF quite well; emulate that.)

You can describe special cases of your error by wrapping it before returning it, `fmt.Errorf("bar the quux: %w", ErrFoo)`.

Finally, since I talked about logging, please talk about things that DID work in your logs. Your logs are the primary user interface for operators of your software, but often the most neglected interface point. When you see something like "problem connecting to foo\nproblem connecting to foo", you're going to think there's a problem connecting to foo. But if you write "problem connecting to foo (attempt 1/3)\nproblem connecting to foo (attempt 2/3)\nconnected to foo", then the operator knows not to investigate that. It worked, and the program expected it to take 3 attempts. Perfect. (Generally, for any long-running operation a log "starting XXX" and "finished XXX" are great. That way, you can start looking for missing "finished" messages, rather than relying on self-reported errors.)

(And, outside of HN comments, I would make that a structured log, so that someone can easily select(.attempt == .max_attempts) and things like that. It's ugly if you just read the output, but great if you have a tool to pretty-print the logs: https://github.com/jrockway/json-logs/releases/tag/v0.0.3)

Anyway, I guess where this rant goes is -- errors are not an afterthought. They're as much a part of your UI as all the buttons and widgets. They will happen to all software. They will happen to yours. Some poor sap who is not you will be responsible for fixing it. Give them everything they need, and you'll be rewarded with a pull request that makes your program slightly more reliable. Give them "unexpected error: success", and you'll have an interesting bug report to context-switch to over the course of the next month, killing anything cool you wanted to make while you track that down.
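
The procedure above can be sketched roughly as follows, using only the standard library. Everything here is hypothetical and made up for illustration: the `store` type, the sentinels `ErrNotFound` and `ErrReadOnly`, the simulated `errTimeout`, and the `connectWithRetry` helper. The inner layers only say which step failed ("read", "write"), the outer layer adds what it alone knows (the operation and record ID), and the retry helper logs the attempt count and the eventual success.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinels for errors the program handles programmatically.
var (
	ErrNotFound = errors.New("not found")
	ErrReadOnly = errors.New("store is read-only")
	errTimeout  = errors.New("i/o timeout") // simulated dial failure
)

// store is a toy in-memory record store used only for this sketch.
type store struct {
	data     map[string]int
	readOnly bool
}

func (s *store) read(id string) (int, error) {
	n, ok := s.data[id]
	if !ok {
		return 0, ErrNotFound
	}
	return n, nil
}

func (s *store) write(id string, n int) error {
	if s.readOnly {
		return ErrReadOnly
	}
	s.data[id] = n
	return nil
}

// increment reports which internal step failed; it does not repeat the ID,
// because the caller already has it.
func (s *store) increment(id string) error {
	n, err := s.read(id)
	if err != nil {
		return fmt.Errorf("read: %w", err)
	}
	if err := s.write(id, n+1); err != nil {
		return fmt.Errorf("write: %w", err)
	}
	return nil
}

// updateViewCount adds the context only this layer has: the operation name
// and the record ID.
func updateViewCount(s *store, id string) error {
	if err := s.increment(id); err != nil {
		return fmt.Errorf("update view count for %q: %w", id, err)
	}
	return nil
}

// isNotFound checks the whole wrapped chain for the sentinel.
func isNotFound(err error) bool {
	return errors.Is(err, ErrNotFound)
}

// connectWithRetry returns the log lines an operator would see, including
// the attempt counter and the final success message.
func connectWithRetry(max int, dial func() error) (lines []string, err error) {
	for attempt := 1; attempt <= max; attempt++ {
		if err = dial(); err == nil {
			lines = append(lines, "connected to foo")
			return lines, nil
		}
		lines = append(lines, fmt.Sprintf(
			"problem connecting to foo (attempt %d/%d): %v", attempt, max, err))
	}
	return lines, fmt.Errorf("connect to foo: %w", err)
}

func main() {
	s := &store{data: map[string]int{"home": 41}}
	err := updateViewCount(s, "missing")
	fmt.Println(err)             // update view count for "missing": read: not found
	fmt.Println(isNotFound(err)) // true

	calls := 0
	lines, _ := connectWithRetry(3, func() error {
		calls++
		if calls < 3 {
			return errTimeout
		}
		return nil
	})
	for _, line := range lines {
		fmt.Println(line)
	}
}
```

In a real program the retry lines would more likely go to a structured logger with `attempt` and `max_attempts` fields, as the comment suggests; plain strings are used here to keep the sketch dependency-free.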

samuell over 3 years ago

> This end solution isn’t amazing, but it works. We’d of course far prefer if Go itself had a built-in equivalent.

Totally agree.

For a package like SciPipe, which we have managed to develop with zero dependencies to maximize future reproducibility of scientific pipelines, it would very much hurt to bring in the first external (Go) dependency, while at the same time stack traces would really help us.

shp0ngle over 3 years ago

From what I understand, Go errors don't have stack traces by default for performance reasons.

If all errors had stack traces, everything would be much slower.

mozey over 3 years ago

> At boundaries between our code and calls out to external packages, make an effort to always wrap the result with xerrors.Errorf. This ensures that we always capture a stack trace at the most proximate location of an error being generated as possible

I've been doing something similar for a while, using `errors.WithStack` from https://github.com/pkg/errors

The error can then be logged with https://github.com/rs/zerolog like this: `log.Error().Stack().Err(err).Msg("")`

For human readable output (instead of the standard JSON) use a console writer, see https://github.com/mozey/logutil

SamuelHarris over 3 years ago

Individuals appear to focus on stack follows, in light of the fact that different dialects present practically every blunder in stack follow structure. I figure you should ponder why you need them and ensure you have a valid justification before thoughtlessly adding them. I do gather stack follows in some Go code, since Sentry requires it for order, yet as a general rule, you can improve work yourself, with very little magic included. https://www.nection.io/