I don't disagree with any of the points, but they missed a big one. If at all possible, include an application-unique (or an attempt at a globally unique) error code on each of your errors, e.g. YCOM-HN-9021. When you provide a clearly googleable string you help your users resolve the issue independently, and you can also set up Google Alerts on the string: if you roll out a new feature that took 3 months to develop and a week later Google tells you that YCOM-HN-9021 is up 9000%, you probably broke something. If at all possible make yourself open to client communication, but most users won't reach out about an error. Users have very low trust in customer care in the modern world (and it is, honestly, often more trouble than it's worth) and are more likely to turn to reddit/technical forums for a solution. It is extremely advantageous to try and track these users.
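A minimal sketch of the unique-code idea from the comment above, assuming a small in-process registry. Only the code YCOM-HN-9021 comes from the comment; `define_error`, `format_error`, and the checkout wording are hypothetical names made up for illustration.

```python
# Hypothetical registry: every name except the example code YCOM-HN-9021 is made up.
_ERROR_CODES = {}


def define_error(code, summary):
    """Register an error code once; reusing a code is a programming mistake."""
    if code in _ERROR_CODES:
        raise ValueError(
            f"error code {code!r} is already used for {_ERROR_CODES[code]!r}"
        )
    _ERROR_CODES[code] = summary
    return code


def format_error(code, detail):
    """Build the user-visible text, always ending with the searchable code."""
    return f"{_ERROR_CODES[code]}: {detail} (error {code})"


CHECKOUT_FAILED = define_error("YCOM-HN-9021", "Checkout could not be completed")

print(format_error(CHECKOUT_FAILED, "payment provider timed out after 30 s"))
```

Keeping the code at the end of the message means the human-readable part comes first, while the string users paste into a search box stays stable across wording changes.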
I'll add a few for developer-oriented messages.<p>* Say what the program was trying to do.<p>* Make the message unique and searchable.<p>* Make it detailed.<p>* FFS, include the filename or whatever else the program is having trouble with.<p>* If possible, include the source code location.<p>* If possible, include useful contextual information.<p>* Quote strings. Once in a while, some unexpected whitespace sneaks in somewhere and this can be hard to figure out.<p>E.g., don't just abort with "Open failed: NOT_FOUND". Abort with "job.c:2105 Failed to open job description file '/var/spool/jobs/125.json' when processing job #5 for user 'alice': NOT_FOUND".<p>This way I don't have to strace the damn thing to figure out what it's looking for, and I know which user it was for, so I don't have to dig around to figure out which entry in the database might contain the wrong information.<p>Also, context-free, generic error messages are awful. A large enough codebase may be impossible to search for some very common keywords.<p>If possible, googleable error codes are great to have, but they shouldn't replace the error message. It's ideal if you can search the source code and instantly find where the error message originates.
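The checklist above could look roughly like this in Python. This is a sketch, not anyone's real code: `JobError` and `load_job_description` are hypothetical names, and the message wording mirrors the example in the comment. The source location is left to the chained traceback rather than being written into the string.

```python
import errno


class JobError(Exception):
    """Hypothetical error type that says what the program was doing, and for whom."""


def load_job_description(path, job_id, user):
    """Read a job description file, adding context if the open fails."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except OSError as exc:
        # Symbolic errno name (e.g. ENOENT) is easier to search for than a number.
        reason = errno.errorcode.get(exc.errno, "UNKNOWN")
        # !r quotes the strings, so stray whitespace in a path or name is visible.
        raise JobError(
            f"Failed to open job description file {path!r} "
            f"when processing job #{job_id} for user {user!r}: {reason}"
        ) from exc  # keeps the original exception and its traceback attached
```

Note the trailing space in a path like `'/var/spool/jobs/125.json '` would show up inside the quotes, which is exactly the case the "quote strings" bullet is about.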
At a previous job, writing unambiguous error messages was discouraged. Everything just had to be "Oops! Something went wrong"<p>The reasoning was that "users can't do anything with information we tell them anyways", despite the overwhelming number of help desk tickets we'd get from "Oops!" appearing in a million different scenarios with no clear way for us to tell what error actually caused the message to appear.<p>Users naturally report the messages that they see because they're helping us to see the problem. I didn't get why that was such a hard concept to understand
20 years ago I was working on Acrobat at Adobe. I was mostly the "Windows guy" but also worked and tested on the Mac.<p>When I tried to install Acrobat on my Mac, I got this message:<p>"Your hard disk is too small"<p>My <i>what</i> is too small?!<p>Later, on Windows I got this unexpected popup:<p>"You are not here"<p>WTF?<p>I searched the code for that string and found it in a function named "CantHappen()". This function was called in numerous places where the programmer thought there was no possible way for the code to get to that place. But of course CantHappen() <i>did</i> happen.<p>As I looked through the code I found many other messages that were bizarre and incomprehensible and sometimes downright offensive.<p>So I started a project to go through all our messages and make them more clear and informative - and even better, when possible to not have the message at all but just take care of the situation.<p>The underlying cause of these bad messages was twofold:<p>1. Programmers never got raises for writing great error messages or finding ways to avoid them in the first place. We were just rated on how much work we got done.<p>2. We did have a product designer who was supposed to specify all user-facing messages. But the designer mainly considered the "happy path" and didn't think about edge cases. It was left to developers working under time pressure to handle those.
Probably just me, but I am less concerned with how good my error messages are, and more concerned with trying very very hard to make the errors happen closer to the cause of the problem, rather than further away.<p>"Fail early, fail hard"<p>i.e. if I can make the error message happen near the beginning of a process, I can get away with making it a hard error.<p>Hard errors in the middle of a multi-hour operation tend to annoy people.
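One way to read "fail early" is a preflight pass that checks everything checkable before the long-running work starts. A rough sketch, where `preflight` and `run_job` are hypothetical names and the checks are only examples:

```python
import os


def preflight(input_paths, output_dir):
    """Collect every problem that can be detected before the real work begins."""
    problems = []
    for path in input_paths:
        if not os.path.isfile(path):
            problems.append(f"input file not found: {path!r}")
    if not os.path.isdir(output_dir):
        problems.append(f"output directory does not exist: {output_dir!r}")
    elif not os.access(output_dir, os.W_OK):
        problems.append(f"output directory is not writable: {output_dir!r}")
    return problems


def run_job(input_paths, output_dir):
    """Hard-fail in the first second instead of three hours in."""
    problems = preflight(input_paths, output_dir)
    if problems:
        raise SystemExit("Refusing to start:\n  " + "\n  ".join(problems))
    # The expensive multi-hour work would start here.
```

Reporting all problems at once, rather than stopping at the first, saves the user from a fix-one-rerun-find-the-next loop.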
I would, if i had any evidence at all that they would be read and acted on. I’m convinced even seemingly competent people are just rendered contextually blind by the appearance of any error at all.<p>In the past month, i’ve had about a dozen interactions like this:<p><pre><code> developer: your service crashed, here’s a screenshot of the last 5 lines of the crash
me: do you see where the final text you just pasted is “RuntimeError: Did not find ENVVAR, ensure this is set to the proper value (see <internal wiki link>) and then restart this service”
developer: yeah?
me: well, did you do that thing?
developer: what thing?
me: <headdesk>
</code></pre>
and this is at work, where the developer in question is intimately acquainted with the context and purpose of the project.
A big part of this is to direct more of your development time into errors that happen more frequently.<p>Most systems I was involved in designing have some kind of error tracking system, so we can know exactly how often each error occurs.<p>An error that never happened needs (usually) no attention.<p>An error that 28% of installations have seen needs <i>a lot</i> of attention. The error text should be translated into local languages, wiki pages should be written about how to resolve it, efforts should be made to auto-resolve the error. The error message should include helpful info, etc.<p>Eg. "SSH server can't start. Config file unreadable".<p>Could be split into:<p>SSH server can't start. Config file error on line 7. 'AllowPasswordLoogin' is an invalid setting. Did you mean 'AllowPasswordLogin'? If you want to make this change, 'sudo nano /etc/sshserver.conf' will let you change this config.
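The "Did you mean" part of that SSH example can be sketched with Python's standard `difflib`. The setting list and `check_config` are hypothetical, and the config path is just the one from the comment; a real server would have its own parser.

```python
import difflib

# Hypothetical list of valid setting names, for illustration only.
KNOWN_SETTINGS = ["AllowPasswordLogin", "Port", "ListenAddress"]


def check_config(lines, path="/etc/sshserver.conf"):
    """Return one detailed message per unknown setting, with a suggestion if close."""
    errors = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        key = stripped.split()[0]
        if key in KNOWN_SETTINGS:
            continue
        msg = (
            f"Config file error in {path} on line {lineno}: "
            f"{key!r} is an invalid setting."
        )
        # get_close_matches returns the most similar known names, best first.
        close = difflib.get_close_matches(key, KNOWN_SETTINGS, n=1)
        if close:
            msg += f" Did you mean {close[0]!r}?"
        errors.append(msg)
    return errors


for message in check_config(["Port 22", "AllowPasswordLoogin yes"]):
    print(message)
```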
If you're raising an exception deep in some internal code, provide as much detail as possible.<p>If the error bubbles up to the user, then either the information is over their head, in which case there's no difference to a non-detailed error message, or the user/support person can actually act on it.<p>The most infuriating error I see is "file not found"... WHICH FILE?!<p>Of course if the error is found in the higher level due to some consistency check in the business logic, then yeah try to guide the user. But for internal stuff, try to help the person who needs to fix it or find a workaround. It might be you.
This reminds me of the two most annoying error messages of all time [for me].<p>The first one is from PayPal. Whenever I tried to add a US bank account to my PayPal account, it said something like "You cannot add this bank account at this time, period"<p>After more than a year, it turned out that there was no way to add such an account for a foreigner, despite my friends [from the same country] being able to do it easily a couple of months before.<p>The second one is, poor me again, trying to edit a Facebook page URL I created for a side project, which should have read FB.com/[SIDE_PROJECT]. FB kept rejecting my request with a generic/unexplained/unhelpful error message despite the page URL name being available.<p>About a year later, I got it working by, SIMPLY, having my phone number verified! How bad!!
There are fundamentally two classes of error message:<p>1. Information that can help a technically engaged person debug a problem.<p>2. Information that can help a user of the system understand what they have to do to overcome the problem.<p>Since most error messages are created by people responsible for debugging the system, they tend to be of the first class. There has to be a way to provide different information based on who is getting the error.
Watched the new Quantum Leap yesterday (it's not great) and there was this really cringeworthy moment when something goes wrong with their awesome supercomputer and the screen flashes a giant "INTERNAL SYNTAX ERROR". Apparently, somebody didn't run their linter before sending people through time. Too bad.
As with everything, context matters. It's a great run-down of how to empower an error message. Many products could add so much value and save support resources by doing so.<p>There's one thing I wasn't sure about in this article though. Did they talk to actual users about these empowered error messages, or even ask them what they want to see in the common error messages they run into? It seems rather difficult to empower error messages without first understanding the scenarios that got users into the error state to begin with. Next would be understanding whether these error messages are helpful to the users, and asking them how they go about resolving these types of issues. All of that is hinted at in the "what makes a good error message" part.
The general approach that I take, is that an error message is one of the most stressful occurrences that a user encounters, so it's incumbent upon me to make it as pain-free as possible.<p>First of all, unless I'm writing an engineering tool, my users aren't geeks, and don't especially care <i>why</i> the error is happening (geeks always need to know <i>why</i>). They just need to know that what was expected, did not happen. If there is a remedy, and it can be simply stated, then I can add that, but <i>it needs to be short and simple</i>. Longer stuff needs to go into some kind of secondary screen (which probably won't be read).<p>Also, I take the "shopkeeper" approach. The customer is always right, and it's never the customer's fault. I avoid any hints of blaming the user (even if it is their fault), and try to be polite and helpful[0].<p>Of course, the best way to deal with errors, is to avoid them. I try to design good affordances.<p>The rules are different for SDKs, though. In that case, I tend to send a great deal of information back. I take advantage of Swift's enums, and the ability to associate data. It can allow me to nest error reports.<p>[0] <a href="https://littlegreenviper.com/miscellany/the-road-most-traveled-by/#my-fault" rel="nofollow">https://littlegreenviper.com/miscellany/the-road-most-travel...</a>
Over time I've come to believe in the "grepability" of error messages, and the code-lines that construct them.<p>Sometimes the data (and error-messages) are flowing up and down through many different modules and APIs and job-queues and whatnot, that when an error pops up it saves a lot of developer-time when you can just text-search on the code repo(s) and see exactly the line that generated it in the first place.
A "Try again" button is the worst way to solve the problem of having no connection. GMail does it right by retrying automatically and periodically while showing an error bar at the top of the screen, without stopping the user from using the application.<p>If Wix can save the data locally, why not just copy the GMail error interface and let the user decide when to connect to the internet?
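For what it's worth, the "retry automatically instead of showing a button" behaviour could be sketched like this. It is only an illustration of the idea, not how GMail or Wix actually do it; `retry_with_backoff` and its parameters are hypothetical, and `notify` stands in for whatever updates the error bar.

```python
import time


def retry_with_backoff(operation, attempts=5, base_delay=1.0,
                       notify=print, sleep=time.sleep):
    """Call operation() until it succeeds, waiting longer after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except ConnectionError as exc:
            if attempt == attempts:
                raise  # out of retries: let the caller show a real error
            delay = base_delay * 2 ** (attempt - 1)  # 1 s, 2 s, 4 s, ...
            notify(f"Not connected ({exc}). Retrying in {delay:g} s...")
            sleep(delay)
```

In a real UI the wait would run in the background so the user can keep working, which is the part of the comment this blocking sketch does not cover.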
All the 'do this' versions suffer from the same problems as the 'don't do this' versions. Aside from fixing the tone, they are still generic, still inactionable, and still verbose.
It is my opinion that software problems tend to be analyzed along these four axes:<p>- Can an end-user solve the problem themselves? If so, tell them how; if not, display a generic error message telling them to ask for support (with an error identifier they can give to support).<p>- Developers and end-users need different information: developers need as much information as possible, like file names, contents of important variables, and especially where the error happened in the source code with a backtrace, sometimes even two backtraces (one for the cause of the error, too); end-users only need to be told what they can do, but this needs to be worded clearly and carefully. This means that error messages need to be written twice.<p>- Is the problem serious? If so, report, crash and restart; if not, just report and abort the affected operation when necessary.<p>- The problem should be logged. Sometimes it can be sent to developers automatically.
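"Written twice" might look something like the following sketch: full detail goes to the log for developers, and the user gets either a fix they can apply or a short message with an identifier to quote. `report_error` and the message wording are hypothetical.

```python
import logging
import traceback
import uuid

logger = logging.getLogger("app")


def report_error(exc, user_fix=None):
    """Log full detail for developers; return a short message for the end-user."""
    incident_id = uuid.uuid4().hex[:8]  # identifier the user can give to support
    detail = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    logger.error("incident %s: %r\n%s", incident_id, exc, detail)
    if user_fix:
        # The user can solve this themselves, so tell them how.
        return user_fix
    return (
        "Something went wrong on our side. "
        f"Please contact support and quote incident {incident_id}."
    )
```

The same incident identifier appears in both the log line and the user message, so a support ticket can be matched to the backtrace without asking the user for anything else.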
My recent experience with docker: I am a total newb, so I was running a tutorial step by step, and then I got some error about apt certificates/keys/repo stuff. After a lot of googling, it turned out the issue was that there was not enough disk space, but the fucking error was pointing in a different direction. Also, this is a good example of why Stack Overflow is useful, for the dudes that hate on it and RTFM everyone else.<p>This is why I love exceptions. I had an issue with a C# game, but with a stack trace I could figure out myself that the issue was happening when the app initializes and fails to open a file.<p>I think we should always give the users a detailed log and stack traces. Also, docker should fucking have some way to catch the issue when there is not enough space and report the error properly.
I really like this. There are clear shibboleths which identify the author as a person who deeply respects and cares for the readers of error messages, and their experiences. It makes me hopeful for the future of software when I see that there are others. Thanks for sharing.
Tried to download my data from takeout.google.com and got this error:<p>"500. It's an error."<p>Thanks, google. I tried to start a chat (I'm a Workspace customer) and could not continue because all the language choices were disabled (even English).
This is great, I would add one critical ingredient: provide actual customer care.<p>Meaning, the "way out" is to point users to customer care, but this still does not help if customer care is shit. And we know it often is.<p>Customer care should be an email address (and/or phone number) in the footer. Not a contact form. Self-help/FAQ is fine, but no replacement for direct contact. Nor is a shitty AI bot.<p>And when contacting support directly, answers should not be scripted non-sense completely ignoring the actual issue at hand.<p>I don't care if it doesn't scale. Make it scale. Your problem.
3 things I'd add (and have used with success):<p>1. Always have an error specific URL to point at. Changing a document stored outside the system is often significantly easier than redeploying a system (order of magnitude seconds or minutes vs. hours or days in the worst case). There are many benefits to this approach. It's available when your system is not. It's possible to look at metrics and collect NPS scores on the information. It's easy to add pictures, steps, links etc.<p>2. Try to add an operation specific correlation ID. This allows the user to talk about a specific instance of an error easily when dealing with support and developers to look for specific log info. This is also useful if you provide a 'get support' link on errors that require manual intervention.<p>3. Add an error specific identifier to help developers map error strings back to source code. Often with error messages that are string interpolated the unique values tend to obscure the non-unique parts of the message. Also messages that are fairly similar can make it more difficult for a developer to find the specific cause.<p>These are not alternatives, but additions to TFA's suggestions.
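The three additions above fit naturally on one exception type. A sketch under assumptions: `SupportableError` is a hypothetical class, `https://example.com/errors` is a placeholder for wherever the per-error documents live, and `EXPORT-0042` is a made-up identifier.

```python
import uuid

# Placeholder base URL for documents stored outside the system itself.
DOCS_BASE_URL = "https://example.com/errors"


class SupportableError(Exception):
    """Carries an error-kind identifier, a help URL, and a per-occurrence ID."""

    def __init__(self, error_id, message, correlation_id=None):
        super().__init__(message)
        self.error_id = error_id  # stable per error kind; grep the source for it
        self.message = message
        # Unique per occurrence, so support and logs can talk about one instance.
        self.correlation_id = correlation_id or uuid.uuid4().hex

    @property
    def help_url(self):
        return f"{DOCS_BASE_URL}/{self.error_id}"

    def __str__(self):
        return (
            f"[{self.error_id}] {self.message} "
            f"(help: {self.help_url}; reference: {self.correlation_id})"
        )
```

Because the identifier is a literal in the source, searching the repository for `EXPORT-0042` finds the raise site even when the interpolated message text varies.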
I believe that any language that treats errors and error management as an afterthought is bad. Also, any programmer who treats errors as an afterthought or simply ignores them is going to write bad code/programs. Errors are hard and need first-class language support. People talk about “higher order functions” but never about how to deal with errors (mainly because it’s boring and complicated). Also, errors are tightly coupled with intentions: if you fail to do something, well, that’s an error. But that also means an error is tightly coupled with what the program is trying to achieve, so wherever an error happens, it should be handled close to what the code is trying to do. That also settles what the error is about, which makes it easy to describe. Yes, there are errors that may not fall into this category, as they are much less related to what you are trying to do functionally. Any program which ignores how errors work and flow has, in my experience, always been bad in general, because its structure is also bad: there’s no organization.
For me error messages come in two forms.<p>1. For the user.<p>You can't do that (maybe explain why). Don't do that.<p>2. Error that's actually there for the support or engineering team for a customer to convey to support, probably with a handy copy to clipboard link (that the user has at best a 50/50 chance of using no matter how much prodding).<p>That's it.<p>Humans generally lock up hard when they see an error in my experience. No amount of information or hand holding will help most of them figure it out. It's better to try to solve it in software.<p>If the software can't fix the issue internally then they get an error message and 2 things happen:<p>1. The user is going to try something else and solve it themself (awesome) regardless of the error because they're smart and capable people and could probably solve it no matter what you told them.<p>2. Their brain locks up, they do the same thing 20 times and get the same result and complain to support with some form of "doesn't work". Doesn't matter what error you give them, they won't even try to tell you what the error was / doesn't register in their brain unless it had a cute cat on it or something (that actually works... so forget this "tone" stuff).<p>I like the article, but I am skeptical about a UX team who doesn't answer support tickets ... just magically knows what the user is thinking / will work. I get lots of advice on error messages, I change them when they ask, but when it's from folks inside the company who know the product it often isn't helpful.<p>Heck even users give bad advice about errors. I've had them tell me "Well it should have said X" where X is exactly word for word what it said (they forgot...).<p>Granted I still try to help the user along, but I'm skeptical that software with any large user base can have "good" error messages.
Really, error handling has been my big beef with CS education for like 40 years. There is none.<p>Error handling has been left to engineers, and when left to their own devices engineers will almost always make the wrong choice from a user point of view.<p>Engineers need to think of error messages this way: the error message is there to help people (who might be fellow engineers, support, and/or your consultants) identify the error quickly so that they can manage the user's expectations, fix the error, or both.<p>Unfortunately, many engineering paradigms make this an impossible task.<p>Layering and encapsulation mean that you have little idea what's happening downstream or how the downstream stuff actually works, but the lower-level you are, the less likely the error will mean anything to the end-user.<p>Then, it's a question of who's responsible for handling the error. If you're on the backend, where does it go? Does the user care that the backend microservice can't connect to the database? Heck, the UI probably has no idea what's happening back there.<p>However, for accurate troubleshooting, detail is needed.<p>For many orgs, leaving transaction IDs in your log files is the primary way that you figure out errors, especially in big distributed systems. That doesn't really help end-users, and it requires developer discipline, something many engineering teams find challenging.<p>Ideally error objects would aggregate error codes up the stack, so that if an error occurs you can at least present technical people with the errors that were thrown, and they can search through the source code for that unique error code. But designing that is difficult; conceptually you don't want a list of 500 error codes being thrown upwards, one from each function in the call chain. But sometimes you do.<p>Anyway, error handling design really should be part of the initial architecture, but it usually isn't, because architecture guys don't really understand support.
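The "aggregate error codes up the stack" idea maps fairly directly onto Python's exception chaining (`raise ... from`). A sketch with hypothetical names and codes (`CodedError`, `DB-1001`, `PROFILE-2003`); only layers that have something meaningful to add wrap the error, which avoids the 500-codes problem.

```python
class CodedError(Exception):
    """Hypothetical exception that carries a unique, searchable code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def error_chain(exc):
    """Return (code, message) pairs from the outermost error down to the root cause."""
    chain = []
    while exc is not None:
        # Errors without a code (e.g. built-ins) fall back to their class name.
        chain.append((getattr(exc, "code", type(exc).__name__), str(exc)))
        exc = exc.__cause__
    return chain


def query_db():
    raise CodedError("DB-1001", "connection refused by 'db.internal:5432'")


def load_profile(user):
    try:
        query_db()
    except CodedError as exc:
        # Wrap with this layer's intent; the lower-level error stays attached.
        raise CodedError(
            "PROFILE-2003", f"could not load profile for user {user!r}"
        ) from exc
```

A handler at the top can log `error_chain(exc)` for technical people while showing the end-user only the outermost, least technical message.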
I've been guilty of this in the past - I remember writing an error message that looked like "if you used X setting, do this, otherwise that". The code should have instead checked what settings the user enabled and given a clearer error for the situation at hand.
If you have tech support or knowledge base articles for your product, you can include unique error codes in your error messages so that Googling the error code will find the appropriate support article. Microsoft is pretty good about this with their KB article numbers and their compiler error messages like C4000: <a href="https://learn.microsoft.com/en-us/cpp/error-messages/compiler-warnings/compiler-warnings-c4000-c5999?view=msvc-170" rel="nofollow">https://learn.microsoft.com/en-us/cpp/error-messages/compile...</a>
How about just engineering stuff to not have errors in the first place.<p>My toaster is a complex bit of engineering - it has thousands of parts which all work together to take power from the wall to make toast.<p>Yet it has no errors. It just does the job I ask it to do.<p>A computer on the other hand seems to have a lot of ways to fail, and does so nearly every day. I suspect everyone reading this comment has seen at least one error <i>today</i>. Can't we engineers make the software better so that these errors can't/don't happen?
What the article is missing is how they learned the new error messages are now more helpful to the end user. Some kind of metrics: maybe, the number of support tickets/angry reviews decreased? Otherwise without clear criteria for success I'm not sure if it was worth it and wasn't just changing the error messages for the sake of changing. Sure what they talk about makes sense but "it makes sense" is not a business metric.
Redesign the error messages or UX however you want, but I hope there is always a "more" button to show exactly what went wrong. I hate digging through eventvwr.msc or less -nir walls of log text.
In my current gig, I would be content if they just had <i>consistent</i> error handling. But across dozens of endpoints (REST and gRPC), there are almost as many different kinds of error responses. Some will return a 400 instead of a 404, some will return a 500 for any error, and some will return a sensible error code but a status and message that amount to "you called GET on XXX and it failed".
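One low-tech route to consistency is a single helper that every endpoint must use to build error responses. A sketch, assuming a JSON body; the `error_response` helper and the field names are hypothetical, not any framework's API.

```python
import json
from http import HTTPStatus


def error_response(status, code, message, **details):
    """Return (status_code, json_body) in the one shape every endpoint shares."""
    body = {
        "error": {
            "code": code,        # stable, machine-readable identifier
            "message": message,  # human-readable explanation
            "details": details,  # structured context, e.g. the missing id
        }
    }
    return status.value, json.dumps(body, sort_keys=True)


status, body = error_response(
    HTTPStatus.NOT_FOUND, "JOB_NOT_FOUND", "No job with id 125", job_id=125
)
print(status, body)
```

Taking an `HTTPStatus` member instead of a bare integer makes it a little harder to return a 500 "for any error" by accident, since the call site has to name the status it means.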
"Passing the Blame" in particular is a personal pet peeve. I hate when apps phrase errors like I did something wrong by clicking the totally normal link. Closely related is the general trend of "lol wut" tone in error messages, which really grates when you're frustrated and doing something that might be very important. "Whoops! We made an Oopsies! Sorry :("
I'm not sure we'll ever eclipse the awesomeness of the VB6 error: "Method ~ of object ~ failed".<p>On a more serious note, error messages are something I always try to keep in mind in code reviews. Most error messages in the code I review are only ever seen in production logs, so I try to think about what I'd do with that message (and accompanying details) if I saw it in production.
It reminds me of an article from <i>Byte</i> magazine back in 1981, but the basics stay the same.<p>I'd like to learn how to make more meaningful error messages in compilers, particularly "low code" compilers that slice code transformations thinly and thus have a hard time explaining which lines of code are interacting to create this situation that happens at phase 39.
Please also put variable names at the end of sentences when possible. For example, instead of "Your file /user/foo/bar.baz did not load correctly because of whales", how about "This file did not load correctly because of whales: /user/foo/bar.baz". The search is much easier.
Nice. While working on a large, long-running project, at one point I started redoing all the error messages to be more helpful, to include implied suggestions, and to be divided by alert level and category, because I decided to take a pause and take care of my users.
Here's another link for how to write useful error messages: <a href="https://www.bbc.co.uk/gel/features/how-to-write-useful-error-messages" rel="nofollow">https://www.bbc.co.uk/gel/features/how-to-write-useful-error...</a>.
I completely agree with this article, but it never bothers me in particular. But I'm a developer, so I'm an outlier. That said, I do wish that the error message I see every day would be simpler.<p><pre><code> <looks at TypeScript></code></pre>
> If the issue keeps happening, contact Customer Care.<p>This actually means "if you like wasting your time and want to speak to incompetent fools who will pass you to an endless stream of their 'colleagues' then dial this number."
> Unable to connect your account<p>Do they mean “Unable to connect <i>to</i> your account”? Because otherwise it’s not clear to me what this is about. Connect my account to what? This doesn’t read like a user-level concept.
Bonus points if your link to customer care auto-populates the fields necessary to get the ticket where it needs to go and can attach relevant diagnostic information to the resulting ticket.
Nicely written piece with clear examples. It would be great to know the impact of this work. Perhaps one metric to look at would be the number of tickets submitted to customer care?
Reminds me of years ago, when a junior developer I was working with got a lot of good-natured ribbing for a validation message that simply said, "You can't do that."
They recommend avoiding technical jargon, so they change it to:<p>'due to a technical issue on our end'<p>But isn't that also generic and obvious, which is what they were trying to avoid?
So essentially go back to dev style error messages?<p>A UX person telling us not to do what the previous UX person thought was cute.<p>Thank you sooo much! Ask PM for a pat on the back.
write errors that don't make me think: <a href="https://dev.to/swyx/write-errors-that-don-t-make-me-think-24hg" rel="nofollow">https://dev.to/swyx/write-errors-that-don-t-make-me-think-24...</a>
> Even in today’s world of user-centered design, technical jargon still sneaks its way into error messages. You couldn’t fetch my data? My credentials were denied? What? The technical stuff is not important to the user<p>This is the opposite of what I want. Stop condescending and just tell me what actually went wrong.