What they don't tell you about working for yourself is that you can be effectively on-call 24x7. I am currently supporting four wineries that are processing thousands of tonnes of receivals 24x7. It happens for two months of the year and I am expected to be available from 06:00 to 22:00 during that time. There is no phoning in sick or having a lazy day; I work alone and only have one reputation. I don't want to be that contractor forever known for destroying a client's business.<p>You can only do this for so long though. When two or three problems come in simultaneously, it can cause issues as you drop something halfway through when something more important comes in. I once executed an SQL update query without a where clause under this kind of pressure, and ended up working until the next morning to recover, only to start again at 6AM. I have even had land-line calls at 2AM to bypass my mobile restrictions. The rewards are great, but don't let anyone tell you it is always easy.<p>My current system is 16 years old now and I know all the ins and outs, so it has been pretty easy to keep on top of things the last several years. However, I am glad the replacement system is nearly written and it will be somebody else's problem in 2026.
For "non-exempt" employees, that's paid "stand-by time" in California.[1] Also see this case involving on-call coroners.[2]<p>The way this works in most unionized jobs is that there's a stand-by rate paid for on-call hours, plus a minimum number of hours at full or overtime pay, usually four, when someone is called to duty. This is useful to management - if the call frequency is too high, it becomes cheaper to hire an additional person.<p>[1] <a href="https://www.dir.ca.gov/dlse/CallBackAndStandbyTime.pdf" rel="nofollow">https://www.dir.ca.gov/dlse/CallBackAndStandbyTime.pdf</a><p>[2] <a href="https://casetext.com/case/berry-v-county-of-sonoma" rel="nofollow">https://casetext.com/case/berry-v-county-of-sonoma</a>
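A rough sketch of the stand-by pay structure described in the comment above: a stand-by rate for every on-call hour, plus a minimum of four paid hours per call-out. The function names and all dollar figures here are hypothetical and only illustrate why a high call frequency gets expensive quickly.

```python
def callout_pay(hours_worked, hourly_rate, minimum_hours=4):
    # Each call-out is paid for at least `minimum_hours`,
    # even if the actual fix took only a few minutes.
    return max(hours_worked, minimum_hours) * hourly_rate


def weekly_on_call_cost(standby_hours, standby_rate, callouts, hourly_rate,
                        minimum_hours=4):
    # Stand-by pay covers every on-call hour, whether or not anything happens.
    standby = standby_hours * standby_rate
    # `callouts` is a list of hours actually worked on each call-out.
    worked = sum(callout_pay(h, hourly_rate, minimum_hours) for h in callouts)
    return standby + worked


# Hypothetical numbers: 128 off-hours per week, $5/h stand-by, $50/h call-out.
quiet_week = weekly_on_call_cost(128, 5, [0.5], 50)
noisy_week = weekly_on_call_cost(128, 5, [0.5] * 10, 50)
print(quiet_week, noisy_week)
```

Because every short call-out is billed at the four-hour minimum, the cost of a noisy week grows with the number of calls rather than the time spent on them, which is the management incentive the comment points to.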
Excellent article. I can relate to a lot of it. The sad part is that we can't even control the quality of the systems we're oncall for. We're pushed by management for new features, not for robustness of the tools. Also some systems have no clear ownership, so nobody has an incentive to fix them. It'll be next oncall's business. Oncall is really the worst part of my job. I can stand long hours but this is something else.
I just want to point out that the answer is shift work. Here's an example of an SRE job at a national lab:<p><a href="https://lbl.referrals.selectminds.com/jobs/site-reliability-engineer-7041" rel="nofollow">https://lbl.referrals.selectminds.com/jobs/site-reliability-...</a><p>"Work 5 shifts per week to monitor the NERSC HPC Facility, which includes 2 - 3 OWL (midnight - 8am) shifts. Some days may be onsite, some may be offsite. The schedule will be determined by staffing needs."<p>40 hours per week, full salary, full disclosure about the night shifts, but none of this 24x7 wake up in the middle of the night on top of your regular job bullshit that the tech industry insists on.
Here's an idea: Compensate any on-call work received during off hours at 10X the normal hourly rate. E.g., if my salary is $150K per year, then my hourly pay rate is about $75 per hour, so compensate my on call work at a rate of $750 per hour. Thus if I get a call at 10pm, log in to my laptop and work for 30 minutes to resolve the issue to a satisfactory level, then I pocket $375. That puts a financial incentive on companies to structure their on call protocols so that only the most important calls are handled. And I can envision variations on this theme. Different sorts of on-call disasters could offer bids for how much they're worth to fix based on some automated rubric, and anyone on the ENG team could pick these up on a first-come, first-served basis. Or various combinations of the above for a guaranteed backup person. But the companies should offer enough incentive to make it worthwhile. And this is in the companies' own best interest. To maintain a workforce that can think clearly during the normal work, to have a good reputation in the industry, to get good reviews on Glassdoor, etc.
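The arithmetic in the proposal above can be written out as a small sketch. It assumes a 2,000-hour working year (roughly 50 weeks of 40 hours), which is what makes $150K come out at the comment's "about $75 per hour"; the function names are made up for illustration.

```python
HOURS_PER_YEAR = 2000  # assumption: ~50 weeks x 40 hours


def hourly_rate(annual_salary, hours_per_year=HOURS_PER_YEAR):
    # Convert an annual salary into a nominal hourly rate.
    return annual_salary / hours_per_year


def on_call_pay(annual_salary, minutes_worked, multiplier=10):
    # Off-hours work is paid at `multiplier` times the normal hourly rate,
    # pro-rated by the minutes actually worked.
    return hourly_rate(annual_salary) * multiplier * minutes_worked / 60


print(hourly_rate(150_000))      # 75.0
print(on_call_pay(150_000, 30))  # 375.0
```

With a 2,080-hour year the base rate would be closer to $72, so the exact payout depends on which convention a company uses.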
One of the biggest problems with on-call rotations is that you're actually incentivized to do it poorly. Every minute you spend doing on-call work is time that you can't spend on the things you've actually been assigned to do. You're never going to put your on-call work into your performance reviews; doing so might actively work against you. "I see that you spent time tuning alerts and updating the runbook. That's time you <i>didn't</i> spend on the actual tasks that were assigned to you."<p>If it's better to spend the least amount of time doing on-call work then the logical conclusion is that it's best to snooze as many alerts as possible until they either go away on their own or roll over past your rotation. Fixing the underlying problem <i>might</i> be worthwhile if it's something that you can fairly easily fix but if the on-call rotation is more than 2 people, the underlying problem is mathematically unlikely to be of your making and is it really a good idea to make a habit of fixing other people's broken code?<p>What's crazy is that I've never seen anyone with on call duties acting in this worst case bad faith manner. Companies basically abuse the work ethic of their employees because it's the cheapest possible way to check that box.
That article made me shudder with echoes of having what we used to call “beeper madness” back in the 1990s. After a while of being on a roster of on call weeks, anything that beeped would make you reach for that pager on your belt.<p>As a kid the first few weeks were kind of exciting as it felt like you had been elevated to a new level of responsibility. Once that wore off it was obvious what a cage it was.<p>I don’t miss pagers.
" insure against every possible thing that could ever go wrong, they would have to build a second studio on a separate part of the city’s electric grid, with redundant copies of all the equipment and broadcast content, along with a full crew of understudies ready to take over at a moment’s notice."<p>WTH?? I guess this person has never heard of backup generators? Every broadcast TV station has them.
That seems like a very long-winded way to say you hate on-call, which is a completely normal thing to do. That said, is on-call effectively mandatory or very popular in the US startup world? Because here, in the European established company world, I can’t really recall seeing a job posting with on-call listed.
PTSD. I was that guy dragging his backpack everywhere. A drink at a bar on Saturday? Sunday lunch with the in-laws? With my backpack, ready to bust out the laptop in case of emergency.<p>My two personal lows... I had to pull out the laptop at my own birthday party, in a restaurant. And at a funeral (not proud of that one).<p>So sad.
There's a great option for small companies that aren't amenable to on-call that is underappreciated. Hire an MSP. There are companies whose entire business model is having a geo distributed team with a stack of automated monitoring and run books for multiple clients. You train them, pay a set fee and never ask anyone on your team to be on call.
I’m quitting my job with nothing lined up because our oncall is such a piece of shit. House arrest every 5 weeks with no compensation. If it wasn’t for this I would just quiet quit but there’s no way I can make it through another shift without getting fired. Fuck oncall and anyone who has such little respect they think it’s ok
This article gave me unpleasant flashbacks to the first half of 2023. I resigned from planet.com in mid 2023 due to the stress caused by being on-call every second week. It took me six months to get my head into a healthy state again. Now I have a <i>much</i> better job, better paid and no possibility of on-call, ever.
The difference with dev oncall vs doctor on call is that it is self inflicted.
Why are you getting paged? Because you built the system.
Either your system isn't resilient enough, or you have noisy alerts. Both are problems you should be motivated to fix.<p>I have been on call 24/7/52 in SRE roles most of my career. It has either sucked hard, or not at all. And the time it sucked the most was because every single practice was bad. And now, I build better things because of it. Paying me more for on call wouldn't have changed how much it sucked. It wouldn't have made any material impact on my actual quality of life. But it would have done two things:
1) made me feel like I can't complain
2) given me less motivation to fix it<p>Paying for on call doesn't seem like a win. I want happy employees, not disgruntled but silent ones.
Yeah, I can relate to people saying they nearly got PTSD, I sure did get it. Paging apps use seriously offensive alarm sounds. I hated every sound they had in the options. It made me instantly sick. Fuck that!
I want to armchair quarterback this:<p>> <i>I suffered through a particularly acute week of on-call pain. At one point I was in my third or fourth video call about the same long and protracted smoldering SEV and, in a moment of frustrated weakness, I made an offhand comment about just being tired of repeatedly handling the same problem. [...] With the utterance of a single sentence, I opened a rift in the relationship with my manager that remained until the day I left that job.</i><p>It's possible that the manager dropped the ball in <i>two</i> ways:<p>1. Perma-haranguing an employee, when manager should instead have had a talk with employee about what was bothering the employee. (And split it, between the little that really needs to be said right now, while employee is already overextended, and what can happen after they recharge.)<p>2. The root cause of the recurring problem might be something management should've tackled already. Which is additional reason not to blame the employee, for calling out the problem in frustration.
I was on call in a 4 man startup for a 1 week rotation for about 9 months, 6 years ago. I still have an anxiety reaction when my phone rings. Can very much relate to the author's thoughts about PTSD.
This is an excellent article. I've commented before¹ why "on-call" as described is bad because it conflates roles and responsibilities and robs developers of resources, but this goes quite a bit further and explains why it's a bad practice that eventually leads to burnout.<p>With the exception of companies like PagerDuty, the sooner this practice is ended the better off we'll be.<p>¹ <a href="https://news.ycombinator.com/item?id=43400898">https://news.ycombinator.com/item?id=43400898</a>
> <i>My manager was present on the call, and my statement seemed to really set him off. I was essentially told that my feelings about the situation—perhaps the only authentic part of myself I ever expressed there—were wrong. In the days that followed I was made to feel like I was not a team player, that I was not pulling my weight, and that I was not meeting the bare minimum of what was expected of a person bearing the torch of on-call. With the utterance of a single sentence, I opened a rift in the relationship with my manager that remained until the day I left that job.</i><p>So, this is just <i>plainly bad</i> leadership, right? Totally believable too, of course, but just really bad. Bad for the employee, but also self-defeating for the manager.<p>It seems like this would be an awful manager reaction to anything short of a quasi-fireable offense. If that's your response to an employee not being enthusiastic about a part of work that sucks, what are you even doing as a leader?
(Former) iOS app developer here. I was oncall once and it was actually not that bad because every change had to go through app review, which put the lower bar on response times at days if not weeks. I hate app review but it was actually very nice that "oncall" really just meant "check the Slack channel in the morning" because there was no point in doing anything faster.
In my country, if you are on an employee agreement, you are just paid for overtime. My company never made any problems with that payment. And surprise... there were always people who chose to take the additional hours. I was often among them - the additional money/vacation was just too nice to resist.
I had the same reaction as the author, the first time I heard the Kafka name, and then the same reaction when I heard the reasoning behind the name.<p>(Then I wondered when would be a sufficiently good occasion to say "Kafkaesque" regarding the software, since that's not something you want to waste.)
Highly recommend listening to the 70s country gem that the title is an homage to: Take This Job And Shove It [1] by (the aptly named) Johnny Paycheck<p>[1] <a href="https://youtu.be/gj2iGAifSNI" rel="nofollow">https://youtu.be/gj2iGAifSNI</a>
This article form might be cathartic for the writer, and actionable recommendations aren't the main point. But they are sprinkled throughout, and a small management slide deck could be distilled from the piece.
I recall "Soul of a New Machine" having parts that were fatiguing to read, incidentally (or intentionally) conveying the miserable mood of the characters, slogging through the project's trials.
Counterpoint: I'm at a small business and I'm primary for 24x7 oncall. I don't even take shifts with my coworker. But, this is because I'm empowered to make out of hours (overnight, weekend, holiday) calls <i>STOP</i>. I get woken up by something about once or twice a quarter.<p>When something wakes me up, the next day I start a process to ensure I don't get that same alert again: bugfixes, adjusting thresholds or time-to-critical, detecting problems and auto-remediation, determining it can be a "business hours" response.<p>This also requires buy-in from development. Literally yesterday I had an education opportunity with one of the developers about a ticket slated to go into production that evening that would have immediately eliminated one of our leading monitoring indicators, because it would have started creating hundreds or thousands of Sentry issues an hour. "I was thinking it was more like logging, where more information is better, where with monitoring we want the fewest messages possible."<p>Always, always, look at every pager hit and ask "what can prevent that from happening again?"
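As a minimal sketch of two of the tactics listed above, "adjusting thresholds or time-to-critical" and "determining it can be a business hours response", the snippet below only pages when a metric has stayed over its threshold for several consecutive samples, and downgrades selected checks to a ticket. The `Check` class, its field names, and the return values are hypothetical and not tied to any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Check:
    threshold: float                   # value above which a sample is unhealthy
    time_to_critical: int              # consecutive unhealthy samples before alerting
    business_hours_only: bool = False  # True: never page, open a ticket instead


def decide(check, samples):
    """Return 'ok', 'ticket', or 'page' based on the most recent samples."""
    # Count how many of the latest samples in a row are over the threshold.
    streak = 0
    for value in reversed(samples):
        if value > check.threshold:
            streak += 1
        else:
            break
    if streak < check.time_to_critical:
        return "ok"      # a brief blip, not worth waking anyone
    if check.business_hours_only:
        return "ticket"  # real, but can wait until morning
    return "page"


disk = Check(threshold=90, time_to_critical=3)
print(decide(disk, [50, 95, 96]))  # ok
print(decide(disk, [95, 96, 97]))  # page
```

Raising `time_to_critical` or flipping `business_hours_only` after each overnight page is one way to act on the "what can prevent that from happening again?" question.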
OP is right, but apart from that, this is one of the best written pieces I've read in an age. Agree, disagree, but it's so well written it's mesmerizing.
being oncall forces the quality of software to improve.<p>if you want fewer incidents: ensure better QA, monitoring, smaller rollouts<p>usually developers start becoming more conservative after they do a few oncall shifts and suddenly prioritize important reliability improvements, instead of shiny new features nobody will use
I was on as many as three on-call rotations for a few years. One had only two people for a while, so I was on every other week. The two things I most remember are:<p>* Arranging my whole life around on-call requirements. Bringing my laptop and backpack every time I went out. Designing new running routes that would use every street in a neighborhood and keep me close to home so I could respond within a 15-minute window. And yeah, the drinking thing. It pervaded my life in many ways I hadn't expected.<p>* Time zones and geography. These were always problematic, but especially during on-call. Often I'd narrow a problem down to a particular component that I didn't know well, so I'd try to contact the sub-team responsible for it, but <i>nobody</i> would respond. Then I'd try to turn the right knobs myself, and as often as not get yelled at for it in the morning. No, my afternoon, because my coworkers were three hours behind and late-commuters to boot. Of course they'd never hesitate to schedule meetings or ping me for trivial things well after my dinner.<p>I had taken the job, initially working on a project for which I was already a maintainer, because I wanted to avoid becoming an "architecture astronaut" by getting closer to operational reality. Indeed, I did learn a lot about how my own code behaved in real life. I don't have a problem with on-call requirements in and of themselves, but the way people and organizations handle the details is kind of <vomit emoji>.
There is another option. The on-call person just does a deliberately piss-poor job of resolving the problem. I mean, they resolve it but they make sure it takes an hour longer than necessary.<p>What are they going to do, fire you? If they make life hard for you, then get another job. The shoddier your work outside of your normal hours, the better. You can have quality, speed and cheapness, but you can only pick two.
The OP needs to write with some more focus. Most of this reads like a very long rant by someone who was woken up too many times recently.<p>> We need to talk about Kafka<p>No we don't, that entire section was irrelevant.