Great writeup (including the human cost, e.g. loss / lack of sleep, which in my experience has a huge impact on complicated incident resolution).<p>Here’s what jumped out at me: “The new account was created in our database with a null value in the URI field.”<p>Almost every time I see a database-related postmortem — and I have seen a lot of them — NULL is lurking somewhere in the vicinity of the crime scene. Even if NULL sometimes turns out not to be the killer, it should always be brought in for questioning.<p>My advice is: never rely on NULL as a sentinel value, and if possible, don’t allow it into the database at all. Whatever benefits you think you might gain, they will inevitably be offset by a hard-to-find bug, quite possibly years later, where some innocuous-seeming statement expects either NULL or NOT NULL and the results are unexpected (often due to drift in the semantics of the data model).<p>Although this was a race condition, if the local accounts and the remote accounts were affirmatively distinguished by type, the order of operations may not have mattered (and the account merge code could have been narrowly scoped).
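To make the "affirmatively distinguished by type" idea concrete, here is a minimal sketch using Python's built-in `sqlite3`. The table layout, the `kind` column, the example URIs, and the `merge_candidates` helper are all hypothetical illustrations, not Mastodon's actual schema or merge code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: every account states its kind explicitly, and uri
# can never be NULL, so "no URI" cannot double as "this is a local account".
conn.execute(
    """
    CREATE TABLE accounts (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        kind     TEXT NOT NULL CHECK (kind IN ('local', 'remote')),
        uri      TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO accounts (username, kind, uri) VALUES (?, ?, ?)",
    [
        ("alice", "local", "https://local.example/users/alice"),
        ("bob", "local", "https://local.example/users/bob"),
        ("carol", "remote", "https://remote.example/users/carol"),
    ],
)

# A row without a URI is rejected at write time instead of surfacing
# years later in some unrelated query.
try:
    conn.execute(
        "INSERT INTO accounts (username, kind, uri) VALUES ('dave', 'remote', NULL)"
    )
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True


def merge_candidates(uri):
    """Return usernames a merge may touch: remote accounts only (hypothetical helper)."""
    rows = conn.execute(
        "SELECT username FROM accounts WHERE kind = 'remote' AND uri = ?",
        (uri,),
    ).fetchall()
    return [row[0] for row in rows]
```

With the merge query scoped to `kind = 'remote'`, a bad or missing URI can at worst match nothing; it cannot sweep up every local account.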
The part that resonates here is saying<p>"ah yes well we have a full database backup so we can do a full restore", then<p>"the full restore will be tough and involve downtime and has some side effects," then<p>"I bet we could be clever and restore only part of the data that are missing", then<p>doing that by hand, which hits weird errors, then<p>finally shipping the jury-rigged selective restore and cleaning up the last five missing pieces of data (hoping you didn't miss a sixth)<p>Happens every time someone practices backup/restore no matter how hard they've worked in advance. It always ends up being an application level thing to decide what data to put back from the backup image.
> <i>To Renaud, Claire, and Eugen of the Mastodon developer team, who went above and beyond all expectations to help us out. You folks were amazing, you took our situation very seriously, and immediately jumped in to help us. I really could not have asked for anything more. Thank you!</i><p>I don't know if Vivaldi provides financial support to Mastodon (I couldn't find their name on the sponsors page). If not, I hope this situation causes them (and other companies using Mastodon) to consider sponsorship or a support contract.
Items two and three not happening atomically feels like an issue, though I assume there's a reason that it's not trivial to do so (I haven't looked at the code; really should at some point.)
I'll never forget the first time I had to restore a massive SQL dump and realized that vim actually segfaults trying to read it.<p>That's when I discovered the magic of split(1), "split a file into pieces". I just split the huge dump into one file per table.<p>Of course a table can also be massive, but at least each file is now more uniform, which means you can more easily run other tools on it, like sed or awk, to transform queries.
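For anyone wanting to try the one-file-per-table approach: split(1) on its own cuts by line count or size, so cutting at table boundaries needs a pattern. Below is a rough Python sketch that streams the dump line by line, so the whole file never has to fit in memory. It assumes a mysqldump-style dump where each table starts with a "-- Table structure for table" comment; other dump formats need a different marker. The `split_dump` function and the `_header.sql` file name are made up for this example.

```python
import re
from pathlib import Path

# Assumption: mysqldump-style section comments mark the start of each table.
TABLE_MARKER = re.compile(r"^-- Table structure for table `(?P<name>[^`]+)`")


def split_dump(dump_path, out_dir):
    """Stream a SQL dump and write one file per table; return the file names."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Anything before the first table marker (SET statements etc.) goes here.
    written = ["_header.sql"]
    out = open(out_dir / "_header.sql", "w", encoding="utf-8")
    try:
        with open(dump_path, "r", encoding="utf-8", errors="replace") as dump:
            for line in dump:
                match = TABLE_MARKER.match(line)
                if match:
                    # New table section: close the current file, start the next.
                    out.close()
                    name = match.group("name") + ".sql"
                    out = open(out_dir / name, "w", encoding="utf-8")
                    written.append(name)
                out.write(line)
    finally:
        out.close()
    return written
```

Usage would be something like `split_dump("dump.sql", "tables/")`, after which each per-table file can be fed to sed, awk, or the database client separately.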
Did this make anyone else's eyebrows rise sky high?<p>> Claire replied, asking for the full stacktraces for the log entries, which I was able to also extract from the logs.<p>This is either deep voodoo magic, or the code or configuration is turning a Xeon into the equivalent of a 286. How is that not, like, megabytes on every single hit?
> And it just so happens that all local accounts in a Mastodon instance have a null value in their URI field, so they all matched.<p>How? NULL = NULL does not evaluate to TRUE. SQL uses three-valued logic, and NULL compared with NULL by any ordinary comparison operator is NULL, so an equality match on a NULL URI should return no rows.
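A quick way to check the semantics, sketched with Python's built-in `sqlite3` (the table and rows are invented for the demo). One possible explanation, which is an assumption and not something the writeup confirms, is that the query ended up as an `IS NULL` test instead of an `=` comparison, for instance because an ORM translated a nil argument that way. `IS NULL` does match every row with a NULL URI:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (username TEXT NOT NULL, uri TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [
        ("alice", None),  # hypothetical local account, NULL uri
        ("bob", None),    # hypothetical local account, NULL uri
        ("carol", "https://remote.example/users/carol"),
    ],
)

# NULL = NULL is NULL (neither TRUE nor FALSE); Python sees it as None.
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# An equality comparison against NULL never matches any row.
matched_by_equals = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE uri = NULL"
).fetchone()[0]

# IS NULL is a different test and matches every row with a NULL uri.
matched_by_is_null = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE uri IS NULL"
).fetchone()[0]

print(null_eq_null, matched_by_equals, matched_by_is_null)  # prints: None 0 2
```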
> 6 Users with symbols in their usernames couldn’t log in. This turned out to be due to a mistake I’d made in the recovery script, and was very easily fixed.<p>UTF-8 strikes again.
Hm, so a distributed Twitter runs into the challenge that each independently managed node is ... an independently managed node. Backup problems etc.<p>Centralized Twitter improves its operations for all users over time. But it can be purchased by a nutso billionaire on a whim, or subjected to the """"""national security"""""" directives of the US Government.