How I Fired Myself

620 points by mkrecny about 12 years ago

139 comments

bguthrie about 12 years ago
More than anything else, this describes an appalling failure at every level of the company's technical infrastructure to ensure even a basic degree of engineering rigor and fault tolerance. It's noble of the author to quit, but it's not his fault. I cannot believe they would have the gall to point the blame at a junior developer. You should expect humans to fail: humans are fallible. That's why you automate.
columbo about 12 years ago

News flash:

If you are a CEO you should be asking this question: "How many people in this company can unilaterally destroy our entire business model?"

If you are a CTO you should be asking this question: "How quickly can we recover from a perfect storm?"

They didn't ask those questions, they couldn't take responsibility, they blamed the junior developer. I think I know who the real fuckups are.

As an aside: way back in time I caused about ten thousand companies to have to refile some pretty important government documents because I was double-encoding XML (&amp; became &amp;amp;). My boss actually laughed and was like "we should have caught this a long time ago"... by "we" he actually meant himself and support.

xentronium about 12 years ago

This is certainly a monumental fuckup, but these things inevitably happen even with better development practices; this is why you need backups, preferably daily, and as much separation of concerns and responsibilities as humanly possible.

Anecdote:

I am working for a company that does some data analysis for marketers, aggregated from a vast number of sources. There was a giant legacy MyISAM (this becomes important later) table with lots of imported data. One day, I made some trivial-looking migration (added a flag column to that table). I tested it locally and rolled it out to the staging server. Everything seemed A-OK until we started the migration on the production server. Suddenly, everything broke. By everything, I mean EVERYTHING: our web application showed massive 500s, total DEFCON 1 across the whole company. It turned out we had run out of disk space, since apparently MyISAM tables are altered the following way: first the table is created with the updated schema, then it is populated with data from the old table. MyISAM ran out of disk space and somehow corrupted the existing tables; the MySQL server would start with blank tables, with all data lost.

I can confirm this very feeling: "The implications of what I'd just done didn't immediately hit me. I first had a truly out-of-body experience, seeming to hover above the darkened room of hackers, each hunched over glowing terminals." Also, I distinctly remember how I shivered and my hands shook. It felt like my body temperature fell by several degrees.

Fortunately for me, there was a daily backup routine in place. Still, a several-hour outage and lots of apologies to angry clients.

"There are two types of people in this world, those who have lost data, and those who are going to lose data"

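For readers who haven't hit this particular failure mode: on MyISAM (and older MySQL setups generally) an ALTER TABLE is executed as copy-new-table-then-swap, so even a trivial column add needs roughly the table's full size in free disk. A rough pre-flight check, with made-up schema and table names:

    -- Estimate how much scratch space the ALTER's table copy will need.
    SELECT table_name,
           ROUND((data_length + index_length) / 1024 / 1024) AS approx_mb_to_copy
    FROM information_schema.tables
    WHERE table_schema = 'analytics'
      AND table_name   = 'imported_data';

    -- Only once the disk can absorb that:
    ALTER TABLE analytics.imported_data
      ADD COLUMN flagged TINYINT(1) NOT NULL DEFAULT 0;
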
grey-area about 12 years ago

Tens of thousands of paying customers and *no backups*?

No staging environment (from which ad-hoc backups could have been restored)!?!?

No regular testing of backups to ensure they work?

No local backups on dev machines?!?

Using a GUI tool for db management on the live db?!?!?

No migrations!?!?!

Junior devs (or any devs) *testing* changes on the live db and wiping tables?!?!?!

What an astonishing failure of process. The higher-ups are definitely far more responsible for this than some junior developer. He shouldn't have been allowed near the live database in the first place until he was ready to take changes live, and then only onto a staging environment, using migrations of some kind which could then be replayed on live.

They need one of these to start with, then some process:

http://www.bnj.com/cowboy-coding-pink-sombrero/

cmos about 12 years ago

When I was 18 I took out half my town's power for 30 minutes with a bad SCADA command. It was my summer job before college, and I went from cleaning the warehouse to programming the main SCADA control system in a couple of weeks.

Alarms went off, people came running in freaking out, trucks started rolling out to survey the damage, hospitals started calling about people on life support and how the backup generators were not operational, old people started calling about how they require AC to stay alive and whether they should take their loved ones on machines to the hospital soon.

My boss was pretty chill about it. "Now you know not to do that" were his words of wisdom, and I continued programming the system for the next 4 summers with no real mistakes.

cedsav about 12 years ago
Whoever was your boss should have taken responsibility. Someone gave you access to the production database instead of setting up a proper development and testing environment. For a company doing "millions" in revenues, it's odd that they wouldn't think of getting someone with a tiny bit of experience to manage the development team.
mootothemax about 12 years ago

> The CEO leaned across the table, got in my face, and said, "this, is a monumental fuck up. You're gonna cost us millions in revenue".

No, the CEO was at fault, as was whoever let you develop against the production database.

If the CEO had any sense, he should have put you in charge of fixing the issue and then *making sure it could never happen again*. Taking things further, they could have asked you to find other worrying areas, and come up with fixes for those before something else bad happens.

I have no doubt that you would have taken the task extremely seriously, and the company would have ended up in a better place.

Instead, they're down an employee, and the remaining employees know that if they make a mistake, they'll be out of the door.

And they still have an empty users table.

hackoder about 12 years ago

I was in a situation very similar to yours. Also a game dev company, also lots of user data etc etc. We did have test/backup databases for testing, but some data was just on live and there was no way for me to build those reports other than to query the live database when the load was lower.

In any case, I did a few things to make sure I never ended up destroying any data. Creating temporary tables and then manipulating those.. reading over my scripts for hours.. dumping table backups before executing any scripts.. not executing scripts in the middle/end of the day, only mornings when I was fresh etc etc.

I didn't mess up, but I remember how incredibly nerve-wracking that was, and I can relate to the massive amount of responsibility it places on a "junior" programmer. It just should never be done. Like others have said, you should never have been in that position. Yes, it was your fault, but this kind of responsibility should never have been placed on you (or anyone, really). Backing up all critical data (what kind of company doesn't back up its users table?! What if there had been hard disk corruption?), and being able to restore in minimum time, should have been dealt with by someone above your pay grade.

Yare about 12 years ago

If it helps explain things, the only experience the CEO had before this social game shop was running a literal one-man yogurt shop.

This happened a week before I started as a Senior Software Engineer. I remember getting pulled into a meeting where several managers who knew nothing about technology were desperately trying to place blame, figure out how to avoid this in the future, and so on.

"There should have been automated backups. That's really the only thing inexcusable here," I said.

The "producer" (no experience, is now a director of operations, I think?) running the meeting said that was all well and good, but what else could we do to ensure that nobody makes this mistake again? "People are going to make mistakes," I said, "what you need to focus on is how to prevent it from sinking the company. All you need for that is backups. It's not the engineer's fault." I was largely ignored (which eventually proved to be a pattern) and so went on about my business.

And business was dumb. I had to fix an awful lot of technical things in my time there.

When I started, only half of the client code was in version control. And it wasn't even the most recent shipped version. Where was the most recent version? On a Mac Mini that floated around the office somewhere. People did their AS3 programming in Notepad or directly on the timeline. There were no automated builds, and builds were pushed from peoples' local machines - often contaminated by other stuff they were working on. Art content live on our CDN may have had source (PSD/FLA) distributed among a dozen artist machines, or else the source for it was completely lost.

That was just the technical side. The business/management side was and is actually *more hilarious*. I have enough stories from that place to fill a hundred posts, but you can probably get a pretty good idea by imagining a yogurt-salesman-cum-CEO, his disbarred eBay art fraudster partner, and other friends directing the efforts of senior software engineers, artists, and other game developers. It was a god damn sitcom every day. Not to mention all of the labor law violations. Post-acquisition is a whole 'nother anthology of tales of hilarious incompetence. I should write a book.

I recall having lunch with the author when he asked me "What should I do?". I told him that he should leave. In hindsight, it might have been the best advice I ever gave.

Morendil about 12 years ago

So the person who made a split-second mistake while doing his all for the business was pressured into resigning - basically, got fired.

What I want to know is what happened to whoever decided that backups were a dispensable luxury? In 2010?

There's a rule that appears in Jerry Weinberg's writings - the person responsible for an X million dollar mistake (and who should be fired over such a mistake) is whoever has controlling authority over X million dollars' worth of the company's activities.

A company-killing mistake should result in the firing of the CEO, not in that of the low-level employee who committed the mistake. That's what C-level responsibility means.

(I had the same thing happen to me in the late 1990s, got fired over it. Sued my employer, who opted to settle out of court for a good sum of money to me. They knew full well they had no leg to stand on.)

lkrubner about 12 years ago

Klicknation is hiring. Of themselves, they say:

"We make astonishingly fun, ferociously addictive games that run on social networks. ...KlickNation boasts a team of extremely smart, interesting people who have, between them, built several startups (successful and otherwise); written a novel; directed music videos; run game fan sites; illustrated for Marvel Comics and Dynamite Entertainment with franchises like Xmen, Punisher, and Red Sonja; worked on hit games like Tony Hawk and X-Men games; performed in rock bands; worked for independent and major record lables; attended universities like Harvard, Stanford, Dartmouth, UC Berkeley; received a PhD and other fancy degrees; and built a fully-functional MAME arcade machine."

And this is hilarious: their "careers" page gives me a 404:

http://www.klicknation.com/careers/

That link to "careers" is from this page:

http://www.klicknation.com/contact/

I am tempted to apply simply to be able to ask them about this. It would be interesting to hear if they have a different version of this story, if it is all true.

caseysoftware about 12 years ago

One of the things I like asking candidates is "Tell me about a time you screwed up so royally that you were sure you were getting fired."

Let's be honest, we all have one or two.. and if you don't, then your one or two are coming. It's what you learned to do differently that I care about.

And if you don't have one, you're either a) incredibly lucky, b) too new to the industry, or c) lying.

kibwen about 12 years ago

> "I found myself on the phone to Rackspace, leaning on a desk for support, listening to their engineer patiently explain that backups for this MySQL instance had been cancelled over 2 months ago."

Here's something I don't get: didn't Rackspace have *their own* daily backups of the production server, e.g. in case their primary facility was annihilated by a meteor (or some more mundane reason, like hard drive corruption)?

Regardless, here's a thought experiment: suppose that Rackspace *did* keep daily backups of *every* MySQL instance in their care, even if you're not paying for the backup service. Now suppose they get a frantic call from a client who's not paying for backups, asking if they have any. How much of a ridiculous markup would Rackspace need to charge to give the client access to this unpaid-for backup, in order to make the back-up-every-database policy profitable? I'm guessing this depends on 1) the frequency of frantic phone calls, 2) the average size of a database that they aren't being paid to back up, and 3) the importance and irreplaceability of the data that they're handling (and 4) the irresponsibility of their major clients).

laumars about 12 years ago

I really feel sorry for this guy. Accidents happen, which is why development happens in a sandboxed copy of the live system and why backups are essential. It simply shouldn't be possible (or at least, that easy) for human error to put an entire company in jeopardy.

Take my own company: I've accidentally deleted /dev on development servers (not that major of an issue thanks to udev, but the timing of the mistake was lousy), a co-worker recently dropped a critical table on a dev database, and we've had other engineers break Solaris by carelessly punching in chmod -R / as root (we've since revised engineers' permissions so this is no longer possible). As much as those errors are stupid, and as much as engineers of our calibre should know better, it only takes a minor lack of concentration at the wrong moment to make a major fsck-up. Which is doubly scary when you consider how many interruptions the average engineer gets a day.

So I think the real guilt belongs to the entire technical staff, as this is a cascade of minor fsck-ups that led to something catastrophic.

cantlin about 12 years ago

Last year I worked at a start-up that had manually created accounts for a few celebrities when they launched, in a gutsy and legally grey bid to improve their proposition†. While refactoring the code that handled email opt-out lists I missed a && at the end of a long conditional and failed to notice a second, otherwise unused opt-out system that dealt specifically with these users. It was there to ensure they really, really never got emailed. The result?

http://krugman.blogs.nytimes.com/2011/08/11/academia-nuts/

What a screw up!

These mistakes are almost without fail a healthy mix of individual incompetence and organisational failure. Many things - mostly my paying better attention to functionality I rewrite, but also the company not having multiple undocumented systems for one task, or code review, or automated testing - might have saved the day.

[†] They've long been removed.

bambax about 12 years ago

Once, a long time ago, I spent the best part of a night writing a report for college, on an Amstrad PPC640 (http://en.wikipedia.org/wiki/PPC_512).

Once I was finished, I saved the document -- "Save" took around two minutes (which is why I rarely saved).

I had an external monitor that was sitting next to the PC; while the saving operation was under way, I decided I should move the monitor.

The power switch was on top of the machine (unusual design). While moving the monitor I inadvertently touched this switch and turned the PC off... while it was writing the file.

The file was gone; there was no backup, no previous version, nothing.

I had moved the monitor in order to go to bed, but I didn't go to bed that night. I moved the monitor back to where it was, and spent the rest of the night recreating the report, doing frequent backups on floppy disks, with incremental version names.

This was in 1989. I've never lost a file since.

JohnBooty about 12 years ago

This happened to me once on a much smaller scale. Forgot the "where" clause on a DELETE statement. My screwup, obviously.

We actually had a continuous internal backup plan, but when I requested a restore, the IT guy told me they were backing up everything *but* the databases, since "they were always in use."

(Let that sink in for a second. The IT team actually thought that was an acceptable state of affairs: "Uh, yeah! We're backing up! Yeah! Well, some things. Most things. The files that don't get like, used and stuff.")

That day was one of the lowest feelings I ever had, and that screwup "only" cost us a few thousand dollars as opposed to the millions of dollars the blog post author's mistake cost the company. I literally can't imagine how he felt.

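The usual seatbelt for ad-hoc deletes like this is an explicit transaction, assuming a transactional engine such as InnoDB (MyISAM cannot roll back). Table and column names below are invented:

    START TRANSACTION;

    -- Sanity-check the scope first; the DELETE should hit the same count.
    SELECT COUNT(*) FROM orders WHERE status = 'expired';

    DELETE FROM orders WHERE status = 'expired';
    -- The client echoes "N rows affected". If N looks wrong:
    --   ROLLBACK;
    -- Otherwise:
    COMMIT;
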
islon about 12 years ago

I know how you felt. Many years ago, when I was a junior working in a casual game company, I was to add a bunch of credits to a poker player (fake money). I forgot the WHERE clause in the SQL and added credits to every player in our database. Lucky for me it was an add and not a set, so I could revert it. Another time I was going to shut down my PC (a Debian box) using "shutdown -h now" and totally forgot that I was in an ssh session to our main game server. I had to call the tech support overseas and tell him to physically turn on the server...

newishuser about 12 years ago

You did them more good than harm.

1) Not having backups is an excuse-less monumental fuckup.

2) Giving anyone delete access to your production db, especially a junior dev through a GUI tool, is an excuse-less monumental fuckup.

Hopefully they rectified these two problems and are now a stronger company for it.

mootothemax about 12 years ago

If you ever notice that your employer or client isn't backing up important data, take a tip from me: do a backup today, in your free time, and if possible, again in your free time, create the most basic regular backup system you can.

When the time comes, and someone screws up, you will seem like a god when you deliver your backup, whether it's a 3-month-old one-off or from your crappy daily backup system.

munificent about 12 years ago

> The CEO leaned across the table, got in my face, and said, "this, is a monumental fuck up. You're gonna cost us millions in revenue".

Yes, it is a monumental fuck-up. You put a button in front of a junior developer that can cost the company millions if he accidentally clicks it *and doesn't even have undo*.

leothekim about 12 years ago

Mistakes happen, and there should have been better safeguards -- backups, locking down production, management oversight.

But, I actually applaud how he tried to take responsibility for his actions and apologized. Both "junior" AND "senior" people have a hard time doing this. I've seen experienced people shrug and unapologetically go home at 6pm after doing something equivalent to this.

The unfortunate thing here seems to be that he took his own actions so personally. He made an honest mistake, and certainly there were devastating consequences, but it's important to separate the behavior from the person. I hope he realizes this in time and forgives himself.

pmelendez about 12 years ago

There are several reasons why you should *not* feel guilty. The company was asking for trouble and you just happened to be the trigger. These are the top three things that could have prevented that incident:

1) A cron job for the manual task you were doing.

2) Not working directly on production.

3) Having daily backups.

And this could have happened to anybody. After midnight any of us are at junior level and very prone to this kind of mistake.

alan_cx about 12 years ago

I cannot believe that people still don't have reliable backups in place.

My feeling is this: if you are in any way responsible for data that is not backed up, you should be fired or resign right now. You should never work in IT, in any way, ever again. If you are the CEO of a company in a similar state, again, fire yourself right now. Vow to never ever run a business again. This is 2013. And guess what? You still can't buy your unique data back from PCWorld. Your data is "the precious".

As for the treatment of this guy, IMHO, his employers were the worst kind of spineless cowards. This was 100% the fault of the management, and you know what? They know it. To not have backups is negligent, and should result in high-up firings. Yet these limp cowards sought to blame this kid. Pure corporate filth of the lowest order. Even the fact he was junior is irrelevant; anyone could have done that, more likely a cocky senior taking some short cut. Let me tell you now, I have made a similar cock-up, and I think I know it all. But I had backups, and lucky for me, it was out of business hours. Quick restore, and the users never knew. I did fess up to my team since I thought it had direct value as a cautionary tale.

Frankly, I am utterly amazed and gutted that such a thing can still happen. The corporate cowardice is sadly expected, but to not have backups is literally unforgivable negligence.

Yeah, I'm quite fundamentalist about data and backups. I'd almost refer to myself as a backup jihadist.

Tichy about 12 years ago

Just wondering: when consulting I usually take care that there are appropriate clauses in the contract to make me not liable. But what is the rule for employees, are they automatically insured?

In Germany there is the concept of "Fahrlässig" (negligence) and "severe negligence". Per law you are already liable if you are just negligent, but it is possible to lower it to severe negligence in the contract. That is my understanding anyway (not a lawyer). Usually I also try to kind of weasel out of it by saying the client is responsible for appropriate testing and stuff like that... Overall it is a huge problem, though, especially if the client has a law department. Getting insurance is quite expensive because it's easy to create millions of dollars in damages in IT.

Before a court, "standard best practices" can become an issue, too. This worries me because I don't agree with all the latest fads in software development. It seems possible that in the future x% test coverage could be required by law, for example. Or even today a client could argue that I didn't adhere to standard best practices if I don't have at least 80% test coverage (or whatever, not sure what a reasonable number would be).

malux85 about 12 years ago
Whoever cancelled the backups was equally responsible
kyllo about 12 years ago
Sounds like no one else at that company had any more of a clue what they were doing than you did. The whole scenario is horrifying.
Kesty about 12 years ago

I did a very similar thing after one year working at my company: instead of clearing the whole user table, I replaced every user's information with my account information.

I forgot to copy the WHERE part of the query .....

The only difference is that it was policy to manually do a backup before doing anything on production, and the data was restored in less than 10 minutes. Even if I had forgotten to make a backup manually, we had a daily complete backup and an incremental one every couple of hours.

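That pre-change "seatbelt" copy can be as small as two statements (a sketch; the table name is hypothetical):

    -- Keep the schema and indexes, then copy the rows.
    CREATE TABLE users_backup_20130226 LIKE users;
    INSERT INTO users_backup_20130226 SELECT * FROM users;

    -- If the change goes wrong, the data is one statement away:
    --   TRUNCATE users;
    --   INSERT INTO users SELECT * FROM users_backup_20130226;

A same-server copy like this only guards against bad queries, of course; it is no substitute for real dumps shipped off the machine.
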
Spooky23 about 12 years ago

If I were the author, I would rewrite this and reflect on what was actually wrong here. At the end of the day, you resigned out of shame for a serious incident that you triggered.

But the fact that the organization allowed you to get to that point is the issue. Forget about the engineering issues and general organizational incompetence... the human side is the most incredibly, amazingly ridiculous part.

I respect your restraint. If I was singled out with physical intimidation by some asshat boss while getting browbeaten by some other asshat via Skype, I probably would have taken a swing at the guy.

Competent leadership would start with a 5-whys exercise. Find out why it happened, why even the simplest controls were not implemented. I've worked in places running on a shoestring, but the people in charge were well aware of what that meant.

rhizome about 12 years ago

> The CEO leaned across the table, got in my face, and said, "this, is a monumental fuck up. You're gonna cost us millions in revenue". His co-founder (remotely present via Skype) chimed in "you're lucky to still be here".

This is when you should have left. That's no way to manage a crisis.

eric_bullington about 12 years ago

Wow, I'm sorry you had to experience that. I'm sure it was traumatic -- or perhaps you took it better than I would have. It must be of some comfort to look back now and realize that you only bore a small part of the blame, and that ultimately a large portion of the responsibility lies on the shoulders of whoever set up the dev environment like that, as well as whoever cancelled the backups.

hcarvalhoalves about 12 years ago
You should fire the company for not having a staging environment nor up-to-date backups.
lallouz about 12 years ago

I would love to see some reflection on this story from OP. What do you think you learned from this experience? Do you think your response was appropriate? What would you have done differently? Are you forever afraid of the prod env?

Many, many, many of us have been in this situation before, whether as 'monumental' or not. So it is interesting to hear how others handle it.

aidos about 12 years ago

Ah ha ha ah yeah.... I've done that.

Something similar anyway (was deleting rows from production and hadn't selected the WHERE clause of my query before I ran it).

It was on my VERY FIRST DAY of a new job.

Fortunately they were able to restore a several-hours-old copy from a sync to dev, but there wasn't a real plan in place for dealing with such a situation. There could have just as easily not been a recent backup.

This was in a company with 1,000 employees (dev team of 50) and millions in turnover. I've worked at other places that are in such a precarious position too.

At least my boss at the time took responsibility for it - new dev (junior), first day, production db = bad idea.

Yhippa about 12 years ago

"The implications of what I'd just done didn't immediately hit me. I first had a truly out-of-body experience, seeming to hover above the darkened room of hackers, each hunched over glowing terminals."

Holy crap. I know that _exact_ same feeling. I had to laugh. I know that out-of-body feeling all too well.

superflit about 12 years ago

I would fire the CTO for canceling the backups.

NEVER... NEVER go to production without backups.

Backups are not only to 'recover' but to have 'historical' data for audits, checking for intrusions, etc.

And the 'other' guy on Skype?

'You are lucky to be here..'

Seriously? You are lucky to still talk using Skype, because I am sure Skype has some kind of backup of their user table..

ownagefool about 12 years ago

I worked at a small web hosting company that did probably £2m in revenue a year, in my first programming job. They had me spending part of my time as support and the other part on projects.

After about 3 or so months they took me out of support and literally placed my desk next to the only full-time programmer that company had.

They made all changes directly on live servers, and I'd already raised this as a concern; now that this became my full-time job, it was agreed that I'd be allowed to create a dev environment.

Long story short, I exported the structure of our MySQL database and imported it into dev. Some variable was wrong so it didn't all import, so I changed the variable, dropped the schema, and went back to redo it.

Yeah, that was the live database I just dropped. After a horrible feeling that I can't really explain, I fessed up. I dropped it during lunch so it took about two hours to get a restore.

The owner went mad but most other people were sympathetic, telling me their big mistakes and telling me that's what backups were for.

The owner was going crazy about losing money or something and the COO pulled me into a room. I thought I was getting fired but he just asked me what happened and said "yeah we all make mistakes, that's fair enough, just try not to do it again".

I was then told to get on with it, and it must have taken me a day to finish what would have taken me an hour, but I did it, and now we had a process and a simple dev environment. I lasted another two years there. I left over money.

vinceguidry about 12 years ago

I used to be a freelance web developer/tech guy with one client, a designer. What made me quit was an incident where his client's WordPress site hadn't been moved properly to the new hosting (not by me).

The DB needed to be searched and replaced to remove all the old URLs. After doing so, the wp_options cell on the production site holding much of the customizations kicked back to the defaults for the theme; the serialized data format being used was sensitive to brute DB-level changes.

I had talked to my client before about putting together a decent process including dev databases, scheduled backups, everything needed to prevent just such a screwup, but he waffled. Then blamed me when things went wrong.

I'd had enough and told him to do his own tech work, leaving him to fix his client's website himself. Being that I didn't build it, I didn't know which settings to flip back. I left freelance work and never looked back.

People and companies do this all the time: refuse to spend the time and money ensuring their systems won't break when you need them the most, then scapegoat the poor little technician when they do.

I'd like to say the answer is "don't work in such environments," but there's really no saying that it won't be this way at the next job you work, either.

I certainly wouldn't internalize any guilt being handed down; ultimately it's the founders' job to make sure that the proper systems are in place. After all, they have much more on the line than you do. Count it a blessing that you can just walk away and find another job.

KenL about 12 years ago

I agree with the comments here that spread the blame past this author.

I manage a large number of people at a news .com site and know that screw-ups are always a combination of two factors: people & systems.

People are human and will make mistakes. We as upper management have to understand that and create systems, of varying tolerance, that deal with those mistakes.

If you're running a system allowing a low-level kid to erase your data, that was your fault.

I'd never fire someone for making a stupid mistake unless it was a pattern.

johngalt about 12 years ago

"How I was set up to fail."

Who asks a junior engineer to develop directly on live systems with write access and no backups? Are you kidding me?

Edit: No one ever builds a business thinking about this stuff, until something like this happens. There are people who have learned about operations practices the hard way, and those who are about to. They hung the author out to dry for a collective failure, and it shows that this shop is going to be taught another expensive lesson.

jtchang about 12 years ago

I'm with everyone else in this thread: you screwed up but in reality it is EXPECTED.

Do you know why I have backups? Because I'm not perfect and I know one day I will screw up and somehow drop the production database. Or mess up a migration. Or someone else will. This is stuff that happens ALL THE TIME.

Your CEO/CTO should have been fired instead. It is up to the leadership to ensure that proper safeguards are in place to avoid these difficult conversations.

greghinch about 12 years ago

Whoever a) gave production db access to a "junior" engineer and b) disabled backups of said database is at fault. I hope the author takes this more as a learning experience of how to (not) run a tech department than any personal fault.

Someone who has to use a GUI to manage a db at a company of that scale shouldn't have access to prod.

chris_mahan about 12 years ago

Let me make it really simple: anything that happens in a company is always, always management's fault. The investors hire the management team to turn a pile of money into a bigger pile of money, and if management fails, it is management's fault, because it can do whatever it needs to do (within the law) to make that happen. That they failed to hire, train, motivate, fire, promote, follow the law, develop the right products, market them well, ensure reliability, ensure business sustainability, ensure reinvestment in research and development, and ultimately satisfy investors, is their fault, and they further demonstrate their failure by not taking responsibility for their own failure and blaming others.

noonespecial about 12 years ago

This was a "sword of Damocles" situation. No backups, no recovery plan, and no clue how important any of these things were.

A thousand things can make an SQL table unreadable. "What do we do *when* this happens" is what managers are for, not finding someone to blame for it.

ferrouswheel about 12 years ago

Ah, I remember being called away from my new year holiday when an engineer dropped our entire database.

This happened because they didn't realise they were connected to the production database (rather than their local dev instance). We were a business intelligence company, so that data was vital. Luckily we had an analysis cluster we could restore from, but afterwards I ensured that backups were happening... never again.

(Why were the backups not already set up? Because they were not trivial due to the size of the cluster, and having only been CTO for a few months there was a long list of things that were urgently needed.)

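A cheap habit that catches the wrong-server mistake before it matters: make the session tell you where it actually points before running anything destructive. In MySQL, for instance:

    -- Which server, schema, and account is this session really using?
    SELECT @@hostname, DATABASE(), CURRENT_USER();
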
KVFinn about 12 years ago

> I was 22 and working at a Social Gaming startup in California.

> Part of my naive testing process involved manually clearing the RAIDS table, to then recreate it programatically.

> Listening to their engineer patiently explain that backups for this MySQL instance had been cancelled over 2 months ago.

"The CEO leaned across the table, got in my face, and said, "this, is a monumental fuck up. You're gonna cost us millions in revenue".

What. The. Fuck.

The LAST person I would blame is the brand new programmer. They don't back up their production database? If it wasn't this particular incident it would have been someone else, or a hardware failure.

desireco42 about 12 years ago

I was working two years ago in a very successful, billion-dollar startup. All developers had production access, but then, if you didn't know what you were doing, you would not be working there. Also, we didn't routinely access production, and when we did, mostly for support issues on which we rotated, we did it through a 'rails console' environment that enforced business rules. In theory you could delete all data, but only in theory, and even then, we could restore it with minimal downtime.

I think it is obvious that the CEO/CTO are the ones to be held responsible here.

systematical about 12 years ago

Your CEO was correct. He should have also said the same thing to the guy who cancelled backups as well... and the guy who never put in place and periodically tested a disaster recovery plan. So much fail in this story, but mistakes happen and I've had my share as well.

I once (nah, twice) left a piece of credit card processing code in "dev mode" and it wasn't caught until a day later, costing the company over 60k initially. Though they were able to recover some of the money, getting the loss down to 20k. Sheesh.

S_A_P about 12 years ago

Sounds to me like this operation was second rate and not run professionally. If this sort of incident is even able to happen, you're doing it wrong. Maybe it's just my experience with highly bureaucratic oil and gas companies, but the customer database had no backup for 2 months?!?!?!?!?!?!

That is asinine. What would they have done if they couldn't pin it on a junior engineer? A disk failure would have blown them out of the water. I think he did them a favor, and hopefully they learned from that.

mac1175 about 12 years ago

Wow. This reminds me of a time in which I used to work for a consulting agency. It was back in 2003 and I was working on some database development for one of the company's biggest clients. One day, I noticed the msdb database had a strange icon telling me it was corrupted. I went onto MSDN and followed some instructions to fix it and, BAM, the database I had been working on for months was gone (I was running SQL Server 2000 locally, where this all happened, and I was very junior as a SQL developer). I was silently freaking out knowing this could cost me my job. I got up from my desk and took a walk. On that walk, I contemplated my resignation. When I got back from my walk, a thought occurred to me that maybe the database file was still there (I had zero clue at the time that msdb's main purpose was just cataloguing the existing databases, among other things). I did a file search in the MSSQL folders and found a file named with my database's name. So, that day I learned to attach a database, what msdb's role is, and to make sure to take precautions before making a fix! However, OP's post shows that this company had no processes in place to control levels of access or disaster recovery. It shows the company's faults more than OP's.

samstave about 12 years ago

This was clearly a lack of oversight and sound engineering practices.

Who cancelled the backups? Why were they cancelled? Was it for non-payment of that service?

---

I worked for an absolutely terrible company as Director of IT. The CEO and CTO were clueless douchebags when it came to running a sound production operation.

The CTO would make patches to the production system on a REGULAR basis and break everything, with the statement "that's funny... that shouldn't have happened".

I had been pushing for dev|test|prod instances for a long time - and at first they appeared to be on-board.

When I put the budget and plan together, they scoffed at the cost, and reiterated how we needed to maintain better up-time metrics. Yet they refused to remove Dave's access to the production systems.

After a few more outages, and my very loud complaining to them that they were farking up the system by their inability to control access - I saw that they had been hunting for my replacement.

They were trying to blame me for the outages and ignoring their own operational faults.

I found another job and left - they offered me $5,000 to not disparage them after I left. I refused the money and told them to fark off. I was not going to lose professional credibility to their idiocy.

Worst company I have ever worked for.

fotoblur about 12 years ago

I think that everyone does this at some point in their career. Don't let this single event define you. The most important thing to ask yourself is what was the lesson learned... not only from your standpoint but also from the business'.

In addition, to heal your pain it's best to hear that you're not the only one who has ever done this. Trust me, all engineers I know have a story like this. (Please share yours, HN - here, I even started a thread for it: http://news.ycombinator.com/item?id=5295262)

Here is mine: when I worked for a financial institution my manager gave me a production-level username and password to help me get through the mounds of red tape which usually prevented any real work from getting done. We were idealists at the time. Well, I ended up typing that password wrong more than 3 times... shit, I locked the account. Apparently half of production's apps were using this same account to access various parts of the network. Essentially, I brought down half our infrastructure in one afternoon.

Lesson learned: don't use the same account for half your production apps. Not really my fault :).

niggler about 12 years ago

If you want to see a monumental screw-up, look at Knight Capital Group (they accumulated a multi-billion-dollar position in the span of minutes, lost upwards of $440M, and ended up having to accept a bailout and sell itself to GETCO):

http://dealbook.nytimes.com/2012/08/03/trading-program-ran-amok-with-no-off-switch/

blisterpeanuts about 12 years ago

Good lord, that's unbelievable! If millions of dollars are riding on a database, they should have spent a few thousand to replicate the database, run daily backups, and maintain large enough rollback buffers to reverse an accidental DROP or DELETE.

We've all screwed up at various times (sometimes well beyond the junior phase), but not to have backups.... That's senior management's fault.

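One concrete form of that "rollback buffer" in MySQL is the binary log: with binlogging enabled and a recent dump, you can replay up to just before the bad statement (point-in-time recovery). Whether that safety net even exists is a one-statement check:

    -- Point-in-time recovery is only possible if the binary log is on.
    SHOW VARIABLES LIKE 'log_bin';   -- should report ON
    SHOW BINARY LOGS;                -- the files you would replay from the last dump
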
gearoidoc about 12 years ago
This post just made me feel 10x smarter (not that I blame the author - the blame here lies at the feet of the "Senior" Devs).
cmbaus about 12 years ago
Any manager who doesn't take responsibility for this isn't a manager you'd want to work for. The manager should be fired.
doktrin about 12 years ago

I found myself doing very much this on my very first day on the job working for a software startup.

We had a Grails app that acted as a front end for a number of common DB interactions, which were selected via a drop-down. One of these (in fact, the default) was an action titled "init DB". Of course, this would drop any existing database and initialize a new one.

When running through the operational workflow with our COO on the largest production database we had, I found myself sleepily clicking through the menu options without changing the default value. I vividly remember the out-of-body experience the OP describes, and in fact offered to fire myself on the spot shortly thereafter.

It's fun to laugh about in hindsight, but utterly terrifying in the moment - to say nothing of the highly destructive impact it had on my self-confidence.

jgeerts about 12 years ago

This article sounds so incredible to me, I think I might have been holding my breath reading it. These are two major mistakes that the company is responsible for, not the author. Why would they let anyone in on the production password and do direct queries onto that database instead of working in a different environment? It's laughable that they sent this to their customers, admitting their amateurism. Secondly, no backups? At my previous project, a similar thing happened to our scrum master: he accidentally dropped the whole production database in much the same situation. The database was back up in less than 10 minutes with an earlier version. It's still a mistake that should not be possible to make, but when it happens you should have a backup.

tetsuseus about 12 years ago

I once fired everyone at a nonprofit foster care company with a careless query.

I cried to the sysops guy, and he gave me a full backup from 12 hours before, and before any cronjobs ran I had the database back in order.

Backups are free. It was their fault for not securing a critical asset to their business model.

alyrik about 12 years ago

Oh dear... I once logged into the PostgreSQL database of a very busy hosted service in order to manually reset a user's password. So I started to write the query:

UPDATE principals SET password='

Then I went and did all the stuff required to work out the correctly hashed and salted password format, then finally triumphantly pasted it in, followed by '; and newline.

FORGOT THE WHERE CLAUSE.

(Luckily, we had nightly backups as pg_dump files, so I could just find the section full of "INSERT INTO principals..." lines and paste in a rename of the old table, the CREATE TABLE from the dump, and all the INSERT INTOs, and it was back in under a minute - short enough that anybody who got a login failure tried again and then it worked, as we didn't get any phone calls). It was a most upsetting experience for me, however...

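In Postgres the usual seatbelt for this is a transaction plus RETURNING, so you see exactly which rows were touched before anything becomes permanent (a sketch; the column names and values are hypothetical):

    BEGIN;

    UPDATE principals
       SET password = '<hashed-and-salted-value>'
     WHERE username = 'alice'
    RETURNING username;          -- should list exactly one row

    -- One row listed? COMMIT. The whole table listed? ROLLBACK.
    COMMIT;
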
fayyazkl about 12 years ago

While I fully agree with the position that the author is not entirely responsible, I find it hard to believe it happened the way it appears to.

It could be, but there are a bunch of loopholes. I can believe that he was lousy enough to click "delete" on the users table. I can believe that when the dialog box asked "are you sure you want to drop this table" he clicked yes. I can believe that after deleting he "committed" the transaction. But what I can't believe is that the database let him delete a table which was the base for every other table, implemented by foreign key constraints. It could be argued that for efficiency they hadn't put constraints on the table, but it's hard to digest.

Probably the story is somewhat tailored to fit a post.

clavalle about 12 years ago

Wow. So many mistakes.

Working in the production database? Bad.

No backups of mission-critical data? Super bad.

Using a relational database as a flat data store? Super bad.

Honestly... I think this company deserved what they got. Good thing the author got out of there. Hopefully in their new position they will learn better practices.

pja about 12 years ago
I doubt the problems this company had started when they employed the author of this blog post!
anton-107 about 12 years ago

It's unlikely that the database had no foreign keys related to the users table. And if so, the DBMS should have prevented deleting all users from the table.

Perhaps the database designer also failed at his job, as well as the guys who cancelled backups and set up the dev environment.

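What the commenter is describing, in miniature (a hypothetical schema; note this only holds for an engine that enforces constraints, such as InnoDB, and not for MyISAM):

    CREATE TABLE users (
      id INT PRIMARY KEY
    ) ENGINE=InnoDB;

    CREATE TABLE raids (
      id      INT PRIMARY KEY,
      user_id INT NOT NULL,
      FOREIGN KEY (user_id) REFERENCES users (id) ON DELETE RESTRICT
    ) ENGINE=InnoDB;

    -- With rows present in raids, this now fails with a foreign key error
    -- instead of silently emptying the table:
    DELETE FROM users;
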
elomarns about 12 years ago

Although the author of the post obviously made a huge mistake, he is far from being the one actually responsible for the problem that followed it. It's the job of the CTO to make sure no one can harm the company's main product this way, accidentally or not.

He should never have been writing code against the production database when developing new features. And if he was doing it, it wasn't his fault, considering he was a junior developer.

And who the hell is so stupid as to not have any recent backup of the database used by a piece of software that provides millions in revenue?

In the end, when you do such a shitty job protecting your main product, shit will eventually happen. The author of the post was merely an agent of destiny.

tn13 about 12 years ago

I don't think this is the author's fault. These kinds of human mistakes are more than common. It is sad that the top management actually assigned the blame to this young man. This was an engineering failure.

I can understand what this person must have gone through.

jdmaresco about 12 years ago
I had a similar situation when collaborating with a team on a video project during a high school internship. Somehow I managed to delete the entire timeline accounting for hours of editing work that my boss had put in. To this day I don't know how it happened, I just looked down and all the clips were gone from the timeline. In the end, I think we found some semblance of a backup, and at least we didn't lose the raw data/video content, but I can relate to the out-of-body experience that hits you when you realize you just royally screwed up your team's progress and there's nothing you can do about it.
zulfishah about 12 years ago

Every engineer's worst nightmare. I've worked at one of the biggest software companies in the world, and I'm working on my own self-funded one-person startup: the panic before doing anything remotely involving production user data is still always nerve-wracking to me. But I agree with everyone's assessments here of the failure of the whole company to prevent this. A hardware failure could just as likely have wiped out all their data. If you're going to cut corners with backing up user data, then you should be prepared to suffer the consequences.

Thanks for sharing this. Took real guts to put it out there.

navid_dichols about 12 years ago

If your senior management/devs are worth anything, they were already aware that this was a possibility. There is no excuse for what ostensibly appears to be a total lack of a fully functioning development & staging environment--not to mention any semblance of a disaster recovery plan.

My feeling is that whatever post-incident anger you got from them was a manifestation of the stress that comes from actively taking money from customers with full knowledge that Armageddon was a few keystrokes away. You were just Shaggy pulling off their monster mask at the end of that week's episode of Scooby Doo.

logn about 12 years ago

Your response should have been: "With all due respect, sirs, I agree that I am lucky to still be here, that the company is still here being that it's so poorly managed, that they cancelled their only backups with Rackspace, that they had no contingency plans, and that you were one click from losing millions of dollars--in your estimate. It makes me wonder what other bills aren't being paid and what other procedures are woefully lacking. I will agree to help you through this mess, and then we should all analyze every point of failure from all departments, and go from there."

monkeyonahill about 12 years ago

That wasn't your failure per se, but the failure of pretty much everyone above you. That they treated you like that after the fact is pretty shitty. In hindsight I'd say that you are much better off not being there, where you would learn bad practices.

No Staging Environment. Proactively Cancelled Backups on a Business Critical System. Arbitrarily implementing features 'because they have it' rather than it having some purpose in the business model. No Test Drills of disaster scenarios. The list goes on. As I say, and as you probably realise now, you are lucky to no longer be there.

flayman about 12 years ago
This is not your fault. Not really. And it's galling that the company blamed the incident on the workings of a 'junior engineer'. There was NO DATABASE BACKUP! For Christ's sake. This is live commercial production data. No disaster recovery plan at all. Zilch. And to make matters worse, you were expected to work with a production database when doing development work. This company has not done nearly enough to mitigate serious risks. I don't blame you for quitting. I would. I hope you have found or manage to find a good replacement role.
banachtarski about 12 years ago

What company that makes millions in revenue doesn't replicate their database or at least have snapshots?

What engineer uses any GUI to administrate MySQL?

This story feels totally unreal to me (unreal as in just crazy, not disbelief).

developingJim about 12 years ago
Know a lot of others have said it, but no production backups? Blame a junior dev for a mistake that almost 100% of the people I've worked with have made at some point or another (including me)? I feel horrible for the author, it's sickening the way he was treated. Now they'll just move on, hire another junior, never mention this, and guess what? The next guy will do the same thing and there probably still aren't any backups. Didn't learn anything, well, other than how easy it is to blame one person for everyone's failure.
TheTechBox about 12 years ago

A lot of people have said it before on here, but really?! The company is blaming one person; whilst yes, it was technically his fault, why in the first place was he allowed on the production database, and why wasn't the company keeping very regular backups of all this mission-critical data?

If the company saw that the data contained in this live database was so critical, you would have thought they would not have given the keys to everyone, and that if they did, they would at least make sure that they could recover from this, and fast.

stretchwithme about 12 years ago

While working for a large computer company in the late 90s, I joined a team that ran the company store on the web. The store used the company's own e-commerce system, which it was also selling.

The very first day, at home in the evening, I went to the production site to see if I could log in as root using the default password. Not a problem.

Anyone with any experience with the product could easily have deleted the entire database. I immediately changed the password and emailed the whole team.

No one ever responded.

lucb1e about 12 years ago

Let me get this straight.

- Tens of thousands of paying customers

- No backups

- Working in a production database

- Having the permissions to empty that table

- Even having read access to that table with customer info...

You are hardly responsible. Yeah, you fucked up badly, but everyone makes mistakes. This was a big-impact one and it sucks, but the effect it had was in no way your fault. The worst-case scenario should have been two hours of downtime and 1-day-old data being put back in the table, and even that could have been prevented easily with decent management.

ThePhysicist about 12 years ago

The only people that should have gotten fired for this are:

* The person responsible for the database backup (no backup plan for your production DB!? wtf)

* The person having designed the SQL admin tool (not putting an irreversible DELETE operation behind a confirmation dialogue!? wtf)

* The person giving full write access to the company's production database to a junior developer (data security!? wtf)

Sure, the employee made a mistake, but most of the failure here is due to the bad management and bad organizational design.

danielna about 12 years ago
I still remember the all-consuming dread I felt as an intern when I ran an UPDATE and forgot the WHERE clause. I consider it part of the rite of passage in the web developer world. Kind of like using an image or text in a dev environment that you never expect a client to see.<p>Luckily the company I was at (like any rational company) backed up their db and worked in different environments, so it was more of a thing my coworkers teased me for than an apocalyptic event.
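
For what it's worth, MySQL ships a guard aimed at exactly this rite of passage; enabling it per session (or as safe-updates in the client config) makes the server refuse key-less UPDATEs and DELETEs:

    SET SESSION sql_safe_updates = 1;

    -- Now a WHERE-less statement is rejected instead of rewriting every row:
    UPDATE users SET email = 'test@example.com';
    -- ERROR 1175 (HY000): You are using safe update mode ...
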
hkmurakami about 12 years ago

I'm a little worried for OP because he obviously took the time to keep the characters in this article anonymous, but we now know who this CEO with ridiculous behavior must have been, since we know the name of OP's former company from his profile. Not sure what said former CEO of the now-acquired company can do, but this is the kind of thing I fear happening to me when/if I write something negative about a past employer, being a blogger myself.

jami about 12 years ago

Awesomely honest and painful story.

This happened somewhat in reverse to someone I worked with. He was restoring from a backup. He didn't notice the "drop tables" aspect, assuming, as one might, that a backup would simply supplement new stuff rather than wipe it clean and go back in time to a few weeks ago.

He is (still) well-liked, and we all felt sick for him for a few days. Our boss had enough of a soul to admit that we should have had more frequent backups.

VexXtremeabout 12 years ago
In the author's defense, it wasn't all his fault. Whoever thought it was a good idea to:

1. Work directly on the production database
2. Not have daily backups
3. Not have data migrations in place for these kinds of situations

needs to be fired immediately. My guess is it was one of the 'senior' engineers and that the author only worked with what they gave him.

I've worked with all kinds of bozos but I've never seen this kind of incompetence. Ridiculous.
imsofutureabout 12 years ago
Wow, that's terrible. Mistakes happen, and for the notion of 'blame' to surface requires some monumentally incompetent management... the exact kind that would have their junior programmers developing against an un-backed-up production database.

The immediate takeaway from a disaster should <i>always</i> be 'How can we make sure this doesn't happen again?' not 'Man, I can't believe Fred did that, what an idiot.'
anovikovabout 12 years ago
LOL, a gaming startup I worked for in 2010 had the same fuckup! But nobody was fired or quit; there was just total anger around the place for a few days, and almost all the data was eventually recovered. The startup still flopped about a year later anyway, with ever-falling user retention rates, as the marketplace got more and more flooded with more and more similar games.
toomuchcoffeeabout 12 years ago
<i>The CEO leaned across the table, got in my face, and said, "this is a MONUMENTAL fuck up."</i>

It certainly was -- on multiple levels, but ultimately at the C-level. Blaming a single person (let alone a junior engineer) for it just perpetuates the inbred culture of clusterfuckitude and cover-ass-ery which no doubt was the true root cause of the fuck-up in the first place.
ry0ohkiabout 12 years ago
I think all developers have to do something like this at some point to get the compulsion I have, which is backups to the extreme. I can never have enough backups. Before any DROP/ALTER type change, I make a backup. (I've also learned to pretty much never work on a production db directly, and in the event I need to, to do a complete test of the script in staging first...)
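One cheap way to do that pre-DROP/ALTER snapshot in plain SQL, sketched here with made-up table names. Note that CREATE TABLE ... SELECT copies rows but not indexes, foreign keys, or triggers, so it is a stopgap rather than a substitute for real backups:

    -- Snapshot the rows before the risky change.
    CREATE TABLE users_backup_before_alter AS SELECT * FROM users;

    -- ... run the DROP/ALTER here ...

    -- If it goes wrong, the rows can be copied back:
    -- INSERT INTO users SELECT * FROM users_backup_before_alter;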
gte910habout 12 years ago
He worked at a company stupid enough to test on the prod databases without tools to safely clear them. The former is stupid; the latter is REALLY stupid.

This is a multi-layer failure and almost none of the blame falls on him. Stupid compounded stupid, and this guy did nothing more than trip over the server cord that several people who knew better had stupidly run past his cube exit.
Shorelabout 12 years ago
Awesome tale.

However, I think the CTO was the one who deserved to be fired.

Not having, at the very least, separate development and production environments is the higher-ups' fault.

Where I work, developers can't even touch production systems; there's a separate team responsible for that.

I even have a separate install of almost everything (solr, nginx, php, mysql, etc.) on my workstation, so I only touch test servers when doing testing.
andyhmltnabout 12 years ago
Stuff like this happens. The best thing to prevent something like this is to completely sever the line between production and development. I've worked with companies that work directly on the production database. It's horrible. How can the person in charge of managing the workflow expect something like this not to happen eventually?
msdet11about 12 years ago
Things like this fall on the shoulders of the team as a whole. Certainly a tough pill to swallow for a junior engineer, but a more senior developer or PM should've also realized you were working on prod and tried to remedy that situation. Humans are notoriously prone to fat-fingering stuff. Minimize risk wherever you can!
friendly_chapabout 12 years ago
I think it is entirely clear from the writing that the author is a humble being. I feel sorry for him; from the writing it seems he is a much better person and engineer than most of the others at that company who are pointing fingers at him.

The guy may be absentminded, but that is a trait of some of the brightest people on earth.
snambiabout 12 years ago
This company sucks. You are just out of college, doing your first job. Were they really stupid enough to give you direct access to the production database? If they are making millions in revenue, where were their DBAs? Obviously the management got what they deserved. It's unfortunate that it happened through you.
bart42_0about 12 years ago
You can't make an omelette without breaking eggs.

Clicking on 'delete' with the user table selected was not very wise. The software maybe even asked 'Are you sure?' and of course you reply 'yes'.

But operating your company without proper recovery tools is a bit like climbing Mount Everest without a rope.

If something goes wrong you are in deep sh.t.
danielweberabout 12 years ago
I feel tremendous empathy for this guy.

Not because I've done this. But because there but for the grace of God go I. It wouldn't take much changing in the universe for me to be this guy.

I'm very glad he's posting it, and I hope everyone reads it, so you can learn from his very painful mistake instead of committing it yourself.
praptakabout 12 years ago
They should reward him. Seriously, anyone who exposed such a huge weakness deserves a reward. He limited the damage to only 10k users' data loss. With such abysmally crappy practices the damage would have happened anyway, only perhaps with 30k users and who knows what else, instead of a mere 10k.
bfrenchakabout 12 years ago
This was not a junior engineer's fault, but the DBA's fault. Any company should be backing up their database regularly, and then testing the restores regularly. Also, don't give people access to drop tables, etc. This was a very poor setup on the part of the company/DBA, not the engineer.
elhowellabout 12 years ago
Wow dude, that's quite a story. That must have been an awful feeling, I hope you're doing better now
taurussaiabout 12 years ago
Bold on your part to own up and offer a resignation. (The "higher ups" should have recognized that and not accepted it.) From the movie "Social Network": http://www.youtube.com/watch?v=6bahX2rrT1I
jiggy2011about 12 years ago
Wow, that's quite a story. If your company is ever one button press away from destruction, know that this will eventually happen.

I'm quite surprised stuff like this hadn't happened earlier. When I am doing development with a database I will quite often trash the data while writing code.
ArenaSourceabout 12 years ago
I can't believe this story, but if it's true, don't worry: you just did what they deserved.
jinfiestoabout 12 years ago
This has been stated by others, but it's not the author's fault. It's totally idiotic for a database like that not to have been regularly backed up. At worst, this should have been only a couple hours of down time while the database was restored.
shaurzabout 12 years ago
It's an organisational failure if a junior employee can bring down the company in a few clicks. No backups, testing on the production database, this is no way to run a company. Feel sorry for the guy who made a simple mistake.
lexilewtanabout 12 years ago
So many structures in life are based around 'not fucking up.' We protect our assets & our dignity as if they mean anything; and yet at the end of the day nobody knows what the fuck is going on.

Really simple, revealing story. Kudos.
zaidfabout 12 years ago
It seems insane that you still worked 3 days in a row after a gigantic mistake that can be attributed in good part to being overworked.

Once the damage was done, I would have sent you home instead of overworking you further.
SonicSoulabout 12 years ago
This is insanity! It was already pointed out in the comments, but I still can't believe a company that mature (actually having 1000s of users and millions in revenue!!!) would omit such a basic security precaution. Giving [junior?!] developers free rein in the production database, and no backups???? Seriously, the CTO should have been fired on the spot instead of putting the blame on a developer.

No matter how careful you are (I'm extremely careful) when working with data, if you're working across dev/qa/uat/prd, sooner or later someone on the dev team will execute against the wrong environment.
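A small habit that helps with that wrong-environment problem: make the session tell you where it is connected before anything destructive runs. The system variables below are standard MySQL; the values they return are whatever your own hosts happen to be called:

    -- Prints the server's hostname, port, and the current schema.
    -- If this shows the production host instead of dev/qa/uat, stop.
    SELECT @@hostname, @@port, DATABASE();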
misleading_nameabout 12 years ago
It's not your fault. It's the fault of the person who cancelled the backups, the person who didn't check that backups were being created, the "senior" people who let you work on the production database... etc.
bwbabout 12 years ago
It was a mistake, but not a huge one. They should never have been without backups, and that wasn't your responsibility. Plus, there should have been dev instances and a proper coding environment.

So don't blame yourself there!
ballstothewallsabout 12 years ago
This is the most ridiculous thing ever. Why weren't there backups? Sure, the author was the one who "pulled the trigger" but the management "loaded the gun" by not making sure there were back-ups.
elicashabout 12 years ago
I think it's admirable that you stayed long enough to help fix everything before quitting, despite it being rough -- even though, as others have said, others screwed up even bigger than you did.
kunilabout 12 years ago
But why did you clear the users table in the first place? I don't get it.
scottschulthessabout 12 years ago
In my opinion, it's the fault of the whole organization, or at least the engineering team, for making it so likely that something like this would happen.

Database backups would've solved the problem.
mydpyabout 12 years ago
My name is Myles. I read this and felt like I was looking into a crystal ball. Fortunately, my work doesn't require I interact with the production database (yet). Gulp.
Nikolas0about 12 years ago
To be honest, as a CEO I would fire myself for letting someone on the team work like that (I mean, directly on the production server).

Plus, there is no excuse for not having backups...
seivanabout 12 years ago
I find it disgusting that the "game designers" are the so-called overlords. Fuck them. If you're a developer and a gamer then you're practically a game designer. Whatever "education" they had is bullshit. You can go from imagination to reality with just you alone. And perhaps an artist to do the drawing. All those "idea" fuckers, a.k.a. game designers, are just bullshitters.

And yeah, this wasn't your fault. It was the CTO's fault. He holds responsibility.

"They didn't ask those questions, they couldn't take responsibility, they blamed the junior developer. I think I know who the real fuckups are."
nighthawkabout 12 years ago
The monumental fuck up was cancelling the MySQL backups and having all engineers work directly with the production database; what you did was INEVITABLE...
nekitamoabout 12 years ago
Using LinkedIn, you can easily figure out the name of the company and the name of the game. Using CrunchBase you can figure out the name of the CEO.
deciobabout 12 years ago
This is the most incredible story I have read in a long time. To have such a business rely entirely on one database, with no backups... unbelievable!
engtechabout 12 years ago
I'm guessing the game was one of these? Likely Age of Champions.

http://www.klicknation.com/games/
zensavonaabout 12 years ago
How is it that a company with 'millions in revenue' is directing a junior developer to develop on a production database with no backups?
Aardwolfabout 12 years ago
In the article it says your coworkers looked differently at YOU. Did anyone look differently at their database without backups, though?
coldclimateabout 12 years ago
Did you click the wrong button - yes. Was this your fault - no. So many things wrong here.

I hope he came out ok in the long run, it's a hell of a story.
cafardabout 12 years ago
He did well to leave a company that a) had such practices in place and b) would hang out an inexperienced employee to dry like that.
capexabout 12 years ago
The senior engineers have to own the mistakes of their juniors. That's how teams are built. This clearly didn't happen in this case.
ommunistabout 12 years ago
You are an accidental hero and should be proud! You freed thousands of souls from one of the worst digital addictions.
meshkoabout 12 years ago
I really hope this is some kind of hoax and no real company was operating like that in 2010.
kordlessabout 12 years ago
I'm getting an internal server error.
madaoabout 12 years ago
What I want to know is, why didnt the guy who canceled the database backups get fired also?
geldedusabout 12 years ago
It is the fault of those responsible for creating a regular backup procedure and/or a hot-swap database server.

And developing on the production database speaks volumes about the level of incompetence at that company, and of the "developer" in particular, after all.
shocksabout 12 years ago
I'm getting a 500 error. Anyone else? Anyone got a mirror or able to paste the content?
stretchwithmeabout 12 years ago
This is handing a heart for transplant to the Post Office and hoping for the best.
joeblauabout 12 years ago
On the bright side, now you know not to test on the production database :).
alexrsonabout 12 years ago
If your data is not backed up it may as well not exist.
outside1234about 12 years ago
This is a great example of why you run the "five whys" after a failure like this.

The CEO/CTO should have fired himself as the answer to one of those.
smalleganabout 12 years ago
This reads like a PSA for backups and RI (referential integrity).
coolSCVabout 12 years ago
This is why you have backups.
jblotusabout 12 years ago
just awesome
hawleyalabout 12 years ago
TIFU
daemonfire300about 12 years ago
This could've happened to anyone. It's a huge shame for those in charge, not for you. Any business that lets operations like this happen without backups or proper user-rights management should consider why it still exists, if it really makes the huge amounts of money you mentioned.
rorrrabout 12 years ago
I don't see how it's your fault, other than making the slight error of clicking on the wrong table name.

1) Senior developers / the CTO letting anybody mess with the prod DB should be grounds for their firing. It's so incompetent, it's insane.

2) No backups. How is this even possible? You even had paying customers.