I take issue with the claim that "No amount of load testing could adequately prepare the server team behind Diablo 3 for firepower of this magnitude".<p>If your load testing does not prepare you for the worst, then your load-testing plan is garbage.<p>They already knew how many copies had been pre-ordered and could make a pretty good guess at how many copies would be sold and activated on the first night. Take that worst-case estimate, double it, and test for that.<p>"But the cost of supporting worst-case scenarios!!" some may cry. This is where rented servers / cloud setups are useful for elastic scale without breaking the bank.<p>There are companies that can simulate load from users across the globe. I have no doubt that Blizzard would have the connections / influence / cash to set up a kick-arse load-test system.
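To make the "take the worst case, double it, and test for that" arithmetic concrete, here is a minimal sketch. Every number and name in it is hypothetical: the pre-order count, the share of buyers online at once, and the per-server capacity are placeholders, not figures from Blizzard.

```python
import math


def load_test_target(preorders, launch_activation_rate, safety_factor=2.0):
    """Concurrent-user count to load test against.

    preorders: known pre-order count (hypothetical input).
    launch_activation_rate: assumed fraction of buyers online at peak (0..1).
    safety_factor: multiplier on the worst-case estimate; 2.0 means "double it".
    """
    if preorders < 0:
        raise ValueError("preorders must be non-negative")
    if not 0 <= launch_activation_rate <= 1:
        raise ValueError("launch_activation_rate must be between 0 and 1")
    worst_case = preorders * launch_activation_rate
    return math.ceil(worst_case * safety_factor)


def servers_needed(target_users, users_per_server):
    """Servers required for the target, rounding up to whole machines."""
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    return math.ceil(target_users / users_per_server)


# Illustrative inputs only: 1M pre-orders, half of them online at peak,
# and an assumed 3,000 concurrent users per game server.
target = load_test_target(1_000_000, 0.5)
fleet = servers_needed(target, 3_000)
```

The point is only that the inputs were knowable ahead of launch; the real per-server capacity would have to come from the load tests themselves.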
I'm very curious what it is in Diablo that's causing the issues. It seems a bit odd to me that they're experiencing issues this severe for two reasons:<p>1) Diablo was developed from the fairly unusual position of knowing it would face a launch to millions of users before development ever started. I'd have imagined having this "scale it to infinity" mentality from the starting line would have helped a lot.<p>2) The whole game is conceptually VERY easy to shard. Unlike WoW, there is very little interaction between players that are not in a party together, and the maximum party size is four.<p>I wonder whether the failures have anything to do with the achievement tracking/broadcasting. It's the only component I can think of that breaks out of these obvious sharding boundaries, and I can kind of imagine how large friend lists might cause problems. Additionally, it seems achievement progress was lost from the time leading up to one of the downtimes.<p>I know it's easy to speculate from here, and there are probably very legitimate reasons for all of this. As an outsider, it seems like these particular failures are things that, in general, just happen. Still, I would have expected Blizzard in particular to be better prepared for this. It's a bit surprising.
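As a rough illustration of why party-sized units are easy to shard, here is a minimal sketch that pins each party to one shard by hashing its ID. This is a generic hash-sharding technique, not a description of Blizzard's actual architecture; `shard_for_party`, `assign_parties`, and the shard count are made up for the example.

```python
import hashlib

MAX_PARTY_SIZE = 4  # the party cap mentioned in the comment above


def shard_for_party(party_id, num_shards):
    """Map a party ID to a shard index with a stable hash."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(party_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def assign_parties(parties, num_shards):
    """Group party IDs by shard; parties maps party ID -> list of players."""
    placement = {}
    for party_id, members in parties.items():
        if len(members) > MAX_PARTY_SIZE:
            raise ValueError(f"party {party_id} exceeds {MAX_PARTY_SIZE} players")
        shard = shard_for_party(party_id, num_shards)
        placement.setdefault(shard, []).append(party_id)
    return placement


# Hypothetical parties; all in-game interaction for a party stays on its shard.
parties = {
    "party-1": ["ann", "bob"],
    "party-2": ["cy", "dee", "eli", "fay"],
    "party-3": ["gus"],
}
placement = assign_parties(parties, num_shards=8)
```

Anything that crosses party boundaries, such as broadcasting achievements to a friend list, has no natural home in this scheme, which is what would make it the harder component to scale.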
I know all about servers being under so much load that everything falls apart. Working at a nascent mobile ad network whose traffic doubles every month, and whose monthly number of requests amounts to 10 figures, I know all about it.<p>And yet… I feel like Blizzard could have made an effort to make its <i>single player</i> game run offline. The multiplayer is fantastic, but give us something to fall back on.
My favorite Blizzard launch story actually involves Microsoft.<p>Years ago, before the days of the cloud and well-understood failover mechanisms, a very enterprise-y product happened to share datacenter space with Blizzard. One fine day, Blizzard shipped an update to WoW and, from what I hear, it took down networking across the DC and left everyone scrambling.<p>Try explaining to your customers that your business-critical service just went down because Azeroth got a new continent.
I can't stress enough how much I admire and respect server/dev ops people. Their job is among the hardest, and people overlook their importance way too often. I wouldn't even know how to go about finding them.
This whole episode has been a massive face plant for Blizzard.<p>Consider: they have access to all of the sales data, so they know how many copies of the game could potentially be played at launch.<p>They ran an open beta, so they should have a good idea of how everything scales relative to total simultaneous user count.<p>They have extensive experience with all aspects of patching, scaling, and server operations through World of Warcraft and Starcraft II.<p>They intentionally decided to go with a "single player" experience that required connectivity and incurred server load.<p>Given all of that, there really is no excuse to fail as hard as they did on launch day. It is 2012; the standards are pretty high for getting things right with digital distribution and with online games. What's more, if you make a bold decision to force connectivity for a single player game, you damned sure better get it right or you are going to destroy your credibility.<p>Blizzard is enormously lucky that they have a very strong history of compelling games. These sorts of issues could easily cause an upstart game studio to go out of business.
Yeah, SRE people are just fine.
But whichever decision maker decided it was a good idea to run a single player game online just needs to get a clue. And don't worry, people who pirate will use a server emulator, as they've done for every previous protection of this kind.
It is probably too expensive (development-wise) to scale for this many concurrent users, since a peak like this will only happen once (or twice, counting expansions) during the whole lifetime of the game.
Given how much people love Blizzard, I wonder if they really care /that/ much about launch day issues?<p>They have massive amounts of experience in this field, and I'm sure they had the capability to make launch day run much smoother, so why didn't they? Perhaps they thought 'unprecedented demand for new game forces it temporarily offline' sounds like a nice headline in the paper.<p>It's just one day after all, and all the people who play on launch day have already spent their money and probably aren't really the type to get a refund.
+1 for server teams! Sysadmins never get appreciated enough.<p>However, as far as the Diablo 3 launch goes, some thoughts:<p>+ Blizzard have been doing this for years<p>+ They know how many people have pre-ordered and pre-installed the game<p>+ The game is single player, yet they decided upon this online requirement (no offline play, thanks Blizzard)<p>All in all, it's pretty annoying to purchase something and not be able to play it because they simply haven't upgraded their infrastructure for the load they should have expected.
Does anyone have any information on what their infrastructure looks like? What do they use to manage their servers? I guess a lot of this information would be "proprietary", but scaling something this large would be a great read. I manage about 1000 non-critical (think kiosks) servers and deploy the code to them, and it is relatively painless. I would be curious to know how the big boys do things.
Strange to see them called "server teams". Devops - maybe. Devs - someone's got to fix the actual code issues. Ops - if it's a platform configuration issue. But whenever I read "server team", I'm thinking of the DC ops racking the actual hardware.<p>Is it a common name for devops in other companies?