Ex-Amazon engineer of several years here.

This is a pretty interesting article, but it's important to know that Amazon's internal tooling changes pretty fast, even if it's mostly several years behind the state of the art.

Exhibit A: Apollo

Apollo used to be *insane*. It was designed for the use case of deploying changes to thousands of C++ CGI servers across thousands of website hosts, worrying about compiling for different architectures, supporting special fleets with overrides to certain shared libraries, and so on. It had an entire glossary of strange terms which you needed to know in order to operate it. Deployments to our global fleet involved clicking through tens of pages, copy-and-pasting info from page to page, duplicating actions left, right and centre, and hoping that you hadn't forgotten something.

When I left, most of that had been swept away and replaced with a continuous deployment tool. Do a bit of setup, commit your code to the internal Git repo, watch it get picked up, automated tests run, and deployments get created for each fleet. Monitoring tools automatically rolled back deploys if certain key metrics changed (I've sketched the rough shape of that below).

Auto scaling became a reality too, once the Move to AWS project completed. You still needed budgetary approval to raise your maximum number of servers (because for our team you were talking thousands of servers per region!) but you could keep them in reserve and only deploy them as needed.

Manually copying Apollo config for environment setup was still kind of a thing, though. The ideas behind CloudFormation hadn't quite filtered down yet.

Exhibit B: logs

My memory's a bit hazy on this one. There certainly was a lot of centralized logging and monitoring infrastructure. I'm pretty sure logs got pulled into a central, searchable repository after they'd existed on the hosts for a short while. But yes, for realtime viewing you'd definitely be looking at a tool that opened a bunch of terminals for you (also sketched below).

The monitoring tools got a huge revamp about halfway through my tenure, gaining interactive dashboarding and metrics drill-down features which were invaluable when on call. I'm currently implementing a monitoring system, so my appreciation for just how well that system worked is pretty high!

Exhibit C: service discovery

Amusingly, a centralized service discovery tool was one of those tools that used to exist but had fallen into disrepair by the time this person was working there.

This was a common pattern in Amazon. Contrary to the 'Amazon doesn't experiment' conclusion, Amazon if anything tended to experiment too much - the Next Big Thing was constantly being released in beta, adopted by a small number of early adopters, and then disappearing for lack of funding/maintenance/headcount.

I can't think of any time I hard-wired load balancer host names, though. Usually they would be set up in DNS. We did use to have some custom tooling to discover our webserver hosts and automatically add/remove them from load balancers, but that was made obsolete by the auto-scaling / continuous deployment system years before I left.

As for the question of "can we shut this down? who uses it?" - ha, yes, I seem to remember having that issue. I think that, before my time, it wasn't really a problem: to call a service you needed to consume its client library, so you could just look in the package manager to see which services declared that as a dependency. With the move to HTTP services, that got lost.
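To make the Exhibit A point about metric-gated rollbacks concrete, here's the rough shape of the idea expressed in public-AWS terms rather than the internal tooling. The alarm names and deployment ID below are made up, and the internal system wasn't built on CloudWatch/CodeDeploy - this is only a sketch of the behaviour.

    import time
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    codedeploy = boto3.client("codedeploy")

    # Hypothetical names - substitute whatever your pipeline actually watches.
    ALARM_NAMES = ["orders-5xx-rate", "orders-p99-latency"]
    DEPLOYMENT_ID = "d-EXAMPLE123"

    def key_metrics_unhealthy():
        # The deployment is considered bad if any watched alarm is firing.
        resp = cloudwatch.describe_alarms(AlarmNames=ALARM_NAMES)
        return any(a["StateValue"] == "ALARM" for a in resp["MetricAlarms"])

    # Poll while the new code bakes; roll back as soon as a key metric trips.
    for _ in range(60):  # roughly 30 minutes of bake time
        if key_metrics_unhealthy():
            codedeploy.stop_deployment(deploymentId=DEPLOYMENT_ID,
                                       autoRollbackEnabled=True)
            break
        time.sleep(30)

The real thing was wired into the deployment pipeline rather than being a polling script, but the "watch the graphs and pull the cord automatically" part is what made on-call life so much better.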
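The 'bunch of terminals' style of realtime log viewing under Exhibit B boiled down to fanning out ssh sessions across the fleet, something like the sketch below. The hostnames and log path are invented; the real tool discovered the hosts and managed the terminals for you.

    import subprocess
    import threading

    # Invented hostnames and path, purely for illustration.
    HOSTS = ["webserver-1a.example.internal", "webserver-1b.example.internal"]
    LOG_PATH = "/var/log/service/application.log"

    def tail(host):
        # One ssh session per host, streaming the log back with a host prefix.
        proc = subprocess.Popen(["ssh", host, "tail -F " + LOG_PATH],
                                stdout=subprocess.PIPE, text=True)
        for line in proc.stdout:
            print("[%s] %s" % (host, line), end="")

    threads = [threading.Thread(target=tail, args=(host,), daemon=True)
               for host in HOSTS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()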
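And on the "who uses this service?" question: in the client-library days, that check was effectively a reverse-dependency query against package metadata. A toy version, with an invented manifest format (the real package manager exposed the same information, just not like this):

    import json
    from pathlib import Path

    def consumers_of(client_library, manifest_dir="package-manifests"):
        # Return every package that declares the service's client library as a
        # dependency - i.e. everyone you'd have to chase before a shutdown.
        consumers = []
        for manifest_path in Path(manifest_dir).glob("*/manifest.json"):
            manifest = json.loads(manifest_path.read_text())
            if client_library in manifest.get("dependencies", []):
                consumers.append(manifest["name"])
        return consumers

    # e.g. consumers_of("OrderServiceJavaClient")

Once callers were just making HTTP requests, there was no equivalent artifact to query, hence the pain.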
That lack of visibility was somewhat mitigated over the years by services moving to a fully authenticated model, with client services needing to register for access tokens to call their dependencies, but it was still a work in progress a few years ago.

Exhibit D: containers

Almost everything in Amazon ran on a one-host-per-service model, with the packages present on a host dictated by Apollo's dependency resolution mechanism, so containers weren't needed to isolate multiple programs' dependencies on the same host.

Screwups caused by different system binaries and libraries on different generations of host were a thing, though, and were particularly unpleasant to diagnose. Again, that mostly went away once AWS was a thing and we didn't need to hold onto our hard-won bare-metal servers.

'Amazon Does Not Experiment'

Amazon doesn't really do open source very well. The company is dominated by *extremely* twitchy lawyers. For instance, my original employment contract stated that I could not talk about any of the technology I used at my job - including which programming languages I used! Unsurprisingly, nobody paid attention to that. It meant that, for many years, the company gladly consumed open source, but any question of contributing back was practically off the table, as it might have risked exposing which open source projects were used internally.

A small group of very motivated engineers, backed by a lot of open-source-friendly employees, gradually changed that over the years. My first ever Amazon open source contribution took over a year to be approved. The ones I made after that were more on the order of a week.

Other companies might regard open sourcing entire projects as good PR, but Amazon doesn't particularly seem to see it that way, so it's not given much in the way of funding or headcount. AWS is the obvious exception, but that's because AWS's open source libraries help people spend more money on AWS.

Instead, engineers within Amazon are pushed to generate ideas and either patent them or turn them into AWS services. The latter is good PR *and* money.

As for different languages: it really depends on the team. I know a team that happily experimented with languages, including functional programming. But part of the reason for the pushback is that a) Amazon has incredibly high engineer turnover, both due to expansion and due to burnout, so you need to choose a language that new engineers can learn in a hurry, and b) you need to be prepared for your project to be taken over by another team, so it had better be written in something simple. You need a very good justification if you want to choose something non-standard.

Overall, Amazon is a pretty weird place to work as an engineer.

I would definitely not recommend it to anybody whose primary motivation is to work on the newest, shiniest technologies and tooling!

On the other hand, the opportunities within Amazon to work at massive scale are pretty great.

One of the 'fun' consequences of Amazon's massive scale is the "we have special problems" issue. At Amazon's scale, things genuinely start breaking in weird ways. For instance, Amazon pushed so much traffic through its internal load balancers that it started running into scaling limits in the load balancer software itself, to the point where eventually they gave up and began developing their own load balancers!
Similarly, source control systems and documentation repositories kept being introduced, becoming overloaded, and then being replaced with something more performant.

But the problem is that "we have special problems" starts to become the default assumption, and Not Invented Here starts to creep in. Teams either don't bother searching for external software that can do what they need, or dismiss suggestions with "yeah, that won't work at Amazon scale". And because Amazon is so huge, there isn't even much weight given to figuring out how other Amazon teams have solved the same problem.

So you end up with each team reinventing its own particular wheel, hundreds of engineer-hours logged building, debugging and maintaining that wheel, and burned-out engineers leaving after spending several years in a software parallel universe, with no knowledge of the current industry state of the art.

I'm one of them. I'm just teaching myself Docker at the moment. It's pretty great.