At a real estate webhost in 2014, we had a small web farm behind a single load balancer. I've written in previous HN posts about the interesting architecture choices made by the lead architect. On top of those, these were the days of "move fast and break things", so developers had admin access to the servers and would develop against live sites. Fun times, keeping a web farm online all night long.<p>Partly because it was such a small operation, we heavily instrumented the web servers with PRTG, and also hit a number of key sites every minute on each web server. "When XYZRealty goes down, so do all of these other sites!" "We'll put a sensor on XYZRealty."<p>This gave us great data about the health of the servers. It identified bad apples and even helped with performance testing of new modules. We were able to catch memory leaks and processing spikes before they broke our sites. And when 64-bit modules were ready to replace the 32-bit modules, we had baseline data ready to compare and evaluate.<p>None of this is to say the approach won't scale; quite the contrary. It does create data to store, though, and requires dedication to maintain.
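<p>For anyone who wants to try the per-server check without PRTG, here is a rough sketch of the same idea in Python. It is not what we ran. The server addresses, the `xyzrealty.example` hostname, and the function names are all made up for illustration. It assumes plain HTTP and that each web server answers for a site when the request carries that site's `Host` header, so the probe goes straight to the server and skips the load balancer.

```python
import time
import urllib.request

# Hypothetical inventory: placeholder addresses and hostnames, not real ones.
SERVERS = ["10.0.0.11", "10.0.0.12"]
KEY_SITES = ["xyzrealty.example"]


def http_fetch(server, host, timeout=10.0):
    """Request one site from one specific server and return the HTTP status."""
    # Connect to the server's address but send the site's Host header,
    # so the check exercises this server rather than the load balancer.
    req = urllib.request.Request(f"http://{server}/", headers={"Host": host})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def probe_all(servers, sites, fetch=http_fetch):
    """Probe every (server, site) pair once; record success and elapsed time."""
    results = []
    for server in servers:
        for host in sites:
            start = time.monotonic()
            try:
                ok = fetch(server, host) == 200
            except Exception:
                # Timeouts, refused connections, and HTTP errors all count as down.
                ok = False
            results.append({
                "server": server,
                "site": host,
                "ok": ok,
                "seconds": time.monotonic() - start,
            })
    return results


def bad_apples(results):
    """Return the servers that failed at least one probe, sorted."""
    return sorted({r["server"] for r in results if not r["ok"]})
```

Run `probe_all(SERVERS, KEY_SITES)` once a minute from a scheduler and keep the `seconds` values somewhere. That stored history is what gives you a baseline to compare against later.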