I'll admit I found it a little tricky to orient myself within the Heroic docs. There are pages describing the high-level architecture and the installation process, and each aspect of the configuration has its own doc page and example. These are all fine, but I came away wishing there was also some kind of super-high-level introduction along the lines of: "What exactly is this thing, what was the motivation for its development, who is it for, and what can it do?"<p>It turns out these do exist! But as blog posts that you have to go searching for...<p>1. Monitoring at Spotify: The story so far [<a href="https://labs.spotify.com/2015/11/16/monitoring-at-spotify-the-story-so-far/" rel="nofollow">https://labs.spotify.com/2015/11/16/monitoring-at-spotify-th...</a>] in which they describe wanting to move from an approach based around monitoring discrete hosts to one where they could think in terms of the health of services across the entire infrastructure. It discusses the various design/architecture decisions they made, specifically in terms of supporting <i>alerting</i> and <i>graphing</i> services.<p>2. Monitoring at Spotify: Introducing Heroic [<a href="https://labs.spotify.com/2015/11/17/monitoring-at-spotify-introducing-heroic/" rel="nofollow">https://labs.spotify.com/2015/11/17/monitoring-at-spotify-in...</a>] in which they discuss the "federation" features of Heroic, how it manages the collection of metric data from the hosts, why they used Elasticsearch, and how they mitigate its known issues.