Nothing. You don't monitor production processes directly, ever; it's a waste of your time and effort. Certainly this sort of foolishness should not be used to page an employee off-hours, if that's where you're headed.<p>The only thing you need to monitor is whether a server answers the network request it was designed to answer. Beyond that you might optionally want to know whether the disk is full, whether RAM is maxed out (pushing Linux into swap), or whether CPU is running too high to absorb the loss of a few servers at peak, but really that's all optional if you're in EC2 and can just spin up more servers at a moment's notice.<p>You can gather all this data yourself with New Relic, or you can send it to Graphite, or if you're old-fashioned you can use Icinga in place of Nagios because it keeps history in a database. If the developers want to know about the process for the application they wrote, you can put New Relic on the server for them, and put the system-level New Relic agent on there too; just don't pay attention to it or pretend it's important until something breaks.<p>The important catch here, the thing this whole line of thinking depends on: you have to have thought things through before you built them, with one service per OS and real redundancy throughout the environment, and, critically, your kick has to be fast enough that when a server has some kind of problem in production you don't fix it, you just re-kick it. That means your kick puts the OS on the box, then triggers Salt or Ansible or Chef to configure every single detail, and then triggers a deploy of your internally developed applications. It also means you have to test the kick to death before you can rely on it to rebuild something live.
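To make the kick ordering concrete, here is a rough Python sketch of "OS, then config management, then deploy, and stop at the first thing that fails". Everything in it is made up for illustration: `kick`, `install_os`, `configure`, `deploy_apps` and the host name are hypothetical stand-ins, not anything from a real provisioning tool, and a real version would shell out to your installer, to Salt/Ansible/Chef and to your deploy job.

```python
from typing import Callable, List, Tuple

# A kick stage is a name plus a function that takes a host and reports success.
Step = Tuple[str, Callable[[str], bool]]


def kick(host: str, steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run the kick stages in order and stop at the first failure.

    Returns (succeeded, names_of_completed_stages).
    """
    done: List[str] = []
    for name, run in steps:
        if not run(host):
            return False, done  # a half-built box never goes live
        done.append(name)
    return True, done


# Hypothetical stand-ins; each would call out to real tooling.
def install_os(host: str) -> bool:
    return True  # e.g. network install of the base OS


def configure(host: str) -> bool:
    return True  # e.g. a Salt / Ansible / Chef run


def deploy_apps(host: str) -> bool:
    return True  # e.g. deploy of the internally developed applications


KICK_STEPS: List[Step] = [
    ("install_os", install_os),
    ("configure", configure),
    ("deploy_apps", deploy_apps),
]

if __name__ == "__main__":
    print(kick("web01", KICK_STEPS))
```

The point of returning the list of completed stages is that "test the kick to death" gets easier when a failed run tells you exactly which stage it died in.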
If the problem is recurring, you can use on-the-spot tools (jstack, jmap or whatever) to grab some data, hand it to the application's developers, and let them try to recreate it in staging while you go ahead and re-kick the prod server. Then go back to writing documentation for lesser ops to not read, drinking at your desk, reading Hacker News, acting as a CIA listening post for cat pictures, or whatever else passes the time.
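For what it's worth, the whole policy fits in a few lines of Python. This is only a sketch under assumptions: `answers` just checks that the port accepts a TCP connection, which is weaker than sending the actual request the service was built for (you'd swap in a real HTTP GET or whatever your protocol is), and `next_action`, its `recurring_after` threshold and the action names are hypothetical, picked for the example.

```python
import socket


def answers(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if the server accepts a TCP connection on its service port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, unresolvable: all count as "not answering".
        return False


def next_action(is_answering: bool, previous_failures: int,
                recurring_after: int = 2) -> str:
    """Decide what to do with a server; the threshold is an assumed value."""
    if is_answering:
        return "nothing"            # it answers, so leave it alone
    if previous_failures >= recurring_after:
        return "dump-then-rekick"   # recurring: grab data for the developers first
    return "rekick"                 # one-off: don't fix it, rebuild it


if __name__ == "__main__":
    up = answers("127.0.0.1", 8080)
    print(next_action(up, previous_failures=0))
```

Note that nothing in there pages a human; the only outputs are "leave it alone" and "rebuild it".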