TechEcho

4 comments

blinks, about 17 years ago

Yay for automated monitoring software. Nagios (http://www.nagios.org/) does this for networks (and is extendable for some other things). At my old job we used Hobbit (http://hobbitmon.sourceforge.net/) to watch our Java server instances (memory usage, etc.). There’s no reason why these monitoring programs couldn’t be used to monitor internal program statistics, as long as those stats were made available.

Generally you monitor from your internal network, and then provide some hook for the monitor to get information that’s only accessible from there. (SSH or a limited-access URL, etc.)

Monitoring programs are super-powerful and generally complex. Check them out; it’s a good skill to have when working with production software.

(I also posted this on the article.)
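As a rough illustration of "making those stats available", here is a minimal Python sketch of an in-process counter store whose snapshot a monitor could poll through whatever internal-only hook you provide. The `StatsRegistry` class and the counter names are hypothetical, not part of Nagios or Hobbit.

```python
import json
import threading


class StatsRegistry:
    """Hypothetical in-process counter store that a monitor could poll."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}

    def incr(self, name, amount=1):
        # Guard updates so request threads can bump counters concurrently.
        with self._lock:
            self._counters[name] = self._counters.get(name, 0) + amount

    def snapshot_json(self):
        # Serialize a consistent view; keys are sorted for stable output.
        with self._lock:
            return json.dumps(self._counters, sort_keys=True)


stats = StatsRegistry()
stats.incr("requests_served")
stats.incr("requests_served")
stats.incr("errors")
print(stats.snapshot_json())  # {"errors": 1, "requests_served": 2}
```

The JSON string is what you would return from the limited-access URL or print from a script the monitor runs over SSH.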

angstrom, about 17 years ago

I've worked with threshold logic like that for collecting and analyzing traffic on telephone switches, where an alarm or notification would be generated if the threshold was exceeded.

Personally I would never want to debug something like that using a statistical probability that something might have gone wrong. Better to fail gracefully with something like multiple chains, so that when a request chain goes down it gets logged, cleaned up, and recreated.

Worst case scenario they get a request timeout warning.
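For readers who haven't seen threshold logic of this kind, a minimal Python sketch follows. It assumes the alarm fires when the average of the last few samples exceeds a fixed limit; the `ThresholdAlarm` name, the window size, and the readings are made up for illustration and are not taken from any real switch software.

```python
from collections import deque


class ThresholdAlarm:
    """Hypothetical alarm: fires when a sliding-window average exceeds a limit."""

    def __init__(self, limit, window):
        self.limit = limit
        # deque with maxlen drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def record(self, value):
        """Add a sample; return True if the windowed average is over the limit."""
        self.samples.append(value)
        return sum(self.samples) / len(self.samples) > self.limit


alarm = ThresholdAlarm(limit=100, window=3)
readings = [80, 90, 120, 130, 140]
fired = [alarm.record(r) for r in readings]
print(fired)  # [False, False, False, True, True]
```

Averaging over a window rather than alarming on a single sample keeps one spike from paging anyone, at the cost of reacting a few samples late.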

pmorici, about 17 years ago

This sounds like circuit breakers for software. Instead of an overcurrent condition you've got excessive busyness.
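A minimal Python sketch of the software circuit-breaker idea, for comparison. Note the assumption: this version trips on consecutive failures, which is the common form of the pattern, rather than on "busyness" as in the analogy above. The `CircuitBreaker` class, its parameters, and the injectable `clock` are all hypothetical.

```python
import time


class CircuitBreaker:
    """Hypothetical breaker: opens after repeated failures, retries after a cool-down."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock          # injectable so the cool-down can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0           # any success closes the circuit fully
        return result
```

While the circuit is open, callers fail fast with `RuntimeError` instead of piling more work onto a struggling component; after `reset_after` seconds a single trial call decides whether it closes again.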

Tichy, about 17 years ago

Maybe frequent backups would be a better solution?


Probabilistic Assertions: Crashing When Something Feels Wrong
