I've implemented a few different ways of getting stack traces out of running Python processes. At my last job I eventually settled on one that wrote out the state of every thread to RabbitMQ every few seconds. Whenever something was up you could just attach a listener and see what was going on.
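For anyone who wants to try something similar, here's a rough sketch of the idea, not the code I actually ran. The names `snapshot_threads` and `start_trace_dumper` are made up for this example, and `publish` is a stand-in for whatever sends a message to your broker (with RabbitMQ that would be a small wrapper around your client library's publish call). It relies on `sys._current_frames()`, which is CPython-specific.

```python
import sys
import threading
import traceback


def snapshot_threads():
    """Return {thread name: formatted stack} for every thread that has a frame."""
    # Map thread idents to readable names; threads not known to the
    # threading module fall back to a generic "thread-<ident>" label.
    names = {t.ident: t.name for t in threading.enumerate()}
    frames = sys._current_frames()  # CPython-specific
    return {
        names.get(ident, f"thread-{ident}"): "".join(traceback.format_stack(frame))
        for ident, frame in frames.items()
    }


def start_trace_dumper(publish, interval=5.0):
    """Call publish(snapshot) every `interval` seconds from a daemon thread.

    `publish` is a hypothetical callback; plug in your own broker code.
    Returns an Event; set it to stop the dumper.
    """
    stop = threading.Event()

    def run():
        # Event.wait returns False on timeout, True once stop is set.
        while not stop.wait(interval):
            publish(snapshot_threads())

    threading.Thread(target=run, name="trace-dumper", daemon=True).start()
    return stop
```

Something like `stop = start_trace_dumper(print)` is enough to try it locally. Keeping the broker behind a plain callback means the dumper itself doesn't care where the traces end up, and a slow or broken publisher only affects the dumper thread.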