Summary: run gdb on your program, hit Ctrl-C, and run 'thread apply all bt' a few times to see where your program is spending its time.

I've used this a lot in the past, and it's surprisingly useful!
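If you'd rather not do the Ctrl-C dance by hand, here is a rough sketch of the same idea automated in Python. It assumes gdb is on your PATH and that you're allowed to attach to the process (on some Linux setups that needs root or a relaxed ptrace setting). The names `innermost_frames` and `sample`, the sample count, and the interval are all mine, just for illustration. It only tallies frame #0 of each thread, which is cruder than reading the full backtraces yourself.

```python
import collections
import re
import subprocess
import sys
import time

# Matches gdb frame lines such as
#   "#0  0x00007f... in poll () from /lib/libc.so.6"
#   "#1  main () at app.c:10"
# Names containing spaces (some C++ templates) will not match; this is a sketch.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\S+)\s*\(")


def innermost_frames(backtrace_text):
    """Return the function name of frame #0 for every thread in gdb output."""
    names = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(1) == "0":
            names.append(match.group(2))
    return names


def sample(pid, count=10, interval=0.5):
    """Attach gdb `count` times and tally where each thread was stopped.

    `count` and `interval` are arbitrary defaults chosen for illustration.
    """
    tally = collections.Counter()
    for _ in range(count):
        result = subprocess.run(
            ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"],
            capture_output=True,
            text=True,
        )
        tally.update(innermost_frames(result.stdout))
        time.sleep(interval)
    return tally


if __name__ == "__main__":
    # Usage: python sampler.py <pid>
    for name, hits in sample(int(sys.argv[1])).most_common():
        print(hits, name)
```

Functions that show up in most samples are where the time is going. Each attach pauses the process briefly, so keep the sample count modest on anything latency-sensitive.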
If your app runs on Linux, it will probably run with little trouble on OpenSolaris or FreeBSD, even if only in a VM. For profiling, DTrace is definitely worth the effort.