Another quick hint: always make things work first then profile and optimize it. This happens when my colleagues were building a stream processing engine that <i>scales up</i>. At first, they start with all those fancy fine-grained locks but it turns out the final result just does not scale at all. They have to retreat to the version with some big locks and then figure out what was happening and how to solve it.