Another quick hint: always make things work first then profile and optimize it. This happens when my colleagues were building a stream processing engine that <i>scales up</i>. At first, they start with all those fancy fine-grained locks but it turns out the final result just does not scale at all. They have to retreat to the version with some big locks and then figure out what was happening and how to solve it.
Am I the only one that found that there wasn't really much actual actionable information in here? There's basically no specific information about how they scaled along the axes they established were important