A fight I've had over and over with folks is that I think chunking is more fundamental to "using multiple cores to get work done faster" than most of what people treat as fundamental.<p>Even though it looks like an optimization, it's something you have to address no matter what method you use to control execution. Thus it makes sense to plan for chunking at the very start, because it is entirely predictable that you won't get a real speedup unless your workload is already chunked.
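To make that concrete, here is a minimal Python sketch of what I mean by planning for chunking up front. Every name in it (`chunks`, `sum_squares`, `chunked_sum_squares`) and the chunk size are made up for illustration, and the sum-of-squares workload is just a stand-in. The point is only the shape: the unit of work handed to the pool is a slice of the data, not a single item, and the executor is a parameter so the chunking does not depend on how execution is controlled.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Iterator, Sequence


def chunks(data: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `data`, each at most `size` items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(data), size):
        yield data[start:start + size]


def sum_squares(chunk: Sequence[int]) -> int:
    """Per-chunk work (hypothetical workload): one task covers many items."""
    return sum(x * x for x in chunk)


def chunked_sum_squares(data: Sequence[int], size: int, pool: Executor) -> int:
    """Submit one task per chunk, then combine the partial results."""
    return sum(pool.map(sum_squares, chunks(data, size)))


if __name__ == "__main__":
    data = range(1_000_000)
    # A thread pool keeps the sketch simple; it only demonstrates the structure.
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(chunked_sum_squares(data, 50_000, pool))
```

For CPU-bound pure-Python work you would likely pass a `ProcessPoolExecutor` instead, since threads in standard CPython won't run that code on several cores at once. The chunking code stays the same either way, which is the argument for deciding on it first.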