It's really silly that they depend on the STL implementations, because I don't believe STL implementations actually perform in-place merges in place; std::inplace_merge is allowed to use a temporary buffer when it can get one. The actual problem of stable in-place merge is extremely hard, so I was really surprised it would show up in a Dr. Dobb's article.
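For anyone who hasn't used it, here is a minimal sketch of the call in question. The C++ standard specifies std::inplace_merge as stable and gives its cost as N-1 comparisons if enough additional memory is available, otherwise O(N log N) comparisons, so the "in-place" in the name describes the interface rather than the memory use. The helper name merge_runs is made up for illustration; only std::inplace_merge and std::is_sorted are library calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: merges the two adjacent sorted runs
// [0, mid) and [mid, size) of v and returns the result.
std::vector<int> merge_runs(std::vector<int> v, std::size_t mid) {
    assert(mid <= v.size());
    auto middle = v.begin() + static_cast<std::ptrdiff_t>(mid);

    // std::inplace_merge requires both runs to be sorted already.
    assert(std::is_sorted(v.begin(), middle));
    assert(std::is_sorted(middle, v.end()));

    // The library may allocate a temporary buffer internally;
    // the caller only sees the merged range written back into v.
    std::inplace_merge(v.begin(), middle, v.end());
    return v;
}
```

Calling merge_runs({1, 4, 7, 2, 3, 9}, 3) merges the runs {1, 4, 7} and {2, 3, 9} into one sorted sequence.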
Anyone have any quick comparisons of this to Batcher's sorting method? Quickly consulting Wikipedia, I see that Batcher's sorting networks have gained traction in the GPGPU community. I was curious if/when those techniques would start to see wider use.
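For reference, a rough sketch of one of Batcher's networks, the bitonic sorter, written sequentially. This is an illustration, not code from the article, and it assumes the input length is a power of two. The function name bitonic_sort is my own. The sequence of compare-exchange steps depends only on the indices, never on the data, and every iteration of the innermost loop touches a disjoint pair of elements, which is what makes the scheme attractive on GPUs. The price is O(n log² n) comparisons instead of O(n log n).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential sketch of Batcher's bitonic sorting network.
// Assumption: a.size() is a power of two.
void bitonic_sort(std::vector<int>& a) {
    const std::size_t n = a.size();
    assert(n != 0 && (n & (n - 1)) == 0);

    // k is the size of the bitonic sequences being merged in this stage.
    for (std::size_t k = 2; k <= n; k *= 2) {
        // j is the distance between the two elements of each comparator.
        for (std::size_t j = k / 2; j > 0; j /= 2) {
            // Each i handles one comparator; the iterations are independent,
            // so this loop is the one a parallel version would distribute.
            for (std::size_t i = 0; i < n; ++i) {
                const std::size_t partner = i ^ j;
                if (partner <= i) continue;  // visit each pair once

                const bool ascending = (i & k) == 0;
                if (ascending ? (a[i] > a[partner]) : (a[i] < a[partner])) {
                    std::swap(a[i], a[partner]);
                }
            }
        }
    }
}
```

Unlike a merge sort built on std::inplace_merge, this is not stable, so it is a comparison point for parallel structure rather than a drop-in replacement.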
While the approximately 10x speedup of the parallel in-place merge sort over the non-parallel version shown in the benchmarks may be meaningful, it also shows that parallelism on a fixed number of cores reduces constant factors, not exponents.