Rotating videos using DRM/KMS on the Pi3 isn't trivial. I had to resort to using the VideoCore co-processor and its massive 64x64 register to bring performance to a level comparable with the older firmware implementation. This blog post describes why that's necessary and how it works.