I significantly sped up the forward and backward passes of SSIM, as well as inference (roughly 1.3x to 1.8x, depending on the pass).<p>Optimized Fused-SSIM (Forward) 2.49ms (Backward) 2.68ms (Inference) 1.43ms
Original Fused-SSIM (Forward) 3.66ms (Backward) 3.52ms (Inference) 2.59ms<p>This is the new SOTA and probably the fastest implementation out there. I would love to see whether someone can improve on it and gain another factor in speed. More details can be found in the repo.
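<p>As a quick sanity check on the timings above, here is a minimal sketch that turns them into speedup factors. The dictionary and function names are illustrative only and are not taken from the repo; the numbers are copied from the measurements in this post.

```python
# Reported timings in milliseconds, copied from the measurements above.
# The names below are hypothetical and exist only for this illustration.
optimized_ms = {"forward": 2.49, "backward": 2.68, "inference": 1.43}
original_ms = {"forward": 3.66, "backward": 3.52, "inference": 2.59}


def speedups(baseline, improved):
    """Return baseline / improved per stage (values above 1 mean faster)."""
    return {stage: baseline[stage] / improved[stage] for stage in baseline}


factors = speedups(original_ms, optimized_ms)
for stage, factor in factors.items():
    print(f"{stage}: {factor:.2f}x faster")
```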