This is really cool! When we were trying to launch the GSPMD feature for PyTorch/XLA at Google, one of our biggest bottlenecks was network overhead, but we didn't really have any robust tools to dig into it and perform root cause analysis. I'm loving the tools I see come out of Trainy.