The premise of this article is that when you train a model, all the compute needs to be in a single cluster at the same physical location. Is that actually the case? I figured these jobs are massively parallelized and could be distributed across compute clusters around the world.