I'm struck by how dissimilar each successive "match" in the video is. If this is a neural network that picks matches based on video, how on earth is it picking matches from completely different training videos for every subsequent impact? That might happen if each training video contained only a single impact sample, but that's not the case here: the paper says each training video has 48 actions.<p>I could understand it if the authors wanted to make sure it doesn't pick the exact same sample multiple times in a row and so penalized duplicates, but I don't see any mention of that in the paper. Even if they did, I'd expect to see subsequent matches drawn from the same training video rather than from completely different videos.