I wonder if overlapping the patches would improve accuracy further, as a way to anti-alias what the model learns and infers. In other words, if position 0 covers 0,0 - 16,16 and position 1 covers 16,0 - 32,16, we could instead use 12,0 - 28,16 for position 1, so that it overlaps the previous position by 4 pixels. You’d have more patches, so it would cost more compute, but it might smooth out any artificial aliasing that the hard patch boundaries create during both training and inference.
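
To put rough numbers on the extra cost, here is a minimal sketch of the patch layout described above. The `patch_boxes` helper and the 224x224 image size are my own assumptions for illustration, not taken from any particular model; the only thing that changes between the two cases is the stride (16 for non-overlapping, 12 for a 4-pixel overlap).

```python
def patch_boxes(width, height, patch=16, stride=16):
    """Return (x0, y0, x1, y1) boxes for square patches taken every `stride` pixels.

    Hypothetical helper for illustration. Only patches that fit entirely
    inside the image are kept; a real pipeline might pad the edges instead.
    """
    xs = range(0, width - patch + 1, stride)
    ys = range(0, height - patch + 1, stride)
    # Row-major order: position 0 is the top-left patch, position 1 is to its right.
    return [(x, y, x + patch, y + patch) for y in ys for x in xs]


# Assumed 224x224 input, chosen only as a familiar example size.
plain = patch_boxes(224, 224, patch=16, stride=16)
overlapped = patch_boxes(224, 224, patch=16, stride=12)

print(plain[1])         # (16, 0, 32, 16)
print(overlapped[1])    # (12, 0, 28, 16)
print(len(plain))       # 196
print(len(overlapped))  # 324
```

Under those assumptions the overlapped layout has 324 patches instead of 196, roughly 1.65x as many tokens, so the extra compute is real but not enormous for a 4-pixel overlap. Note that with a stride of 12 the last patch in each row ends at pixel 220, so this sketch leaves a 4-pixel strip at the right and bottom edges uncovered.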