I am currently working on my master's degree in computer science and studying this exact topic.<p>To measure core-to-core latency, we also need to understand how cache coherence works on Intel CPUs. I am currently experimenting with microbenchmarks on the Skylake microarchitecture. Because the ring interconnect used on earlier CPU dies had scalability issues, Intel has opted for a 2D mesh interconnect in recent years. In this design, the CPU die is split into tiles, each containing a core, caches, a CHA (caching and home agent), a snoop filter, etc. I want to emphasize the role of the CHA here. Each CHA is responsible for managing the coherence of a portion of the address space. If a core tries to fetch a variable that is not in its L1D or L2 cache, the CHA that manages the coherence of that variable's address is queried to find out where the variable is. If the data is on the die, the core that currently owns the variable is told to forward it to the requesting core. So, even when the cores that communicate with each other are physically adjacent, the location of the CHA that manages the coherence of the variable they pass back and forth also matters, because of the cache coherence mechanism.<p>Related links:<p><a href="https://gac.udc.es/~gabriel/files/DAC19-preprint.pdf" rel="nofollow">https://gac.udc.es/~gabriel/files/DAC19-preprint.pdf</a><p><a href="https://par.nsf.gov/servlets/purl/10278043" rel="nofollow">https://par.nsf.gov/servlets/purl/10278043</a>
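<p>To illustrate the point about CHA placement, here is a toy model, not a measurement. It assumes a hypothetical mesh where tiles are identified by (row, column), that the cost of a message is the Manhattan distance between tiles, and that a cache-to-cache transfer takes three legs: requester to home CHA, home CHA to owning core, owning core back to requester. The tile coordinates and the names `hops` and `transfer_hops` are made up for the example; the real address-to-CHA mapping is not publicly documented and is not modeled here.

```python
from typing import Tuple

# (row, column) of a tile on a hypothetical 2D mesh; not a real die layout.
Tile = Tuple[int, int]


def hops(a: Tile, b: Tile) -> int:
    """Manhattan distance between two tiles (assumed cost of one message)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def transfer_hops(requester: Tile, owner: Tile, home_cha: Tile) -> int:
    """Total hops for one cache-to-cache transfer in this simplified model:
    request to the home CHA, forward to the owning core, data to the requester."""
    return hops(requester, home_cha) + hops(home_cha, owner) + hops(owner, requester)


# Two physically adjacent cores passing a cache line between them.
requester, owner = (0, 0), (0, 1)

# Same pair of cores, but the line's home CHA sits on a different tile.
near = transfer_hops(requester, owner, home_cha=(0, 1))
far = transfer_hops(requester, owner, home_cha=(4, 5))

print(near, far)  # prints "2 18"
```

<p>In this model the two cores are the same in both cases; only the tile of the CHA that owns the address changes, and the path length goes from 2 to 18 hops. Real latencies also depend on things the sketch ignores, such as queuing and the actual routing on the mesh.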