I'm no expert on PCIe, but it's been described to me as a network.<p>PCIe has switches, addresses, and so forth — much like IP addresses, except PCIe operates at far lower latency and higher speed.<p>At its lowest level, PCIe x1 is a single "lane": a serial stream of zeros and ones (with framing / error correction layered on top). PCIe x2, x4, x8, and x16 are simply 2, 4, 8, or 16 lanes running in parallel, with data striped across them.<p>-------<p>PCIe is a very large and complex protocol, however. This serial protocol gets abstracted into memory-mapped I/O: instead of programming at the "packet" level, most PCIe operations appear to software as plain reads and writes to RAM.<p>> even virtual memory<p>So you understand virtual memory? PCIe abstractions go up to and include the virtual memory system. When your OS maps some virtual memory to a PCIe device, programs read/write those memory addresses and the OS (and PCIe bridge) translates those RAM reads/writes into PCIe messages.<p>--------<p>I'll now handwave a few details and note: GPUs do the same thing on their end. GPUs also have a "virtual memory" that they read/write to, which likewise translates into PCIe messages.<p>This leads to a system called "Shared Virtual Memory" (SVM), which has become very popular in GPGPU programming circles. When the CPU (or GPU) reads/writes a memory address, the data is automatically copied over to the other device as needed. Caching layers are added on top for efficiency: some SVM lives on the CPU side, so the GPU fetches the data into its own local memory / caches but treats the CPU as the "main owner" of the data. The reverse, GPU-side shared memory, also exists, where the CPU communicates with the GPU as the owner.<p>To coordinate access to this shared memory properly, atomic operations (AtomicOps: fetch-and-add, swap, compare-and-swap) and ordering rules were added starting with PCIe 3.0.
So you can perform a "compare-and-swap" on shared virtual memory, and read/write these virtual memory locations in a standardized way across all PCIe devices.<p>PCIe 4.0 and PCIe 5.0 add more and more features, making PCIe feel increasingly like a "shared memory system", akin to the cache-coherence strategies that multi-socket CPUs use to share RAM with each other. In the long term, I expect future PCIe standards to push the interface even further toward this "like a dual-CPU-socket" memory-sharing paradigm.<p>This is great because you can have 2 CPUs + 4 GPUs in one system, and when GPU#2 writes to Address#0xF1235122, the shared-virtual-memory system automatically translates that to its "physical" location (wherever it is), and the lower-level protocols move the data to the correct place without any assistance from the programmer.<p>This means a GPU can do things like traverse a linked list (or tree), even if the nodes are spread across the memory of CPU#1, CPU#2, GPU#4, and GPU#1. The shared-virtual-memory paradigm handwaves all of that away and lets the PCIe 3.0 / 4.0 / 5.0 protocols handle the data movement automatically.