I <i>think</i> the claimed benefits here come from two things: 1) the OS can stay out of some reads entirely, because PCIe devices can directly answer requests to read/write memory, and 2) for small reads [or writes, as arcticbull notes below] you don't have to send the whole page between the SSD and the host system, which frees up bandwidth.<p>The gains in the paper were measured using emulation, so they didn't actually hack SSD firmware. There has to be some cost on the controller side to service these new requests. They seem to account for this, but of course taking away some overheads doesn't take away the underlying limits of the storage medium or controller.<p>A couple of things make something like this more interesting in the future: PCIe 4 doubles bandwidth over PCIe 3 and is supposed to be coming in some AMD chipsets soon (the PCIe 5 spec has also been finalized and, once it's really shipping, provides another doubling), and of course Intel/Micron's new storage medium (3D XPoint) has zippier reads/writes than flash.<p>I wonder whether arrangements like this over PCIe end up as a middle ground between swap/mmap'ing and way pricier setups with nonvolatile DIMMs.
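<p>To put a rough number on point 2, here is a back-of-the-envelope sketch. The 4 KiB page size and 64-byte request size are my assumptions, not figures from the paper, and the function names are made up for illustration. It also ignores alignment (a small read that straddles a page boundary would cost two pages) and any protocol overhead.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; not a figure from the paper


def bytes_transferred(request_bytes, granularity):
    """Bytes moved over the link when transfers are rounded up to `granularity`."""
    units = -(-request_bytes // granularity)  # ceiling division
    return units * granularity


def amplification(request_bytes, granularity=PAGE_SIZE):
    """Ratio of bytes moved to bytes the caller actually asked for."""
    return bytes_transferred(request_bytes, granularity) / request_bytes


if __name__ == "__main__":
    small_read = 64  # assumed request size, roughly one cache line
    print(bytes_transferred(small_read, PAGE_SIZE))   # page-granular transfer
    print(bytes_transferred(small_read, small_read))  # byte-granular transfer
    print(amplification(small_read))
```

Under those assumptions, a 64-byte read served at page granularity moves 64 times the bytes that were requested, which is the bandwidth the byte-addressable path would get back.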