科技回声

13 comments

magicalhippo, about 1 year ago

Given that PCIe allows data to be piped directly from one device to another without going through the host CPU [1][2], I guess it might make sense to just have the GPU read blocks straight from the NVMe (or even NVMe-oF [3]) rather than having the CPU do a lot of work.

edit: blind as a bat, says so right in the paper of course:

> PMem is mapped directly to the GPU, and NVMe memory is accessed via Peer to Peer-DMA (P2PDMA)

[1]: https://nvmexpress.org/wp-content/uploads/Enabling-the-NVMe-CMB-and-PMR-Ecosystem.pdf
[2]: https://lwn.net/Articles/767281/
[3]: https://www.nvmexpress.org/wp-content/uploads/NVMe_Over_Fabrics.pdf

multimind, about 1 year ago

A friend of mine used to work for a GPU database startup as an integration engineer. He got frustrated because GPU drivers (not just AMD's but also Nvidia's) are intrinsically unstable and not designed for long, flawless runs. If a few bits have the wrong value in a deep neural network, or a pixel is wrong in a game, it does not matter much. In databases (or file systems, for that matter) it means everything! It is hard to believe at first, but his former company now offers solutions without GPU acceleration that simply work, though they also lost their USP.

west0n, about 1 year ago

According to this paper, GPU4FS is a file system that can run on the GPU and be accessed by applications. Since GPUs cannot make system calls, GPU4FS uses shared video memory (VRAM) and a parallel queue implementation. Applications running on the GPU can utilize GPU4FS after modifying their code, eliminating the need for a CPU-side file system when accessing the file system. The experiments are done on Optane memory.

It would be interesting to know if this approach could optimize the performance of training and inference for large models.
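To make the "parallel queue in shared memory" idea above a bit more concrete, here is a minimal Python sketch of a fixed-size request ring: several clients (standing in for GPU threads) submit requests into slots, and a single service loop (standing in for the file system) drains them. The names `Request`, `RequestRing`, `submit`, and `drain`, the opcodes, and the slot layout are all hypothetical. This is not the paper's queue design, and real GPU code would rely on atomics over VRAM rather than a Python lock.

```python
import threading
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    op: str            # hypothetical opcode, e.g. "read" or "write"
    path: str
    payload: bytes = b""


class RequestRing:
    """Fixed-size ring buffer: many producers, one consumer."""

    def __init__(self, capacity: int) -> None:
        self._slots: List[Optional[Request]] = [None] * capacity
        self._capacity = capacity
        self._head = 0  # index of the next request to consume
        self._tail = 0  # index of the next free slot
        self._lock = threading.Lock()  # stands in for GPU atomics

    def submit(self, req: Request) -> bool:
        """Enqueue a request; return False if the ring is full."""
        with self._lock:
            if self._tail - self._head == self._capacity:
                return False
            self._slots[self._tail % self._capacity] = req
            self._tail += 1
            return True

    def drain(self) -> List[Request]:
        """Dequeue every pending request in submission order."""
        out: List[Request] = []
        with self._lock:
            while self._head < self._tail:
                idx = self._head % self._capacity
                req = self._slots[idx]
                assert req is not None
                out.append(req)
                self._slots[idx] = None
                self._head += 1
        return out


def demo() -> List[str]:
    """Four concurrent submitters, then one drain by the 'file system'."""
    ring = RequestRing(capacity=8)
    workers = [
        threading.Thread(target=ring.submit, args=(Request("read", f"/f{i}"),))
        for i in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return sorted(r.path for r in ring.drain())
```

The bounded ring means a full queue pushes back on submitters (`submit` returns `False`) instead of growing without limit, which is the usual choice when the buffer lives in a fixed memory region.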

molticrystal, about 1 year ago

While it is not a 1:1 comparison, there has been a driver for Windows that allows the creation of a RAM drive from VRAM for NVIDIA cards.

> GpuRamDrive
> Create a virtual drive backed by GPU RAM.

https://github.com/prsyahmi/GpuRamDrive

Fork with AMD support:
https://github.com/brzz/GpuRamDrive/

Fork that has fixes and support for other cards and additional features:
https://github.com/Ado77/GpuRamDrive

afr0ck, about 1 year ago

I didn't fully read the paper, but a few questions come to mind.

1) How does this work differ from Mark Silberstein's GPUfs from 2014 [1]?

2) Does this work assume the storage device is only accessed by the GPU? Otherwise, how do you guarantee consistency when multiple processes can map, read and write the same files? You mention POSIX. POSIX has MAP_SHARED. How is this situation handled?

3) Related to (2), on the device level, how do you sync CPU (on an SMP, multiple cores) and GPU accesses?

[1] https://dl.acm.org/doi/10.1145/2553081
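For readers unfamiliar with the MAP_SHARED behaviour that question (2) refers to, here is a small Unix-only Python sketch: two independent shared mappings of the same file, where a write through one is visible through the other. It only illustrates the CPU-side POSIX semantics a file system would have to preserve; the function name `shared_mapping_demo` is made up for this example, and it says nothing about how GPU4FS handles the case.

```python
import mmap
import os
import tempfile


def shared_mapping_demo() -> bytes:
    """Write through one MAP_SHARED mapping, read through another."""
    fd, path = tempfile.mkstemp()
    try:
        # Give the file one page of backing store before mapping it.
        os.ftruncate(fd, mmap.PAGESIZE)
        writer = mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED)
        reader = mmap.mmap(fd, mmap.PAGESIZE, flags=mmap.MAP_SHARED)
        try:
            writer[0:5] = b"hello"
            # Both mappings are backed by the same pages of the file,
            # so the reader observes the store made via the writer.
            return bytes(reader[0:5])
        finally:
            writer.close()
            reader.close()
    finally:
        os.close(fd)
        os.unlink(path)
```

With `mmap.MAP_PRIVATE` instead, the write would stay in a copy-on-write page private to the writer mapping, which is why the shared case is the one that raises the consistency question.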

yeison, about 1 year ago

How to get hired by NVIDIA! If it does work it's a brilliant idea.

KingOfCoders, about 1 year ago

Like Microsoft DirectStorage?

_kdave, about 1 year ago

I'm glad that research papers don't start with "we've analyzed Linux kernel 2.6.18 sources (because this is what we had on our lab machines) and determined that ext3 is the best filesystem for our research purpose and now present you with a novel idea of using high-tech device on that". The paper acknowledges modern features and borrows design ideas from other filesystems (it mentions BTRFS and tree structures). Overall the idea is interesting and promising.

ec109685, about 1 year ago

Interesting they would discuss system call overhead of opening a file, reading from it and closing it. Seems like in almost all cases the open and close calls would be overwhelmed by the other operations.
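One rough way to check that intuition on a given machine is to time the three calls separately. The sketch below is a hypothetical micro-benchmark (the function name, file size, and round count are arbitrary choices, not taken from the paper); it reports average seconds per phase and makes no claim about what the numbers will be.

```python
import os
import tempfile
import time
from typing import Dict


def time_open_read_close(size: int = 1 << 20, rounds: int = 200) -> Dict[str, float]:
    """Average wall-clock seconds spent in open, read, and close."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * size)
        os.close(fd)
        totals = {"open": 0.0, "read": 0.0, "close": 0.0}
        for _ in range(rounds):
            t0 = time.perf_counter()
            fd = os.open(path, os.O_RDONLY)
            t1 = time.perf_counter()
            # Read the whole file in 64 KiB chunks until EOF.
            while os.read(fd, 1 << 16):
                pass
            t2 = time.perf_counter()
            os.close(fd)
            t3 = time.perf_counter()
            totals["open"] += t1 - t0
            totals["read"] += t2 - t1
            totals["close"] += t3 - t2
        return {phase: total / rounds for phase, total in totals.items()}
    finally:
        os.unlink(path)


if __name__ == "__main__":
    for phase, seconds in time_open_read_close().items():
        print(f"{phase}: {seconds * 1e6:.1f} us")
```

Shrinking `size` toward a few kilobytes is the interesting experiment: the read phase shrinks with it while open and close stay roughly constant, which is the many-small-files case where per-call overhead is most likely to matter.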

hieu229, about 1 year ago

I hope GPU file systems lead to faster databases.

brcmthrowaway, about 1 year ago

Is this implementing a file system using shader code? That's insane.

Are shaders Turing complete? ;)

amelius, about 1 year ago

A GPU seems overkill when the bottleneck is the I/O.

touisteur, about 1 year ago

Now this is all fun, but has anyone managed to make these mechanisms work with Multicast PCIe? I really need GPUDirect and StorageDirect to support this, until PCIe catches up to today's (or Blackwell's) NVLink... around PCIe 12?

Full-scale file system acceleration on GPU [pdf]