The core concept here, that you ship computation to the data rather than the other way around, is what made Google so impressive when it launched. Lots of algorithms do well in that model. Back when I was at NetApp I designed a system where the "smart storage" essentially labeled blocks with an MD5 hash when you went to store them. That let you rapidly determine whether you already had the block stored and could safely toss the one being written[1]. Really fast de-duplication and good storage compression.

At Blekko they took this concept to the next logical step and built a storage array out of triply replicated blocks (called 'buckets') that were distributed by their hash id. You could then write templated Perl code that operated in parallel over hundreds (or thousands) of buckets and produced a composite result. It always surprised me that IBM didn't care about that system when they acquired Blekko; it was pretty cool. Implemented on these Samsung drives, it would make for a killer data science appliance. That design almost writes itself.

Also in the storage space there was the CMU "Active Disk" architecture[2], which was supposed to replace RAID. There was a startup spin-off from this work, but I cannot recall its name anymore, sigh.

These days it would be useful to design a simulator for systems like this and derive a calculus for analyzing their performance relative to other architectures. Probably a master's thesis, and maybe a PhD or two, in that work.

[1] Yes, MD5 hash collisions are a thing, but not for identical-length documents (aka an 8K block), and yes, NetApp got a patent issued for it.

[2] https://www.pdl.cmu.edu/PDL-FTP/Active/ActiveDisksBerkeley98.pdf
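The hash-keyed dedup idea is simple enough to sketch. Here's a minimal in-memory model of it in Python (names are mine, not NetApp's; the real thing lives in the controller's write path):

```python
import hashlib

BLOCK_SIZE = 8192  # 8K blocks, as in the NetApp example

class DedupStore:
    """Toy content-addressed block store: blocks are keyed by their MD5
    digest, so a duplicate write just bumps a refcount instead of storing
    a second copy."""

    def __init__(self):
        self.blocks = {}    # digest -> block bytes
        self.refcount = {}  # digest -> number of logical references

    def write(self, block: bytes) -> str:
        assert len(block) == BLOCK_SIZE
        digest = hashlib.md5(block).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = block  # first copy: actually store it
        self.refcount[digest] = self.refcount.get(digest, 0) + 1
        return digest  # caller keeps the digest as the block's address

    def read(self, digest: str) -> bytes:
        return self.blocks[digest]
```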
I think putting something like SQLite on the actual storage device could be a super-efficient way to directly express your intent to the durable storage system and bypass mountains of virtual bullshit.

The optimization opportunities are pretty obvious to me. Imagine if SQLite journaling were aware of how long the supercapacitor in the SSD would last, potentially even with real-time monitoring of device variables. You could keep your entire WAL in DRAM on the drive as long as the drive has enough stored energy to flush it to NAND on external power loss.
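To make that concrete, here's a hypothetical sketch of the policy in Python. Every telemetry field here is an assumption; no shipping NVMe interface exposes supercapacitor state this way:

```python
from dataclasses import dataclass

@dataclass
class DriveTelemetry:
    stored_energy_joules: float  # energy left in the supercap (assumed field)
    flush_power_watts: float     # power drawn while dumping DRAM to NAND (assumed)
    nand_write_bw: float         # bytes/sec sustainable during that dump (assumed)

def wal_safe_in_dram(t: DriveTelemetry, wal_bytes: int) -> bool:
    # Energy budget -> seconds of guaranteed flush time -> bytes flushable
    # on power loss. The WAL can stay in device DRAM only while it fits
    # inside that budget; past it, the host must force a NAND write.
    flush_seconds = t.stored_energy_joules / t.flush_power_watts
    return wal_bytes <= flush_seconds * t.nand_write_bw
```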
I've been strongly interested in computational fabrics for at least 15 years... this looks interesting, but very, very locked down.

It is my understanding that FPGA vendors have fought the open source community every step of the way. I would hate to see the future of computing locked up in a new spiffy prison.
Storage is starting to get extremely exciting again. KV SSDs, this, and Intel's Optane are opening up a lot of new avenues for extremely high-performance storage.
Sorry if I missed it, but I'm not seeing it: what's the bandwidth here? i.e., the time to read, process and write back the whole contents of the disk (using just the FPGA)?
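As a back-of-envelope with assumed numbers (the article doesn't pin these down): a 4 TB drive whose FPGA can stream NAND at roughly 3 GB/s would need about 44 minutes for a full read-process-write pass.

```python
capacity_bytes = 4e12  # assumed: 4 TB drive
stream_bw = 3e9        # assumed: bytes/sec the FPGA can stream to/from NAND
full_pass_seconds = 2 * capacity_bytes / stream_bw  # read it all, write it all
print(f"{full_pass_seconds:.0f} s = {full_pass_seconds / 60:.0f} min")  # ~2667 s = ~44 min
```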
Can anyone quantify the advantages this yields in terms of latency and bandwidth, compared to plugging a regular SSD into an external FPGA (via PCIe or whatever interface)?
I'm still waiting to become skilled enough, or invested enough in a project, to merit dedicated super-fast SSD storage or some kind of exotic storage appliance!
I wonder how hard it would be to port the server-side code of FoundationDB to one of these devices; architecturally FDB seems well suited to this (at least until predicates show up), as it is already extremely constrained in what it expects of the storage nodes: they basically provide just (time-bounded) versioned KV access.
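The contract the storage nodes implement really is that small. A rough sketch of version-bounded KV reads (names are mine, not FDB's actual server interface):

```python
from collections import defaultdict

class VersionedKV:
    """Sketch of an FDB-style storage-node contract: writes carry a
    version, reads return the newest value at or below a read version."""

    def __init__(self):
        # key -> [(version, value), ...], appended in increasing version order
        self.history = defaultdict(list)

    def set(self, key: bytes, value: bytes, version: int) -> None:
        self.history[key].append((version, value))

    def get(self, key: bytes, read_version: int):
        best = None
        for version, value in self.history[key]:
            if version > read_version:
                break  # history is sorted; nothing later can qualify
            best = value
        return best
```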
Looks like this is a Xilinx KU15P: not shabby, but about half the size of the 3-die monstrosities in the AWS FPGA instances you can rent for ~$1.50 an hour. So it's useful for disk stuff closely coupled to the drive, but maybe not as a general compute resource (depending on the actual price, of course).
They really should extend KVS for this: it's going to be very difficult to leverage if the XSS interface sits underneath the filesystem (as shown in the diagram), especially for an RDBMS, where the database is (usually) a single big flat file as far as the filesystem is concerned.
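For illustration, an invented KV-level interface of the kind being asked for here; the command set is hypothetical, not Samsung's actual KV API. The point of pushing the filter down is that only matching values ever cross the host bus:

```python
class KVComputeDrive:
    """Hypothetical KV SSD with on-device filtering, purely a sketch."""

    def __init__(self):
        self._store = {}  # stand-in for the NAND-backed key-value store

    def put(self, key: bytes, value: bytes) -> None:
        self._store[key] = value

    def scan_filter(self, prefix: bytes, predicate) -> dict:
        # Conceptually runs on the drive's FPGA: the host receives only
        # the matching key/value pairs, never the full scan.
        return {k: v for k, v in self._store.items()
                if k.startswith(prefix) and predicate(v)}
```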
Reminds me of Micron's Automata Processor: https://www.cs.virginia.edu/~skadron/Papers/wang_APoverview_CODES16.pdf
So one can look at the chip beside the storage as a CPU offload built inside the drive, instead of a coprocessor on the motherboard. I'm not seeing a huge use case here beyond the narrowest ones, like decryption and compression.
So no mention of SYCL support... Only offering ~C in 2020 is an insult to computer science.

Unrelated: when will Nvidia allow seamlessly offloading Java, or another GC-based language, to the GPU?
https://developer.nvidia.com/blog/grcuda-a-polyglot-language-binding-for-cuda-in-graalvm/
GrCUDA seems promising, but it would only allow interoperability with Java on the CPU, not offloading Java to the GPU, right?
Such advances would make GPU computing orders of magnitude more developer-friendly, and therefore much more mainstream.