The core concept, that you ship computation to the data rather than the other way around, is what made Google so impressive when it launched. Lots of algorithms do well in that model. Back when I was at NetApp I designed a system where the "smart storage" labeled blocks with an MD5 hash as you stored them. That let you rapidly determine whether you already had a block stored and could safely toss the one being written[1]. Really fast de-duplication and good storage compression; there's a toy sketch of the idea at the end of this comment.

At Blekko they took this concept to the next logical step and built a storage array out of triply replicated blocks (called 'buckets') that were distributed by their hash ID. You could then write templated Perl code that operated in parallel over hundreds (or thousands) of buckets and yielded a composite result (second sketch below). It always surprised me that IBM didn't care about that system when they acquired Blekko; it was pretty cool. Implemented on these Samsung drives it would make for a killer data science appliance. That design almost writes itself.

Also in the storage space, there was the CMU "Active Disk" architecture[2], which was supposed to replace RAID. There was a startup spun off from that work, but I cannot recall its name anymore, sigh.

These days it would be useful to design a simulator for systems like this and derive a calculus for analyzing their performance relative to other architectures. There's probably a master's thesis, and maybe a PhD or two, in that work.

[1] Yes, MD5 collisions are a thing, but the odds of an accidental collision between two distinct 8K blocks are vanishingly small, and yes, NetApp got a patent issued for it.

[2] https://www.pdl.cmu.edu/PDL-FTP/Active/ActiveDisksBerkeley98.pdf
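A minimal sketch of the block-labeling idea, assuming a dict-backed toy store (Python rather than anything NetApp actually shipped; all names here are invented):

    import hashlib

    class DedupStore:
        """Toy content-addressed block store: each block is kept once,
        keyed by the hash of its contents."""

        BLOCK_SIZE = 8192  # fixed 8K blocks, as in [1]

        def __init__(self):
            self.blocks = {}    # digest -> block bytes
            self.refcount = {}  # digest -> number of logical writes

        def write(self, block: bytes) -> str:
            assert len(block) == self.BLOCK_SIZE
            digest = hashlib.md5(block).hexdigest()
            if digest in self.blocks:
                # Already stored: toss the incoming copy, bump a refcount.
                self.refcount[digest] += 1
            else:
                self.blocks[digest] = block
                self.refcount[digest] = 1
            return digest  # the "address" the caller records

        def read(self, digest: str) -> bytes:
            return self.blocks[digest]

    store = DedupStore()
    a = store.write(b"\x00" * DedupStore.BLOCK_SIZE)
    b = store.write(b"\x00" * DedupStore.BLOCK_SIZE)  # duplicate, not stored twice
    assert a == b and len(store.blocks) == 1

The sketch trusts the hash on a hit, per [1]; a more paranoid design would byte-compare the two blocks before tossing the write.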
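And a toy of the bucket model: hash-partition records into buckets, run the same function over every bucket in parallel, and fold the partials into a composite result (Python standing in for the templated Perl; everything here is invented for illustration):

    import hashlib
    from collections import defaultdict
    from concurrent.futures import ProcessPoolExecutor

    NUM_BUCKETS = 8  # toy scale; the real array ran hundreds or thousands

    def bucket_for(key: str) -> int:
        # Records land in a bucket chosen by a hash of their key, so
        # data (and its replicas) spread evenly across the array.
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_BUCKETS

    def build_buckets(records):
        buckets = defaultdict(list)
        for key, value in records:
            buckets[bucket_for(key)].append((key, value))
        return buckets

    def map_bucket(bucket):
        # Per-bucket code: runs where the bucket lives and returns a
        # small partial result instead of shipping the raw data back.
        return sum(value for _key, value in bucket)

    if __name__ == "__main__":
        records = [(f"doc{i}", i) for i in range(1000)]
        buckets = build_buckets(records)
        with ProcessPoolExecutor() as pool:
            partials = pool.map(map_bucket, buckets.values())
        print(sum(partials))  # composite result: 499500

The toy skips the interesting part, placement: with each bucket triply replicated, the scheduler can presumably run the per-bucket code against whichever replica's host is least busy.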