While this is a good concept in theory, I'd be skeptical about building on top of such a system. The primary reason is the slowness of R. I have built heavy-duty data mining systems using a stack of kdb/q and R. In my experience, using R for simple clustering algorithms like k-means and k-medoids slowed my system down by a factor of nearly 70. This was despite running parallelized versions of these algorithms (via the SPRINT R package) under mpiexec.<p>IMO, there is a very big gap in this space. There is an urgent need for high-performance data inference languages. MATLAB is decent, but still too clunky for my taste. Plus, I prefer the simplicity of a file-mapped, column-oriented database like the one offered by kdb. As kdb is too expensive for me right now, I'm considering building on top of the excellent J language/JDB database stack for my big data needs.
For what it's worth, PostgreSQL has had this since 2003. <a href="http://www.joeconway.com/plr/" rel="nofollow">http://www.joeconway.com/plr/</a><p>IMHO scripts running in a database server never work all that well - debugging is a nightmare. At least that has been my experience from trying PG plpython a few years ago.<p>Link to the original announcement email:
<a href="http://www.postgresql.org/message-id/3E514A46.2040604@joeconway.com" rel="nofollow">http://www.postgresql.org/message-id/3E514A46.2040604@joecon...</a>
I'm always wary of ideas like this, where code executes on/within my database server.<p>I know it's apparently sandboxed, but that didn't work out too well for Elasticsearch recently: <a href="https://jordan-wright.github.io/blog/2015/03/08/elasticsearch-rce-vulnerability-cve-2015-1427/" rel="nofollow">https://jordan-wright.github.io/blog/2015/03/08/elasticsearc...</a>.