Going to take this opportunity to plug my related project [Likely](www.liblikely.org), a DSL for lowering machine learning inference algorithms.

One of the projects we've built on top of it is a static compiler for Caffe model files. This lets you execute Caffe models _without_ a runtime dependency on the Caffe libraries, so you can target OSes and CPUs that mainline Caffe doesn't support. If you have commercial interest in this capability, please reach out to me.
Forgive my ignorance, but it seems like this is just attempting to take advantage of the optimizations done by LLVM, yes?

What I would love is a simple way of writing standalone functions that compile into a cross-platform LLVM file that I can call from a variety of other languages on a variety of other systems. In particular, if I train a recurrent network on text data for a chat bot, I want to be able to use that LLVM file + model in a game I release for the PC and for Android without worrying about the NDK/gcc/clang/Windows/OSX build nightmare. The ability to easily and quickly define a model in TensorFlow and write a Python function that takes an array of data and spits out an array of data would be incredible, and would mean that all the work I'm doing on a native Rust library is unneeded.

Admittedly, with Bazel I could create a C++ wrapper for the function which loads the library. It's just... that produces a 150 MB shared library with all the dependencies, and it's also a pain in the ass.
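To make concrete the interface I'm imagining: the model gets compiled ahead of time into one small shared library exposing a plain C ABI, which Python (or Rust, or C++) can bind over FFI. The library name and symbol below are made up for illustration, not a real TensorFlow API:

```python
# Hypothetical sketch: "libchatbot.so" and "chatbot_infer" are invented
# names for an ahead-of-time-compiled model with a C ABI.
import ctypes
import numpy as np

lib = ctypes.CDLL("./libchatbot.so")
lib.chatbot_infer.argtypes = [
    ctypes.POINTER(ctypes.c_float), ctypes.c_size_t,  # input buffer + length
    ctypes.POINTER(ctypes.c_float), ctypes.c_size_t,  # output buffer + length
]
lib.chatbot_infer.restype = ctypes.c_int

def infer(x: np.ndarray, out_size: int = 256) -> np.ndarray:
    """Array in, array out, with no TensorFlow runtime required."""
    x = np.ascontiguousarray(x, dtype=np.float32)
    out = np.empty(out_size, dtype=np.float32)  # assumed fixed output size
    rc = lib.chatbot_infer(
        x.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), x.size,
        out.ctypes.data_as(ctypes.POINTER(ctypes.c_float)), out.size,
    )
    if rc != 0:
        raise RuntimeError("inference failed with code %d" % rc)
    return out
```

A Rust or C++ caller would bind the same `chatbot_infer` symbol through its own FFI, which is the whole point: one small artifact instead of a 150 MB dependency tree.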
Cool idea, but is this of any benefit?

Isn't this essentially what TensorFlow does internally, except that it inserts CUDA primitives at the right positions...
Maybe P2P TensorFlow as a service would be a neat idea?

E.g. I have data and a TensorFlow model, and anyone with spare compute power can bid to run it quickly and cheaply: EC2 spot instances, Google Cloud preemptible VMs, or someone's spare NVIDIA CUDA machine.
I'm not sure that LLVM is the correct way to go about this. Don't get me wrong, it can be used, but most of the work in these frameworks happens on very large tensors/multi-dimensional arrays. Optimization of the computation graph over such arrays, although very similar to standard compiler optimization, also has some significant differences. I do believe, however, that all frameworks should start using the same graph IR representation and optimization procedure, potentially with different back ends based on hardware and different front ends based on language. I in fact tried to achieve this some time ago; it's still in progress, but lately I haven't had time to work on it. Still, the post is really great.
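To sketch what I mean by a shared graph IR (a toy illustration, not any real framework's representation): front ends emit whole-tensor ops into a common graph, tensor-aware passes rewrite it, and hardware back ends lower the result. Fusion is the classic example of a rewrite that a scalar-oriented IR like LLVM's doesn't express directly:

```python
# Toy graph IR, just enough to show a tensor-level rewrite pass.
from dataclasses import dataclass, field

@dataclass
class Node:
    op: str                              # e.g. "matmul", "add", "relu"
    inputs: list = field(default_factory=list)

def fuse_matmul_add(graph):
    """Rewrite matmul followed by add into one fused op.

    Works on whole-tensor ops; a back end can then map the fused node
    to a single GEMM-with-bias kernel instead of two passes over memory."""
    for node in graph:
        if node.op == "add" and any(i.op == "matmul" for i in node.inputs):
            mm = next(i for i in node.inputs if i.op == "matmul")
            bias = next(i for i in node.inputs if i is not mm)
            node.op = "fused_matmul_add"
            node.inputs = mm.inputs + [bias]
    return graph

# Example: y = relu(W @ x + b)
x, w, b = Node("input"), Node("weight"), Node("weight")
mm = Node("matmul", [w, x])
add = Node("add", [mm, b])
y = Node("relu", [add])
fuse_matmul_add([x, w, b, mm, add, y])
assert add.op == "fused_matmul_add" and add.inputs == [w, x, b]
```

Each front end would only need to emit this graph, and each back end would only need to lower a fixed op set to its hardware.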