A quite bright bachelor's student of mine did something similar for his undergraduate thesis[0]. Conceptually, he represented neural networks as functions from input to output, and composing two networks required that the output type of the first matched the input type of the second, yielding a new network. Now, to enable gradient descent and other kinds of optimisation, the networks were not <i>literally</i> functions, but rather pairs of functions for running the network forwards and backwards. While limited in some ways (e.g. no recurrent networks), it was quite convenient, and close in performance to TensorFlow on a single GPU.<p>[0]: <a href="https://futhark-lang.org/student-projects/duc-bsc-thesis.pdf" rel="nofollow">https://futhark-lang.org/student-projects/duc-bsc-thesis.pdf</a>
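<p>To make the shape of that representation concrete, here is a minimal sketch in Haskell (the thesis itself uses Futhark; the names and the omission of parameters and weight gradients are my own simplifications, not the thesis's actual API). A network is a pair of forward and backward functions, and composition only type-checks when the output type of one network matches the input type of the next:<p>
    -- A network from inputs of type a to outputs of type b: a forward pass
    -- plus a backward pass mapping an output gradient back to an input
    -- gradient (parameters and weight gradients omitted for brevity).
    data Net a b = Net
      { forward  :: a -> b
      , backward :: a -> b -> a  -- original input, output gradient -> input gradient
      }

    -- Composition type-checks only when the output type of the first
    -- network matches the input type of the second.
    compose :: Net a b -> Net b c -> Net a c
    compose (Net f fb) (Net g gb) = Net
      { forward  = g . f
      , backward = \x dz -> fb x (gb (f x) dz)  -- chain rule
      }

    -- Tiny example: f(x) = (2x)^2, so df/dx = 8x.
    double, square :: Net Double Double
    double = Net (* 2) (\_ dz -> 2 * dz)
    square = Net (^ 2) (\x dz -> 2 * x * dz)

    main :: IO ()
    main = do
      let net = compose double square
      print (forward net 3)      -- 36.0
      print (backward net 3 1)   -- 24.0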