story time: I tried something like this, using one neural net's output as the weights of another neural net, and training the first net by feeding it the second net's input and training it on the second net's error, but I couldn't train the first network because I didn't know how to derive the transfer function for the backpropagation algorithm.

so I opted for training the first net with a randomized genetic algorithm and descent on the error function, which in hindsight is dangerously close to how biology kind of works, but it was exceptionally slow.

so I split the training batches up, went to the uni computer room, and left the job running on every computer overnight. in the morning I'd collect the best genes from each machine, mix them all together for another few rounds of training, select the best of the population and reseed them on all the machines by night.

after a week of painstakingly organizing, seeding and collecting results, the network never managed to converge on the problem, but boy was it fun trying! The problem was driving a car around a lap of a track, using five "distance from kerb" sensors as input, angled 30° apart starting from the center.

I remember I was inspired by an image recognition company that was using one network to train another for motion detection on security cameras, so this approach wasn't exactly novel even back then (2001-ish).

anyway, this got me noticed by a lab assistant and led to a thesis on how to optimize neural networks to run in 4.4-bit fixed-point math for use in extra-low-power devices. that one worked! too bad nothing ever came out of it.

edit: some fixin
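For anyone curious what "evolving the weights instead of backpropagating" looks like in practice, here is a minimal sketch in the spirit of the 5-sensor steering task above, assuming NumPy. The network sizes, GA parameters, and the stand-in fitness function are all illustrative guesses, not the original setup; a real run would score each genome by simulating laps around the track.

```python
# Hypothetical sketch: evolve the weights of a tiny feedforward net
# (5 kerb-distance sensors in, 1 steering command out) with a simple
# genetic algorithm instead of backpropagation.
import numpy as np

rng = np.random.default_rng(0)

N_IN, N_HID, N_OUT = 5, 8, 1
N_WEIGHTS = N_IN * N_HID + N_HID * N_OUT

def decode(genome):
    """Unpack a flat genome into the net's two weight matrices."""
    w1 = genome[:N_IN * N_HID].reshape(N_IN, N_HID)
    w2 = genome[N_IN * N_HID:].reshape(N_HID, N_OUT)
    return w1, w2

def drive(genome, sensors):
    """Forward pass: sensor readings in, steering command out."""
    w1, w2 = decode(genome)
    hidden = np.tanh(sensors @ w1)
    return np.tanh(hidden @ w2)

def fitness(genome):
    # Placeholder fitness: reward steering that roughly tracks a made-up
    # target on random sensor readings. The real fitness would be distance
    # travelled around the track before hitting the kerb.
    sensors = rng.uniform(0.0, 1.0, size=(64, N_IN))
    target = sensors[:, :1] - sensors[:, -1:]   # steer away from the nearer kerb
    error = np.mean((drive(genome, sensors) - target) ** 2)
    return -error

POP, GENERATIONS, MUT_STD = 50, 200, 0.1
population = rng.normal(0.0, 1.0, size=(POP, N_WEIGHTS))

for gen in range(GENERATIONS):
    scores = np.array([fitness(g) for g in population])
    elite = population[np.argsort(scores)[-POP // 5:]]        # keep the best 20%
    children = []
    while len(children) < POP - len(elite):
        a, b = elite[rng.integers(len(elite), size=2)]
        cut = rng.integers(1, N_WEIGHTS)                       # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        child += rng.normal(0.0, MUT_STD, size=N_WEIGHTS)      # mutation
        children.append(child)
    population = np.vstack([elite, children])

scores = np.array([fitness(g) for g in population])
print("best fitness:", scores.max())
```

The split-up overnight runs in the story amount to running this loop independently on each machine, then pooling each machine's elite genomes into the next night's starting population.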