Learning to learn by gradient descent by gradient descent

161 points by tonybeltramelli almost 9 years ago

15 comments

LoSboccacc almost 9 years ago
Story time: I tried something like this, using one neural net's output as the transfer weights of another, feeding the second net's input into the first and training the first on the second net's error. I couldn't train the first network, though, because I didn't know how to derive the transfer function for the backpropagation algorithm.

So I opted for training the first net with a randomized genetic algorithm plus function descent on top of it, which in hindsight is dangerously close to how biology works, but it was exceptionally slow.

So I split up the training batches, went to the uni computer room, and left the job running on every computer overnight to collect results by morning. In the morning I'd collect the best genes from each machine, mix them all for another few rounds of training, select the best in the population, and reseed them on all the machines at night.

After a week of painstakingly organizing, seeding, and collecting results, the network never managed to converge on the problem, but boy it was fun trying! The problem was driving a car around a lap of a track using five "distance from kerb" sensors as input, angled 30 degrees apart starting from the center.

I remember I was inspired by an image recognition company that was using a network to train a network for motion detection on security cameras, so this approach wasn't exactly a novelty even back then (around 2001).

Anyway, this got me noticed by a lab assistant and led to a thesis on optimizing neural networks to run in 4.4-bit fixed-point math for extra-low-power devices. That one worked! Too bad nothing ever came of it.

edit: some fixin'
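A minimal sketch of the genetic-algorithm flavour of training described above: keep a population of flat weight vectors for a small controller net, score each one, and breed the best with crossover and mutation. Everything here is a hypothetical stand-in, not code from the comment; in particular the `fitness` function fakes a "stay centred" objective instead of simulating laps with the five kerb-distance sensors.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_genome(n_in=5, n_hidden=8, n_out=1):
    # Flat vector of weights for a tiny 5-input controller net.
    return rng.normal(0.0, 0.5, size=n_in * n_hidden + n_hidden * n_out)

def steering(genome, sensors, n_in=5, n_hidden=8):
    # Unpack the genome into two weight matrices and run a forward pass.
    w1 = genome[: n_in * n_hidden].reshape(n_in, n_hidden)
    w2 = genome[n_in * n_hidden:].reshape(n_hidden, 1)
    return np.tanh(np.tanh(sensors @ w1) @ w2)

def fitness(genome):
    # Stand-in fitness: how well the controller centres the car on random
    # sensor readings (a real setup would simulate laps around the track).
    sensors = rng.uniform(0.0, 1.0, size=(64, 5))
    target = sensors[:, [2]] - 0.5          # hypothetical "stay centred" signal
    return -np.mean((steering(genome, sensors) - target) ** 2)

def evolve(pop_size=50, generations=100, elite=5, sigma=0.1):
    pop = [init_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:elite]
        # Crossover: mix genes from two elite parents, then mutate.
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.choice(elite, 2, replace=False)
            mask = rng.random(parents[0].size) < 0.5
            child = np.where(mask, parents[a], parents[b])
            children.append(child + rng.normal(0.0, sigma, child.size))
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print("best fitness:", fitness(best))
```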
tansey almost 9 years ago
Just read (skimmed) this paper yesterday, actually.

Looks interesting, but there are no timing graphs! It's kind of a strawman argument to say "We can't use Newton's method because it's too slow to calculate the Hessian," and then go and present all your performance graphs in terms of number of iterations.
drmeister almost 9 years ago
Neither the paper nor the comments have mentioned this: https://en.wikipedia.org/wiki/Truncated_Newton_method

The truncated Newton method uses an inner solver that runs for only a few iterations to approximate the Hessian. The approximate Hessian is used to approximately solve Newton's equation. I've implemented it and it works very well. When it gets close to the solution, convergence is very fast.

I mention it because it sounds similar to what the paper discusses, but you use conjugate gradients in the inner solver and Newton's equation in the outer solver.
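A minimal sketch of the truncated Newton (Newton-CG) idea described above, assuming a toy 2-D Rosenbrock test problem: the inner loop runs a few conjugate-gradient iterations to approximately solve the Newton system H p = -g using only Hessian-vector products, and the outer loop takes the resulting step. The test function, finite-difference Hessian-vector product, and backtracking rule are illustrative choices, not taken from the comment or the paper.

```python
import numpy as np

def rosenbrock(x):
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2

def rosenbrock_grad(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
        200 * (x[1] - x[0] ** 2),
    ])

def hess_vec(x, v, eps=1e-6):
    # Hessian-vector product via finite differences of the gradient,
    # so the full Hessian is never formed.
    return (rosenbrock_grad(x + eps * v) - rosenbrock_grad(x - eps * v)) / (2 * eps)

def truncated_newton_step(x, max_cg_iters=10, tol=1e-10):
    # Inner solver: a few conjugate-gradient iterations on H p = -g.
    g = rosenbrock_grad(x)
    p = np.zeros_like(x)
    r = -g.copy()                      # residual of H p = -g at p = 0
    d = r.copy()
    for _ in range(max_cg_iters):
        Hd = hess_vec(x, d)
        dHd = d @ Hd
        if dHd <= 0:                   # negative curvature: stop early
            break
        alpha = (r @ r) / dHd
        p, r_new = p + alpha * d, r - alpha * Hd
        if np.linalg.norm(r_new) < tol:
            break
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return p if np.any(p) else -g      # fall back to steepest descent

# Outer Newton loop with a simple backtracking step size.
x = np.array([-1.2, 1.0])
for _ in range(50):
    p = truncated_newton_step(x)
    t = 1.0
    while rosenbrock(x + t * p) > rosenbrock(x) and t > 1e-8:
        t *= 0.5
    x = x + t * p
print("solution:", x)                  # converges to [1, 1]
```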
swehner almost 9 years ago
The title could have been chosen as "Learn to learn by gradient descent by gradient descent" or "Learning learning by gradient descent by gradient descent".
BucketSort almost 9 years ago
Can all algorithms then be cast as learning problems and their optimal versions produced this way? Seems like amazing work, but I don't know enough to confirm.
Loic almost 9 years ago
Very interesting. I am eager to read future research on problems beyond "simple convex problems". That is where it could provide real benefits: in industry we have a lot of such problems, plus a lot of domain knowledge for "going around" local minima, and a robust ML-based approach could really help there instead of forcing us to accumulate years of trial and error in our algorithms.
josephdviviano almost 9 years ago
Yo dawg, I heard you like gradient descent, so I put optimizers on your optimizer so you can learn while you learn.
oiuytrewq almost 9 years ago
No timing results, no comparisons with Nesterov-type methods. To all the commenters who have said "this looks promising": this doesn't look promising at all. Why do you think that in all the years of people optimizing things with gradient descent, no one has tried this? Answer: they have, and it doesn't work.
merraksh almost 9 years ago
"In spite of this, optimization algorithms are still designed by hand."

Well, they are tuned automatically. There are derivative-free optimization algorithms that have been designed to tune optimization algorithms on a set of instances.
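A minimal sketch of what that kind of automatic tuning can look like, using a plain random search rather than any specific tuner from the literature: the learning rate and momentum of a gradient-descent optimizer are picked by whichever configuration gives the lowest final loss averaged over a small set of problem instances. The quadratic instance set and the search ranges are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_instance(dim=10):
    # A random strongly convex quadratic f(x) = 0.5 * x' A x.
    m = rng.normal(size=(dim, dim))
    return m @ m.T + np.eye(dim)

instances = [make_instance() for _ in range(5)]

def run_gd(A, lr, momentum, steps=100):
    # Plain gradient descent with momentum on f(x) = 0.5 * x' A x.
    x = np.ones(A.shape[0])
    v = np.zeros_like(x)
    for _ in range(steps):
        g = A @ x
        v = momentum * v - lr * g
        x = x + v
    return 0.5 * x @ A @ x              # final objective value

def score(lr, momentum):
    # Average final loss over the instance set (lower is better).
    return np.mean([run_gd(A, lr, momentum) for A in instances])

# Derivative-free tuning: random search over the optimizer's hyperparameters.
best = None
for _ in range(200):
    lr = 10 ** rng.uniform(-4, -1)      # log-uniform learning rate
    momentum = rng.uniform(0.0, 0.99)
    s = score(lr, momentum)
    if best is None or s < best[0]:
        best = (s, lr, momentum)

print("best avg loss %.3e with lr=%.4f momentum=%.2f" % best)
```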
nathan_f77 almost 9 years ago
I was just thinking about this the other day! If machine learning can be applied to almost any problem, then surely it could be applied recursively to optimize itself. I'm glad to see that someone worked on this.
latenightcoding almost 9 years ago
The first time I read about gradient descent and optimization algorithms, this was the first thing that came to my mind. This looks promising.
ifdefdebug almost 9 years ago
I actually hope that all these learning techniques do have limitations and that intelligence cannot, in principle, be achieved by machine learning. OK, I see the problems with that statement (first of all, define intelligence), but for instance, I hope the so-called "singularity" cannot be reached in principle, and if somebody could prove that once and for all, please do so.
penetrarthur almost 9 years ago
When I had just finished Andrew Ng's ML course and had to solve a real-life problem, this was actually the first thing that came to mind. Too bad I couldn't formulate the problem then (still can't), and that was basically the end of my ML career.
LukaAl almost 9 years ago
Next article: "Learning to learn to learn by gradient descent by gradient descent by gradient descent".

Then "Learning to learn to learn to learn by gradient descent by gradient descent by gradient descent by gradient descent", and so on. Turtles all the way down!

P.S.: I understand the beauty of this article, but I was surprised no one got the irony :-)
xianshou almost 9 years ago
Yo dawg, I heard you liked gradients, so I put some learning in your learning so you can descend while you descend.

http://knowyourmeme.com/memes/xzibit-yo-dawg