A long time ago I had a huge interest in genetic algorithms and other soft optimization techniques. It is what inspired me to take up graduate studies in the first place. But very quickly I got thoroughly disillusioned by the community around it.<p>What I am going to say is going to be very unpopular with the audience of this post. What turned me off and left a bad taste is the tendency of the community to push it as magic, much like a shady snake oil salesman would.<p>Rather than building the theory on analysis, classification, and measurable properties and definitions of the objective functions for which GA with specific mutation and crossover operators is a good solution, the way of the community is to push it as snake oil: often taking examples of cooked-up functions that aren't really important to globally optimize, and totally ignoring speed of optimization relative to competing tools.<p>The method of choice of the community seemed to be to impress the audience not with how fast and effective the methods are compared to traditional ones, but with colorful analogies. Sure, a bead of perspiration rolling down an armpit optimizes a function, but that does not mean we should optimize every function on a computer that way.<p>This is really unfortunate, because if one goes back to GA's genesis, <i>classifying and characterizing functions that yield themselves to GA was a big part of David Goldberg's book</i>. There are indeed functions that are very amenable to GA and other such methods. Characterize them mathematically, rather than, as it seems to me, cooking up two ad hoc operations, naming them 'crossover' and 'mutation', and looking cool.<p>That is hardly progress.<p><i>On the other hand, if you prove a theorem stating that for these types of problems, if you let GA run for this amount of time, you will be close to the optimum with high probability.
That would be phenomenally useful.</i><p><i>At its core, GA is a parallelized and randomized exploration of the search space. Tell me why that particular way of randomizing and parallelizing is the right thing to do, or for what classes of problems it is the right thing to do. Without this, they are anecdotes.</i><p>I will give you a specific example of what I think is a better way. Consider the clustering algorithm K-means. Obtaining its global minimum is an NP-hard problem, and it has been a target for GA, ant colony, swarm optimization, and what have you. All of them are blown out of the water by simply better initialization of the standard K-means algorithm. For example, just initialize it with a set of furthest-first points and it will give good results. Not only that, it will provably be very close to the global optimum; but people did not know that in the beginning. Some people had to say to themselves, "I am going to find out more about the structure of the problem so that I can find efficient methods." That is what makes progress.<p>Take another example, non-negative matrix factorization: another objective function that is NP-hard to optimize. I will make a different point with this example (although the same point could be made with the K-means example). The cases where the simple local algorithm for NMF does not converge to near the global optimum are provably uninteresting to begin with. Even if the global optimum were found there, it would serve no purpose.<p>Another example: global optimization of low-rank matrix factorization models for recommender systems. Same problem again: the optimal solution is NP-hard to obtain. Again, simple methods do provably well if you exploit the structure of the problem.<p>Often the difficult function one is trying to optimize can be circumvented entirely by modeling the phenomenon with a better-behaved function; model the problem differently so that better techniques apply.
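To make the furthest-first idea for K-means concrete, here is a minimal NumPy sketch (the function name and signature are my own; this is an illustration of the initialization step, not a reference implementation):

```python
import numpy as np

def furthest_first_init(X, k, rng=None):
    """Pick k initial centers greedily: each new center is the point
    furthest from all centers chosen so far (furthest-first traversal)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]  # start from an arbitrary point
    for _ in range(k - 1):
        # distance of every point to its nearest already-chosen center
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])  # take the point furthest away
    return np.array(centers)

# Hand these centers to the standard Lloyd iterations afterwards;
# with well-separated clusters, each center lands in a different cluster.
```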
This is not always possible, but I see people not even trying; apparently it is easier to publish a few papers when you spray the work with colorful analogies. An example: protein folding. It is a misguided effort to find the globally optimal shape; nature does not fold proteins optimally.<p>Another example: consider classifiers. Misclassification as a function is the worst nightmare: non-differentiable, non-convex, with huge swaths of zero gradient. So what do people in machine learning do? They bound it from above by a well-behaved function and optimize <i>that</i> function. They don't stop there. They first validate empirically that optimizing the surrogate gives good performance, and then prove theoretically that this comes with guarantees on performance.<p>I am increasingly convinced that analyzing objective functions that are actually important to optimize, studying their properties, and motivating performant optimization schemes is a far better contribution than coming up with colorful analogies of genetics, evolution, swarms, etc. without characterizing precisely what makes them work and when. Demystify the problem and the technique rather than mystify them, and please compare running times and resources with more traditional optimization.<p>Again, I am not saying that GA or swarm optimization methods are wrong to use. There are indeed cases where they are exactly the correct thing to use. Consider gene/DNA assembly; here GA is indeed the right thing to use. GA works well when the optimal solution looks like an assembly of small good solutions. Swarm optimization works well when the objective looks like a smooth, well-behaved function overlaid with high-frequency fluctuations. I wish this is what the community focused on.
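As a concrete instance of the surrogate idea for classifiers mentioned above: the hinge loss used by SVMs upper-bounds the 0-1 misclassification loss pointwise, which is what makes optimizing it a principled stand-in. A minimal sketch (function names are mine, written in terms of margins y*f(x)):

```python
import numpy as np

def zero_one_loss(margins):
    # the "nightmare" objective: non-differentiable, flat almost everywhere
    return float(np.mean(margins <= 0))

def hinge_loss(margins):
    # convex surrogate; pointwise, max(0, 1 - y*f(x)) >= 1{y*f(x) <= 0},
    # so the mean hinge loss upper-bounds the mean 0-1 loss
    return float(np.mean(np.maximum(0.0, 1.0 - margins)))

margins = np.array([2.0, 0.5, -0.3, 1.2])
assert hinge_loss(margins) >= zero_one_loss(margins)  # 0.45 >= 0.25
```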