A long time ago I had a huge interest in genetic algorithms and other soft optimization techniques. It is what inspired me to take up graduate studies in the first place. But very quickly I got thoroughly disillusioned by the community around it.<p>What I am going to say is going to be very unpopular with the audience of this post. What turned me off and left a bad taste is the tendency of the community to push it as magic, much like a shady snake oil salesman would.<p>Rather than building the theory on analysis, classification, and measurable properties and definitions of the objective functions for which GA with specific mutation and crossover operators is a good solution, the way of the community is to push it as snake oil: often taking examples of cooked-up functions that aren't really important to globally optimize, and totally ignoring speed of optimization relative to competing tools.<p>The method of choice of the community seemed to be to impress the audience not with how fast and effective the methods are compared to traditional ones, but with colorful analogies. Sure, a bead of perspiration rolling down an armpit optimizes a function, but that does not mean we should optimize every function on a computer that way.<p>This is really unfortunate, because if one goes back to GA's genesis, <i>classifying and characterizing functions that yield themselves to GA was a big part of David Goldberg's book</i>. There are indeed functions that are very amenable to GA and other such methods. Characterize them mathematically, rather than, as it seems to me, cooking up two ad hoc operations, naming them 'crossover' and 'mutation', and looking cool.<p>That is hardly progress.<p><i>On the other hand, if you prove a theorem stating that for these types of problems, if you let GA run for this amount of time, you will be close to the optimum with high probability.
That would be phenomenally useful.</i><p><i>At its core, GA is a parallelized and randomized exploration of the search space. Tell me why that particular way of randomizing and parallelizing is the right thing to do, or for what classes of problems it is the right thing to do. Without this, they are anecdotes.</i><p>I will give you a specific example of what I think is a better way. Consider the clustering algorithm K-means. Obtaining its global minimum is an NP-hard problem, and it has been a target for GA, ant colony, swarm optimization, and what have you. All of them are blown out of the water by simply better initialization of the standard K-means algorithm. For example, just initialize it with a set of furthest-first points and it will give good results. Not only that, it will provably be very close to the global optimum; but people did not know that in the beginning. Some people had to say to themselves, "I am going to find out more about the structure of the problem so that I can find efficient methods." That is what makes progress.<p>Take another example, non-negative matrix factorization: another objective function that is NP-hard to optimize. I will make a different point with this example (although the same point could be made with the K-means example). The cases where the simple local algorithm for NMF does not converge to near the global optimum are provably uninteresting to begin with. Even if the global optimum were found there, it would serve no purpose.<p>Another example: global optimization of low-rank matrix factorization models for recommender systems. Same problem again: the optimal solution is NP-hard to obtain. Again, simple methods do provably well if you exploit the structure of the problem.<p>Often the difficult function one is trying to optimize can be circumvented entirely by modeling the phenomenon with a better-behaved function; model the problem differently so that better techniques apply.
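To make the furthest-first idea for K-means concrete, here is a minimal NumPy sketch (the function name and signature are my own; this is an illustration of the initialization step, not a reference implementation):

```python
import numpy as np

def furthest_first_init(X, k, rng=None):
    """Pick k initial centers greedily: each new center is the point
    furthest from all centers chosen so far (furthest-first traversal)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]  # start from an arbitrary point
    for _ in range(k - 1):
        # distance of every point to its nearest already-chosen center
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])  # take the point furthest away
    return np.array(centers)

# Hand these centers to the standard Lloyd iterations afterwards;
# with well-separated clusters, each center lands in a different cluster.
```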
This is not always possible, but I see people not even trying; apparently it is easier to publish a few papers when you spray the work with colorful analogies. An example: protein folding. It is a misguided effort to find the globally optimal shape; nature does not fold proteins optimally.<p>Another example: consider classifiers. Misclassification as a function is the worst nightmare: non-differentiable, non-convex, with huge swaths of zero gradient. So what do people in machine learning do? They bound it from above by a well-behaved function and optimize <i>that</i> function. They don't stop there. They first validate empirically that optimizing the surrogate gives good performance, and then prove theoretically that this comes with guarantees on performance.<p>I am increasingly convinced that analyzing objective functions that are actually important to optimize, studying their properties, and motivating performant optimization schemes is a far better contribution than coming up with colorful analogies of genetics, evolution, swarms, etc. without characterizing precisely what makes them work and when. Demystify the problem and the technique rather than mystify them, and please compare running times and resources with more traditional optimization.<p>Again, I am not saying that GA or swarm optimization methods are wrong to use. There are indeed cases where they are exactly the correct thing to use. Consider gene/DNA assembly; here GA is indeed the right thing to use. GA works well when the optimal solution looks like an assembly of small good solutions. Swarm optimization works well when the objective looks like a smooth, well-behaved function overlaid with high-frequency fluctuations. I wish this is what the community focused on.
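As a concrete instance of the surrogate idea for classifiers mentioned above: the hinge loss used by SVMs upper-bounds the 0-1 misclassification loss pointwise, which is what makes optimizing it a principled stand-in. A minimal sketch (function names are mine, written in terms of margins y*f(x)):

```python
import numpy as np

def zero_one_loss(margins):
    # the "nightmare" objective: non-differentiable, flat almost everywhere
    return float(np.mean(margins <= 0))

def hinge_loss(margins):
    # convex surrogate; pointwise, max(0, 1 - y*f(x)) >= 1{y*f(x) <= 0},
    # so the mean hinge loss upper-bounds the mean 0-1 loss
    return float(np.mean(np.maximum(0.0, 1.0 - margins)))

margins = np.array([2.0, 0.5, -0.3, 1.2])
assert hinge_loss(margins) >= zero_one_loss(margins)  # 0.45 >= 0.25
```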