
Emergent Tool Use from Multi-Agent Interaction

333 points by gdb over 5 years ago

24 comments

Inufu over 5 years ago

Nice visualizations and explanation!

You might want to make it clearer that the agents don't actually receive any visual observations, but rather directly the xy positions of all other agents and objects.

This also seems very similar to "Capture the Flag: the emergence of complex cooperative agents" (https://deepmind.com/blog/article/capture-the-flag-science)?

Regarding the conclusion:

> We've provided evidence that human-relevant strategies and skills, far more complex than the seed game dynamics and environment, can emerge from multi-agent competition and standard reinforcement learning algorithms at scale. These results inspire confidence that in a more open-ended and diverse environment, multi-agent dynamics could lead to extremely complex and human-relevant behavior.

This has been well established for a while already, e.g. the DeepMind Capture the Flag paper above, AlphaGo discovering the history of Go openings and techniques as it learns from playing itself, AlphaZero doing the same for chess, etc.
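To make the point concrete, a state-based (non-visual) observation of this kind can be sketched in a few lines. The function name and layout below are purely illustrative, not OpenAI's actual observation code:

```python
import numpy as np

def build_observation(self_pos, other_agents, boxes, ramps):
    """Concatenate raw xy positions into a flat state vector.

    A toy illustration of a state-based observation: the policy sees
    coordinates directly rather than rendered pixels.
    """
    parts = [np.asarray(self_pos, dtype=np.float32)]
    for group in (other_agents, boxes, ramps):
        for pos in group:
            # other entities are encoded relative to the observing agent
            parts.append(np.asarray(pos, dtype=np.float32) - parts[0])
    return np.concatenate(parts)

obs = build_observation(
    self_pos=(0.0, 0.0),
    other_agents=[(1.0, 2.0)],
    boxes=[(3.0, 1.0), (0.5, 0.5)],
    ramps=[(2.0, 2.0)],
)
print(obs.shape)  # (10,)
```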
dooglius over 5 years ago

The state space here looks pretty small; it seems to me that with so much training it's just a case of brute-force search. When I think of "tool use" in regard to the intelligence of early humans, I imagine something more like [0], where the state space is enormous and it takes a good deal of reasoning and planning to get to a desired result.

[0] https://www.youtube.com/watch?v=BN-34JfUrHY
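For a sense of the contrast the comment is drawing: in a genuinely small state space, plain uninformed search already finds the optimal plan with no learning at all. A toy grid-world sketch (all names hypothetical):

```python
from collections import deque

def brute_force_plan(start, goal, walls, size=5):
    """Exhaustive breadth-first search over a tiny grid world.

    With a small state space, blind search finds the shortest plan
    outright - no policy, no training.
    """
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        (x, y), path = frontier.popleft()
        if (x, y) == goal:
            return path
        for dx, dy, move in ((1, 0, "R"), (-1, 0, "L"), (0, 1, "U"), (0, -1, "D")):
            nxt = (x + dx, y + dy)
            if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                    and nxt not in walls and nxt not in seen):
                seen.add(nxt)
                frontier.append((nxt, path + [move]))
    return None

plan = brute_force_plan(start=(0, 0), goal=(4, 4), walls={(2, 2)})
print(len(plan))  # 8 moves: the Manhattan distance, since the wall barely detours
```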
SmooL over 5 years ago

Amazing. Very cool to see this sort of emergent behavior.

I also very much enjoyed this section:

"We propose using a suite of domain-specific intelligence tests that target capabilities we believe agents may eventually acquire. Transfer performance in these settings can act as a quantitative measure of representation quality or skill, and we compare against pretraining with count-based exploration as well as a trained from scratch baseline."

Along with the videos, I can't help but get a very 'Portal' vibe from it all. "Thank you for helping us help you help us all." - GLaDOS
haylel over 5 years ago

Looks awesome. I tried coding up a multi-agent system for my CS degree and it was incredibly complicated. I was trying to implement an algorithm I found that gives each agent emotions of fear, anger, happiness and sadness in order to change their behaviours... it was way more difficult than I expected, but you can read more about it here if you're also interested in this stuff. The 3D graphics in this example are way cooler than my 2D shapes.

https://medium.com/@dshields/working-with-emotional-models-in-an-artificial-life-simulation-c6309a586e55
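The general shape of such an emotion-driven agent can be sketched with a simple decay-and-stimulus update. This is a generic illustration, not the algorithm from the linked article:

```python
class EmotionalAgent:
    """Minimal sketch of an emotion-driven agent: emotions decay over
    time, stimuli add to them, and the strongest emotion picks the
    behaviour."""

    DECAY = 0.9

    def __init__(self):
        self.emotions = {"fear": 0.0, "anger": 0.0,
                         "happiness": 0.0, "sadness": 0.0}

    def perceive(self, stimuli):
        # decay old emotions, then add new stimulus intensities
        for name in self.emotions:
            self.emotions[name] = self.emotions[name] * self.DECAY \
                + stimuli.get(name, 0.0)

    def behaviour(self):
        # act on whichever emotion is currently strongest
        dominant = max(self.emotions, key=self.emotions.get)
        return {"fear": "flee", "anger": "attack",
                "happiness": "explore", "sadness": "rest"}[dominant]

agent = EmotionalAgent()
agent.perceive({"fear": 0.8, "happiness": 0.3})
print(agent.behaviour())  # flee
agent.perceive({"happiness": 0.9})
print(agent.behaviour())  # explore (fear decayed, happiness reinforced)
```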
tlb over 5 years ago

The animations are nice, compared to a default visualization with dots and lines moving around. Was this done just for the public release, or was it worth it to the researchers to have an eye-pleasing visualization while doing the experiments?
corey_moncure over 5 years ago

One plausible, perhaps optimal strategy in the second arena is for the hiders to build a shelter around the seekers and lock them in place, circumventing the whole cat and mouse over ramps and ramp surfing (which the seekers would never be able to access). I wonder why this strategy is not arrived at.
lettergram over 5 years ago

Even my work with basic circuits for sea slugs led to "cooperative" behavior:

https://austingwalters.com/modeling-and-building-robotic-sea-slug/

I think sometimes we see what we want to see. Not saying it's not interesting work, just that it's less groundbreaking than you may think.
sebringj over 5 years ago

I'm completely amazed by that. The hint of a simulated world seems so matrix-like as well; imagine some intelligent thing evolving out of that. Wow.
brianpgordon over 5 years ago

This is incredible. The various emergent behaviors are fascinating. I remember being amazed a decade ago by the primitive graphics in artificial life simulators like Polyworld:

https://en.wikipedia.org/wiki/Polyworld

https://www.youtube.com/watch?v=_m97_kL4ox0&t=9m43s

It seems that OpenAI has a great little game simulated for their agents to play in. The next step to make this even cooler would be to use physical, robotic agents learning to overcome challenges in real meatspace!
YeGoblynQueenne over 5 years ago

This is visually very impressive, of course, but what is the significance of this work? I am not very familiar with intelligent-agents research, so I don't understand to what extent learning cooperative tool use in an adversarial environment (if I understand correctly what is shown) represents an important advancement of the state of the art, or not.

In any case, this is a simulation, so it's basically impossible to take the learned model and use it immediately in a real-world environment with true physics and arbitrary elements, let alone with unrestricted dimensions (the agents in the article are for the most part restricted to a limited play area). So if I understand this correctly, the trained model is only good for the specific simulated environment and would not work as well under even slightly different conditions.
rkagerer over 5 years ago

I love how the 3D visualization and game selection make their research immediately relatable - right down to the cute little avatars!

"We've shown that agents can learn sophisticated tool use in a high fidelity physics simulator"

I always suspected that to evolve intelligence you need an environment rich in complexity. The intelligence we're familiar with (e.g. humans) evolved in a primordial soup packed with possibilities and building blocks (e.g. elaborate rules of physics, amino acids, etc.). It's great to see this concept being explored.

It reminds me of Adrian Thompson's experiments in the 90s running generational genetic algorithms on a real FPGA instead of mere simulations [1].

After 5000 generations he coaxed out a perfect tone recognizer. He was able to prune 70% of the circuit (lingering remnants of earlier mutations?) and find it still worked with only 32 gates - an unimaginable feat! Engineers were baffled when they reverse-engineered what remained: if I recall correctly, transistors were run outside of saturation mode, and EM effects were being exploited between adjacent components. In short, the system took a bunch of components designed for digital logic but optimized them using the full range of analog quirks they exhibited.

More recent attempts to recreate his work have reportedly been hampered by modern FPGAs, which make it harder to exploit those effects as they don't allow reconfiguration at the raw wiring level [2].

In Thompson's own words:

"Evolution has been free to explore the full repertoire of behaviours available from the silicon resources provided, even being able to exploit the subtle interactions between adjacent components that are not directly connected... A 'primordial soup' of reconfigurable electronic components has been manipulated according to the overall behavior it exhibits"

---

[1] Paper: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=6691182CC83AE8577D7C44EB9D847DA1?doi=10.1.1.50.9691&rep=rep1&type=pdf

Less technical article: https://www.damninteresting.com/on-the-origin-of-circuits/

[2] https://www.reddit.com/r/MachineLearning/comments/2t5ozk/what_ever_happened_with_the_evolutionary/cnxfg1s?utm_source=share&utm_medium=web2x
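The generational loop Thompson ran is itself quite simple; the magic was in where fitness came from (a real FPGA). A schematic sketch, with a stand-in Python fitness function instead of hardware:

```python
import random

random.seed(0)

def evolve(fitness, genome_len=32, pop_size=20, generations=100, mut_rate=0.02):
    """Schematic generational genetic algorithm: elitist selection,
    single-point crossover, per-bit mutation. In Thompson's experiment
    'fitness' was measured on live silicon; here it is just a function."""
    pop = [[random.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]            # top scorers seed the next generation
        children = []
        while len(children) < pop_size - len(elite):
            a, b = random.sample(elite, 2)
            cut = random.randrange(genome_len)  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [g ^ (random.random() < mut_rate) for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# toy fitness: count of set bits stands in for "tone recognizer accuracy"
best = evolve(fitness=sum)
print(sum(best))  # converges to (or very near) the 32-bit optimum
```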
breck over 5 years ago

What is the size of these "strategies", measured in weights, bytes, or whatever measurement you look at?
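As a back-of-envelope way to answer this kind of question, one can count parameters directly. The layer sizes below are made up for illustration; the paper's actual policy is more complex (it uses attention over entities), so this is only the shape of the calculation:

```python
def mlp_param_count(layer_sizes):
    """Count weights + biases in a plain fully connected network.

    Each adjacent pair of layers contributes an (in x out) weight
    matrix plus one bias per output unit.
    """
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights + biases

# hypothetical: a 10-dim observation, two hidden layers of 256, 5 actions
params = mlp_param_count([10, 256, 256, 5])
print(params)      # 69893 parameters
print(params * 4)  # 279572 bytes (~273 KiB as float32)
```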
markkat over 5 years ago

Has any intelligence arisen without multi-agent interaction? It probably belongs in our definition of intelligence.
The_rationalist over 5 years ago

Am I misunderstanding something?

Instead of teaching the "AI" intelligent rules, or rules for creating rules to maximise its goals, they teach it nothing, which means it has zero usable high-level knowledge. The "AI" purely brute-forces its way to empirically best solutions for this ridiculously simple universe.

How is that advancing research? This is just a showcase of what modern hardware can do, and also a showcase of how far we are from teaching intelligence. My brain understands the semantics of this universe and would have been able to find most strategies without simulating the game more than once in my head. So this is really a showcase of how far we (or at least OpenAI) are from making AGI - brute force is like step 0.
mooneater over 5 years ago

Finally autocurricula gets some love! Discussed in some detail in https://www.talkrl.com/episodes/natasha-jaques
homieg33 over 5 years ago

I wonder if it's possible to incorporate a monkey-see-monkey-do aspect into the learning algorithm that could observe humans playing the game and incorporate that information into its models?
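This idea exists under the name imitation learning (behavioural cloning): fit a policy to recorded human (observation, action) pairs. A deliberately tiny sketch, reducing the "fit" step to nearest-neighbour lookup (real systems train a neural policy instead; the demo data below is invented):

```python
def fit_imitator(demos):
    """Monkey-see-monkey-do in miniature: the learned 'policy' just
    copies the action the human took in the most similar recorded
    state (squared Euclidean distance over observations)."""
    def policy(obs):
        nearest = min(demos,
                      key=lambda d: sum((a - b) ** 2
                                        for a, b in zip(d[0], obs)))
        return nearest[1]
    return policy

# hypothetical human demonstrations: (xy observation, action label)
human_demos = [((0.0, 0.0), "hide"),
               ((5.0, 5.0), "grab_box"),
               ((9.0, 0.0), "lock")]
policy = fit_imitator(human_demos)
print(policy((4.5, 5.2)))  # grab_box
```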
ReDeiPirati over 5 years ago

Great viz, design & structure! But for the first time, I had the impression that you didn't report anything new or different. All the takeaways of this work were pretty obvious given the last couple of years of research. Am I missing anything?
cr0sh over 5 years ago

I have a friend who observed similar emergent behavior in an a-life (gene-based, from what I understand) simulation he created, in an environment of "tanks in a maze" (or something like that).

The "genes" consisted of a simplified assembler (run on a VM) that could describe a program the tank would use to control itself - it could sense other tanks within line-of-sight to a certain degree, it could sense walls, it could fire its cannon, move in a particular direction, sense when another tank had a bearing (cannon pointed) on itself, etc.

He set up 100 random tanks (with random "genes"/programs) and let the simulation run. Top scorers (those with the most kills) would be used to seed the next "generation", using a form of sexual "mating" and (pseudo-)random mutation. Then that generation would run.

He said he ran the simulation for days at a time. One day he noticed something odd: certain tanks had "evolved" the means to "teleport" from location to location on the map. He hadn't designed this possibility in - what had happened, he later determined, was that a bug he had left in the VM was being exploited to allow the tanks to instantaneously move within their environment. He thought it was interesting, so he left it as-is and let the simulation continue.

After a long period of running, my friend noticed something very odd. Some tanks were "wiggling" their turrets - other tanks would "wiggle" in a similar fashion. After a while, all he could deduce was that in some manner they were communicating with each other, similar to "bee dancing", and starting to form factions against each other...

...it was at that point he decided things were getting much too strange, and he stopped the experiment.

Sadly, he no longer has a copy of this software, but I believe his story, simply because I have seen quite a bit of his other code and have worked closely with him on various projects since (as an adult), enough to know that such a system was well within his capability of creating.

At the time, he was probably only 16 or 17 years old, the computer was a 386, and this was sometime in the early 1990s. I believe the software was likely a combination of QuickBasic 4.5 and 8086 assembler running under DOS, as that was his preferred environment at the time.

I've often considered recreating the experiment using today's technology, just to see what would happen (when he related this to me, as an adult, he asked me how difficult it would be to make a more physical version of this "game"; I'm still not sure if he meant scale-model tanks or full-sized - knowing him, though, he would have loved the latter).
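The core of such an experiment - genomes that are programs, interpreted by a tiny VM, with top scorers seeding the next generation - is small enough to sketch. This is a guess at the general shape (toy opcodes and scoring), not a reconstruction of the original:

```python
import random

random.seed(1)

OPS = ("MOVE", "TURN", "FIRE", "SCAN")

def run_tank(genome, steps=20):
    """Toy VM: the genome is a looping program of opcodes. Score is a
    stand-in for kills - a FIRE only counts if it follows a SCAN."""
    score, scanned = 0, False
    for op in (genome[i % len(genome)] for i in range(steps)):
        if op == "SCAN":
            scanned = True
        elif op == "FIRE" and scanned:
            score, scanned = score + 1, False
    return score

def next_generation(pop, keep=10):
    """Top scorers 'mate' via single-point crossover plus occasional
    point mutation, as in the experiment described above."""
    pop.sort(key=run_tank, reverse=True)
    parents = pop[:keep]
    children = []
    for _ in range(len(pop) - keep):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, len(a))
        child = a[:cut] + b[cut:]
        if random.random() < 0.3:
            child[random.randrange(len(child))] = random.choice(OPS)
        children.append(child)
    return parents + children

pop = [[random.choice(OPS) for _ in range(8)] for _ in range(30)]
for _ in range(15):
    pop = next_generation(pop)
print(run_tank(max(pop, key=run_tank)))  # scan-then-fire loops dominate
```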
jpetrucc over 5 years ago

As always, crazy interesting stuff coming out of OpenAI!

This is the type of stuff that amazes me - I really wish I had more of an opportunity to play with AI/ML in my day-to-day work.
eiopa over 5 years ago

I dig the fine-tuning tests!

Did you end up using this as a way to estimate how "healthy" the agents are, or was this explored after the system was already working well?
fedebehrens over 5 years ago

Does anyone know if there are some accessible GitHub projects that can do something similar to this? I'd like to set up a new project with my nephew :)
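A bare-bones version of the chase dynamic is only a few lines, which makes it a nice starting point for a toy project. The greedy policies below are made up for illustration; for real multi-agent environments, libraries like PettingZoo are worth a look:

```python
def step(seeker, hider, size=10):
    """One tick of a minimal hide-and-seek on a grid: the seeker
    greedily chases, the hider greedily runs, clamped to the walls."""
    def sign(v):
        return (v > 0) - (v < 0)
    # seeker moves one cell toward the hider
    seeker = (seeker[0] + sign(hider[0] - seeker[0]),
              seeker[1] + sign(hider[1] - seeker[1]))
    # hider moves one cell away from the (updated) seeker, clamped to the grid
    hider = (min(size - 1, max(0, hider[0] + sign(hider[0] - seeker[0]))),
             min(size - 1, max(0, hider[1] + sign(hider[1] - seeker[1]))))
    return seeker, hider

seeker, hider = (0, 0), (5, 5)
for t in range(30):
    seeker, hider = step(seeker, hider)
    if seeker == hider:
        print(f"caught at step {t}")  # the hider gets cornered at (9, 9)
        break
```

From here, a fun extension is replacing the greedy policies with learned ones, or adding boxes the hider can place as obstacles.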
westurner over 5 years ago

I, for one, really appreciate the raytracing in these visualizations. I wish for more box-surfing examples.
Leary over 5 years ago

Does anyone think the hiders will learn to box the seekers in entirely before the rounds start?
adamnemecek over 5 years ago

This is just adjoint functors. Please work out automatic integration. Dual numbers is where the path starts.