Some interesting points from this paper:

- All simulated agents use the *same* neural net with the same weights, but with randomized reward weights and a per-agent conditioning vector that let them behave as different vehicle types with different degrees of aggressiveness. It's like driving in a world where everyone is a copy of you, except some of your copies are in a rush while others are patient. This lets backprop optimize a sort of global utility across the entire population. (Rough sketch of the idea at the end of this comment.)

- There is no modeling of occlusion effects. Instead, agents are given the states of nearby agents, corrupted by random noise (also sketched below). In the real world, occluded agents can be dangerously close (think of a child running out from behind a parked car). The paper comments on this:

> Both Waymax and nuPlan construct observations, maps, and other actors with auto-labeling tools from real-world perception data. This brings occlusion, incorrect or missing traffic-light states, and obstacles revealed at the last moment. Despite the minimalistic noise modeling in GIGAFLOW, the GIGAFLOW policy generalizes zero-shot to these conditions.

- The resulting policy drives in a human-like way, even though the system has never seen a human drive. This is a great result, considering that other reinforcement learning projects have produced extremely high-performing agents whose behavior humans would consider abusive or pathological.
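
For anyone curious what the shared-weights-plus-conditioning setup might look like, here's a minimal PyTorch sketch. This is my own illustration, not the paper's code; the network shape, the conditioning encoding, and the reward terms are all made-up placeholders.

    # Minimal sketch (not the paper's code): one network shared by all agents,
    # with behavior varied only through a per-agent conditioning vector and
    # randomized reward weights.
    import torch
    import torch.nn as nn

    class ConditionedPolicy(nn.Module):
        def __init__(self, obs_dim, cond_dim, act_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + cond_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, act_dim),
            )

        def forward(self, obs, cond):
            # Same weights for every agent; only `cond` differs
            # (e.g. an encoding of aggressiveness or vehicle type).
            return self.net(torch.cat([obs, cond], dim=-1))

    num_agents, obs_dim, cond_dim, act_dim = 64, 32, 8, 2
    policy = ConditionedPolicy(obs_dim, cond_dim, act_dim)

    # Sampled once per agent per episode: who is patient, who is in a rush.
    cond = torch.rand(num_agents, cond_dim)
    reward_weights = torch.rand(num_agents, 4)   # per-agent mix of reward terms

    obs = torch.randn(num_agents, obs_dim)
    actions = policy(obs, cond)                  # one batched forward pass for everyone

    # During training, each agent's return would come from its own weighted mix,
    # e.g. reward = (reward_weights * reward_terms).sum(dim=-1)

Because every agent is a batch row through the same network, you get population diversity essentially for free, without maintaining separate models per driver type.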
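And a sketch of the observation corruption idea. Again, this is my guess at the shape of it; I'm assuming simple additive Gaussian noise on nearby agents' positions and velocities, which may be quite different from the paper's actual noise model.

    # Sketch of "no occlusion model, just noise" (my assumption: additive
    # Gaussian noise; the paper's noise model may differ).
    import numpy as np

    rng = np.random.default_rng(0)

    def observe_nearby(true_states, pos_noise_std=0.5, vel_noise_std=0.2):
        # true_states: (N, 4) array of [x, y, vx, vy] for nearby agents.
        noisy = true_states.copy()
        noisy[:, :2] += rng.normal(0.0, pos_noise_std, size=(len(true_states), 2))
        noisy[:, 2:] += rng.normal(0.0, vel_noise_std, size=(len(true_states), 2))
        return noisy

The interesting claim is that training against this kind of cheap corruption was enough for the policy to handle real occlusion-induced surprises zero-shot.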