Recently I had a chance to listen to a set of talks on the technology powering Waymo. I think the average academic roboticist would be shocked by the complete lack of end-to-end deep learning models, or even large models, powering Waymo. It's interesting to me that the only working self-driving car on the market right now has basically painstakingly listed every possible road obstacle, hand-coded every piece of driving logic, and manually addressed every edge case. Maybe Tesla's end-to-end approach will work and will be the way forward, but the real world seems to provide an almost limitless supply of edge cases that neural networks don't seem great at handling. In fact, if Waymo is proven to be the right approach, the winning approach to humanoids might be listing every possible item a humanoid can see in an environment, detecting them, and then planning around them.
Surprised that there isn't any explicit discussion of *why* dexterity is so hard, beyond sensory perception. One of the root causes (IMHO the biggest one) is that modeling contact, i.e. the static and dynamic friction between two objects, is extremely complicated. There are various modeling strategies, but their results are highly sensitive to tuning parameters, which makes contact very hard to learn in simulation. From what I remember, the OpenAI Rubik's Cube solver basically trained across a giant set of worlds, each with different tuning parameters for the contact models, and was able to generalize okay to the real world.

It seems most likely that this sort of boring domain randomization is what will work, or work well enough, for solving contact in this generation of robotics, but it would be much more exciting if someone figures out a better way to learn contact models (or a latent representation of them) in real time.
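To make the domain-randomization idea concrete, here is a minimal sketch of randomizing contact-model parameters per training episode. The parameter names and ranges are illustrative assumptions, not OpenAI's actual settings, and the simulator hookup is left as a comment.

```python
import random
from dataclasses import dataclass

# Hypothetical contact parameters; real names and ranges depend on the simulator.
@dataclass
class ContactParams:
    static_friction: float    # Coulomb friction coefficient
    dynamic_friction: float   # usually at or below static friction
    restitution: float        # how "bouncy" contacts are
    contact_stiffness: float  # penalty-based solvers are very sensitive to this
    contact_damping: float

def sample_contact_params(rng: random.Random) -> ContactParams:
    """Sample one 'world' of contact parameters for domain randomization."""
    static = rng.uniform(0.4, 1.2)
    return ContactParams(
        static_friction=static,
        dynamic_friction=static * rng.uniform(0.6, 1.0),
        restitution=rng.uniform(0.0, 0.3),
        contact_stiffness=10 ** rng.uniform(3, 5),
        contact_damping=10 ** rng.uniform(0, 2),
    )

rng = random.Random(0)
for episode in range(3):
    params = sample_contact_params(rng)
    # In a real pipeline these parameters would be written into the simulator
    # before each rollout, so the policy never overfits to one contact model.
    print(episode, params)
```

The point is that the policy only ever sees rollouts drawn from a distribution over contact models, so whatever it learns has to be robust to not knowing the "true" friction of the real world.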
For some perspective: we have not yet scaled robot training. The amount of data Pi is using to train its impressively capable robots is in the range of thousands of hours. In contrast, language models are trained on trillions of tokens comprising the entirety of human knowledge. So if you're saying things like "this still seems hard," just remember we have yet to hit this with the data hammer. Simulation is proving a great way to augment and bootstrap robot dexterity, but it still pales in comparison to data from the real world. So, as the author points out, we may get capability scaling like Waymo's, where one company painstakingly collects real data over a decade, but we may also see rapid progress in simulators and simulator *speed* overtake that for practical household and industrial tasks. My bet is on the latter.
You need feedback. I started with industrial robotics in the 90s and have since done a bunch of CNC and motion control: positioning is easy. The big problem to solve is enabling the robot to feel what it's doing and understand how that relates to the coordinate space. That's why we're dexterous: we can close our eyes and feel our hands in 3D space instead of just knowing a position in some coordinate system. We can put on a pair of gloves by feel alone, without looking. I picture a robot arm as being like when your arm goes numb from sleeping on it. You can see it, but it's dead. That's how a robot feels.
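A toy sketch of the difference the commenter is pointing at: commanding a position open-loop versus servoing on what the fingertip actually feels. The 1-D spring model of contact, the gains, and the target force are purely illustrative assumptions.

```python
# Toy 1-D "finger pressing a surface" model.
SURFACE_POS = 10.0  # mm, where contact begins (unknown to the robot)
STIFFNESS = 5.0     # N/mm, how hard the surface pushes back

def contact_force(finger_pos: float) -> float:
    """Toy force sensor: force only appears once the finger touches the surface."""
    return max(0.0, (finger_pos - SURFACE_POS) * STIFFNESS)

# Open loop: command a position and hope. If the surface is 1 mm closer than
# expected the finger crushes the object; 1 mm farther and it never touches it.
commanded = 10.5
print("open-loop force:", contact_force(commanded))  # 2.5 N, far too hard

# Closed loop: servo on the measured force instead of the position.
target_force = 1.0    # N, "hold gently"
pos, gain = 9.0, 0.1  # start above the surface, small proportional gain
for _ in range(50):
    error = target_force - contact_force(pos)
    pos += gain * error  # move in or back off until the felt force is right
print("closed-loop force:", round(contact_force(pos), 3))  # converges near 1.0 N
```

The closed-loop version never needs to know exactly where the surface is; it finds it by feel, which is the capability the numb-arm analogy says robots are missing.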
There are half a dozen successful commercially available surgical robot products out there. None try to mimic a surgeon's hands.

Even if biomimicry turns out to be a useful strategy in designing general-purpose robots, I would bet against humans being the right shape to mimic. And that's assuming general-purpose robots will ever be more useful than robots designed or configured for specific tasks.
It's because, after they saw how big a sucker everyone is for "AI," of course they can sell dumbasses a $60k vaguely human-shaped thing that still won't be able to do laundry or dishes, or answer the door, or screw in a screw, or step over a puppy.
Do these challenges apply to surgical robots? There's a lot of interest in essentially creating automated da Vincis, for which there is a great deal of training data and for which the robots are prepositioned.

Maybe all this setup means that completing surgical tasks doesn't count as dexterity.
Just today I noticed that, without looking, I could tell by feel that there were two objects in a bag instead of one. That tells me we likely have 1000s of different types of sensors, and we combine them all to form meaning; dexterity goes hand in hand with that.