Hi all, excited to share our latest work, OK-Robot, an open and modular framework for navigation and manipulation with a robot assistant in practically any home, without having to teach the robot anything new! You can simply unbox the target robot, install OK-Robot, give it a "scan" (think a 60-second iPhone video), and start asking the robot to move arbitrary things from A to B. We have already tested it in 10 home environments in New York City, and one environment each in Pittsburgh and Fremont.<p>We built everything on the current best machine learning models, so things don't work perfectly all the time, and we are hoping to improve it together with the community! Our code is open: <a href="https://github.com/ok-robot/ok-robot">https://github.com/ok-robot/ok-robot</a> and we have a Discord server for discussion and support: <a href="https://discord.gg/wzzZJxqKYC" rel="nofollow">https://discord.gg/wzzZJxqKYC</a>. If you are curious about what works and what doesn't, take a quick look at <a href="https://ok-robot.github.io/#analysis" rel="nofollow">https://ok-robot.github.io/#analysis</a> or read our paper for a detailed analysis: <a href="https://arxiv.org/abs/2401.12202" rel="nofollow">https://arxiv.org/abs/2401.12202</a><p>P.S.: while the code is open, the project unfortunately isn't fully open source, since one of our dependencies, AnyGrasp, has a closed-source, educational license. Apologies in advance, but we used it because it was the best grasping model we could get access to!<p>Would love to hear more thoughts and feedback on this project!
Robots like this will have a small market until they can handle obstacles: the cat toy that the cat left in the middle of the floor, the papers that an open window blew off the table, the toys the kids left scattered about, the pencil that rolled off the desk while you were away, the dirty laundry you left lying on the floor, the ridge between carpet and hardwood floors, doors left open or closed, and more. That means several tasks may have to be completed before a primary task can be accomplished (move the toys, pick up the papers, pick up the laundry, open the door). Some obstacles will semi-permanently block a wheeled robot, such as cables, stacked things that you don't want moved, furniture, a sleeping pet, stacked unopened packages from the mail, etc.<p>I believe this means general-purpose home robots cannot have wheels; they must have legs, perhaps more than two legs for stability. It may sound weird, but I think the ideal design might be somewhere between a large friendly spider and a dog.<p>It's odd how robotics has mostly fallen into this idea that the world is two-dimensional and flat. They've idealized away the really difficult problems of dealing with mobility in a 3D world. Note that everything this robot does involves only planar horizontal surfaces. Basically, it looks like a person had to go through the rooms and clean them up for the robot to function. Roombas have the same problem.
That's very cool. I have almost no experience with robotics, so excuse the silly questions:<p>- How does it know what objects are? Does it use some sort of real-time object classifier neural net? What limitations are there here?<p>- Does the robot know when it can't perform a request? E.g., if you ask it to move a large box or a very heavy kettlebell?<p>- How well does it do if the object is hidden or obscured? Does it go looking for it? What if it must move another object to get access to the requested one?
For solving long-term tasks like finding things that aren't there, you can turn the annotated scene into a templated description and feed it to a large enough model trained on interactive fiction.<p>You are standing in a kitchen. Ahead of you to your right there is a large refrigerator with the handle on the right side. There is a set of cabinets to your left with a plate sitting on the counter above them.<p>> get beer<p>You don't see any beer here.<p><< COT: I know that beer is often found in the fridge. I should try opening the refrigerator.<p>> open fridge<p>Opening the refrigerator reveals 4 cans of beer.<p>> get beer<p>Taken.<p>Obviously we're still several years from this working, but it's very exciting to consider: interactive fiction narrative fed by real sensors, plus chain-of-thought blocks as internal monologue.
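A minimal sketch of the templating step described above, assuming the annotated scene arrives as a list of labeled objects with a spatial relation. The names `SceneObject`, `describe_scene`, and `build_prompt` are hypothetical, invented for illustration; nothing here is OK-Robot's API, and no model is called.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SceneObject:
    """One annotated object. All field names are hypothetical."""
    name: str         # e.g. "large refrigerator"
    relation: str     # e.g. "ahead of you to your right"
    detail: str = ""  # optional, e.g. "with the handle on the right side"


def article(noun_phrase: str) -> str:
    """Pick 'a' or 'an' from the first letter (a rough heuristic)."""
    return "an" if noun_phrase[0].lower() in "aeiou" else "a"


def describe_scene(room: str, objects: list[SceneObject]) -> str:
    """Render the annotations as interactive-fiction style room text."""
    sentences = [f"You are standing in {article(room)} {room}."]
    for obj in objects:
        relation = obj.relation[0].upper() + obj.relation[1:]
        detail = f" {obj.detail}" if obj.detail else ""
        sentences.append(
            f"{relation} there is {article(obj.name)} {obj.name}{detail}."
        )
    return " ".join(sentences)


def build_prompt(description: str, transcript: list[str], command: str) -> str:
    """Join room text, earlier turns, and the new command into one prompt."""
    parts = [description, *transcript, f"> {command}"]
    return "\n\n".join(parts)


scene = describe_scene(
    "kitchen",
    [
        SceneObject(
            "large refrigerator",
            "ahead of you to your right",
            "with the handle on the right side",
        ),
        SceneObject("set of cabinets", "to your left"),
    ],
)
prompt = build_prompt(scene, [], "get beer")
print(prompt)
```

The resulting string would be passed to whatever language model is used, and its reply (including any chain-of-thought block) appended to `transcript` for the next turn.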
The failure analysis is super well done, nice work! Curious what qualifies as a hardware failure, e.g. there are 5 trials where the "Realsense gave bad depth", and how that's determined.
I've been watching this project for a while now, great progress!<p>I envision an integration with a mobility aid (e.g., a wheelchair) for someone with limited control over their limbs. Imagine a "smart" exoskeleton that can help with otherwise impossible tasks -- it could be a game-changer for so many people.
I very much want a stabilized platform vehicle that I can send point-to-point with a payload on it.<p>So, a gyro-stabilized platform like a Segway that I can send back and forth from point A to point B on a not-terrible-but-rough (walking path) route.<p>I have tried to stay abreast of the options in the past and have never seen anything that matches this ... does anyone know if there is anything new that matches this use case?<p>(The use case is a tray of drinks and hors d'oeuvres that needs to go from one part of a property to another without spilling ... it needs to be minimally all-terrain.)
I know nothing about robotics, but can someone ELI5 why the robot makes so many extraneous movements? E.g., in the video that shows it moving Takis from the desk to the nightstand, it approaches the desk, and then the arm mechanism moves all the way down (an unnecessary maneuver), then rises again before reaching the level needed to pick up the Takis.
For a long time, I have wanted to use a robot with a gripper to make tea. Is there any 6DOF robot available at a reasonable price (under $1000) that could do this?
Why are these general-purpose robots always so slow? Intuitively we expect machines to be able to do tasks faster than humans, but even the 5x speed video is much slower than a human doing the same task.
It's cool, but what's the point for a normal person? Useful for warehouses and manufacturing, but I don't see myself ever needing such a thing.