The key thing seems not to be the specific algorithm, but the idea of using images obtained during performance for training - having an algorithm that can do that at all. It's an early prototype algorithm, with lots of room for tweaking - and there are likely radically different learning algorithms, as yet untried or undiscovered, that work better. It seems that in the past, performance images have been religiously separated from training images.<p>It reminds me of early approaches to robot walking, which tried to plan everything out, versus more recent approaches that incorporate feedback - which turned out to be much simpler and to work better. Sort of waterfall vs. agile.<p>It seems a tad unreliable (his "mouse pointer" was lost a few times while still on screen), but this is still a prototype. It's really impressive how the panda was tracked through a full 360° of orientation - probably helped by the distinctive colouring.<p>New input devices (this, Kinect, multi-touch) and applications that can really use them may be a main source of disruptive innovation in computers for the next decade or two.
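To make that feedback idea concrete, here's a rough sketch of my own - plain OpenCV template matching, not Kalal's actual TLD algorithm, and the confidence threshold is an arbitrary assumption - showing how patches captured while tracking can feed straight back into the "training set" the detector matches against:<p><pre><code>import cv2

def track_and_learn(video_path, init_box, conf_thresh=0.7):
    """Yield (top_left, score) per frame, learning new templates as it goes."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return
    x, y, w, h = init_box
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    templates = [gray[y:y + h, x:x + w]]  # the "training set" grows online

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Detect: best normalised match against every template learned so far.
        best_score, best_loc = -1.0, None
        for t in templates:
            res = cv2.matchTemplate(gray, t, cv2.TM_CCOEFF_NORMED)
            _, score, _, loc = cv2.minMaxLoc(res)
            if score > best_score:
                best_score, best_loc = score, loc

        # Learn: only confident detections become new training examples,
        # so "performance" frames are no longer walled off from training data.
        if best_score > conf_thresh:
            x, y = best_loc
            templates.append(gray[y:y + h, x:x + w])
            templates = templates[-64:]  # crude cap on memory

        yield best_loc, best_score
</code></pre>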
This is massively groundbreaking. You'll get it if you've used motion tracking on several game interfaces and had to set up perfectly white backgrounds with bright lights to make it work. This is incredibly accurate - really game-changing stuff.
As this doesn't seem like an April Fools' joke (some of the papers were published last year :-)), it's interesting to think about it in the context of what it might change. That being said, I don't doubt for a minute that the university has locked up as much of the technology as possible in patents, but that is another story. We can speculate about what it will be like in 20 years when people can do this without infringing :-)<p>Clearly it could be applied immediately to robotic manufacturing. Tracking parts, understanding their orientation, and manipulating them all get easier when it's 'cheap' to add additional tracking sensors.<p>Three systems sharing data (front, side, top) would give some very good expressive options for motion-based UIs or control.<p>Depending on how well the computational load can be reduced to hardware, small systems could provide head-mounted tracking. (See CMUCam [1] for small.)<p>The training aspect seems to be a weak link, in that some applications would need to have the camera 'discover' what to track and then track it.<p>A number of very expensive object tracking systems used by law enforcement and the military might get a bit cheaper.<p>Photographers might get a mode where they can specify 'take the picture when this thing is centered in the frame' for sports and other high-speed activities (a rough sketch of that centering check is below).<p>Very nice piece of work.<p>[1] <a href="http://www.cs.cmu.edu/~cmucam/" rel="nofollow">http://www.cs.cmu.edu/~cmucam/</a>
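Purely to illustrate that photographer idea (my own sketch, not anything from the article - the tolerance value is an arbitrary assumption): once a tracker reports a bounding box per frame, the 'centered' trigger is just a distance check against the frame centre.<p><pre><code>def should_fire(box, frame_shape, tolerance=0.05):
    """Fire the shutter when the tracked box's centre is near the frame centre."""
    x, y, w, h = box                         # tracker output: top-left + size
    frame_h, frame_w = frame_shape[:2]
    cx, cy = x + w / 2.0, y + h / 2.0
    dx = abs(cx - frame_w / 2.0) / frame_w   # normalised horizontal offset
    dy = abs(cy - frame_h / 2.0) / frame_h   # normalised vertical offset
    return dx < tolerance and dy < tolerance
</code></pre>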
Interesting that TFA mentions "Minority Report-like interfaces" several times when: 1.) The Minority Report interface is the canonical example of a UI that is very impressive visually, and is beautifully mediagenic; but is hideously fatiguing and impractical in a real world scenario. (Hold your hand out at arm's length. Okay, now hold that pose for eight hours.) 2.) The MR UI has actually been commercialized, and has entirely failed to take the world by storm.<p>Also, computer vision demos are trivially easy to fake, and it's even easier to make an impressive demo <i>video</i>. You can have the guy who invented it spend a couple hours in front of the camera trying it over and over, then edit it down to three minutes of the system working perfectly. It wouldn't be nearly as impressive when you have an untrained user trying it live, in the field.
From his webpage at Surrey: "We have received hundreds of emails asking for the source code ranging from practitioners, students, researchers up to top companies. The range of proposed projects is exciting and it shows that TLD is ready to push the current technology forward. This shows that we have created something "bigger" than originally expected and therefore we are going to postpone the release of our source code until announced otherwise. Thank you for understanding."<p>Also, the message where he stated the source code was under GPL 2.0 has disappeared. Seems he chose to leave Richard Stallman empty-handed and go to the dark side.
<i>With something like this we could have truly “Minority Report” style human-computer interface.</i><p>Actually, the guy who invented the Minority Report interface commercialized it and has been selling it for years. Product website: <a href="http://oblong.com" rel="nofollow">http://oblong.com</a> Edit: better video: <a href="http://www.ted.com/talks/john_underkoffler_drive_3d_data_with_a_gesture.html" rel="nofollow">http://www.ted.com/talks/john_underkoffler_drive_3d_data_wit...</a>
Technical details here, with links to relevant papers at the bottom.
<a href="http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html" rel="nofollow">http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html</a>
Ok, so the fact that he has produced this himself, using off-the-shelf commodity laptops etc., is really great.<p>But this technology doesn't seem new to me - technology already exists for surveillance cameras in police and military helicopters to track an object like a car and keep it in view as the helicopter turns and maneuvers.<p>Likewise, facial recognition - both statically and within a video stream - isn't new either.<p>Not taking anything away from the guy, but I'm just wondering what I'm not getting - what is new/amazing about this particular implementation?
The face recognition part seemed almost <i>too</i> good at not picking up other people's faces. Or was it just detecting the <i>most</i> similar face?<p>But facial recognition aside, the uses are endless. If it can be brought to the same level Kinect drivers are at, but with <i>finger tracking</i> and <i>no custom hardware</i>, this could change everything.
Bah! I was hoping to download the source (from here: <a href="http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html" rel="nofollow">http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html</a>) and check out his algorithm, but he requires you to email him about your project first. If anyone knows how the algorithm works, or where it is described in detail, I'd love to read that!<p>Absolutely amazing stuff!
Every time something like this comes out, I feel like we're taking a step away from "video camera mounted on a robot where the eyes should be" and a step toward real perception. I always wonder, though: if a computer can one day recognize all different types of hands, could it draw a new one?
The world becomes a better place when code like this is available for the public to build on, and not only for the military to use in homing heads. I guess it's one point for "make something free that was initially available for pay" - just like "Plenty of Fish" is doing...
<a href="http://www.vti.mod.gov.rs/vti/lab/e-tv.htm" rel="nofollow">http://www.vti.mod.gov.rs/vti/lab/e-tv.htm</a>
The video where the system tracks Roy from The IT Crowd sucking his fingers is epic :)
<a href="http://www.youtube.com/user/ekalic2#p/u/2/tKXX3A2WIjs" rel="nofollow">http://www.youtube.com/user/ekalic2#p/u/2/tKXX3A2WIjs</a>
It must be shown what to track. That is, you (or some other external system) define the "object" to be tracked by clicking out a bounding box.<p>A good addition would be an algorithm that automatically delineated "objects" in the visual field, then passed them to Predator.<p>Which raises another question: how many "objects" can Predator simultaneously track (for a given amount of horsepower)?
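For what it's worth, the manual "show it what to track" step looks roughly like this with the TLD port that later appeared in opencv-contrib - a hedged sketch, since the constructor name has moved between OpenCV versions (cv2.legacy.TrackerTLD_create in 4.x, cv2.TrackerTLD_create in 3.x), and tracking multiple objects would just mean holding one tracker instance per bounding box:<p><pre><code>import cv2

cap = cv2.VideoCapture(0)            # any webcam or video file
ok, frame = cap.read()

# The user "shows" the system what to track by dragging a bounding box.
box = cv2.selectROI("select object", frame, fromCenter=False)

tracker = cv2.legacy.TrackerTLD_create()   # cv2.TrackerTLD_create() on older builds
tracker.init(frame, box)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, b = tracker.update(frame)
    if found:
        x, y, w, h = map(int, b)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
</code></pre>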
Uhhh... 'Predator'? What's his next project, SkyNet Resource Planning? This seems like an April Fools' joke to me. I mean, I'm sure he's done work in the area... but the article is dated April 1 and the previous literature didn't mention 'Predator.' I could be wrong, but it seems too advanced, and scary.