The key thing seems not to be the specific algorithm, but the idea of using images obtained during performance for training - having an algorithm that can do that at all. It's an early prototype algorithm, with lots of room for tweaking - and there are likely radically different learning algorithms, as yet untried or undiscovered, that work better. It seems that in the past, performance images have been religiously separated from training images.<p>It reminds me of early approaches to robot walking, which tried to plan everything out, versus more recent approaches that incorporate feedback - which turned out to be much simpler and to work better. Sort of waterfall vs. agile.<p>It seems a tad unreliable (his "mouse pointer" was lost a few times while still on screen), but this is still a prototype. It's really impressive how the panda was tracked through a full 360° of orientation - probably helped by the distinctive colouring.<p>New input devices (this, Kinect, multi-touch) and applications that can really use them may be a main source of disruptive innovation in computers for the next decade or two.
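To make that feedback idea concrete, here's a rough sketch of my own - plain OpenCV template matching, not Kalal's actual TLD algorithm, and the confidence threshold is an arbitrary assumption - showing how patches captured while tracking can feed straight back into the "training set" the detector matches against:<p><pre><code>import cv2

def track_and_learn(video_path, init_box, conf_thresh=0.7):
    """Yield (top_left, score) per frame, learning new templates as it goes."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return
    x, y, w, h = init_box
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    templates = [gray[y:y + h, x:x + w]]  # the "training set" grows online

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Detect: best normalised match against every template learned so far.
        best_score, best_loc = -1.0, None
        for t in templates:
            res = cv2.matchTemplate(gray, t, cv2.TM_CCOEFF_NORMED)
            _, score, _, loc = cv2.minMaxLoc(res)
            if score > best_score:
                best_score, best_loc = score, loc

        # Learn: only confident detections become new training examples,
        # so "performance" frames are no longer walled off from training data.
        if best_score > conf_thresh:
            x, y = best_loc
            templates.append(gray[y:y + h, x:x + w])
            templates = templates[-64:]  # crude cap on memory

        yield best_loc, best_score
</code></pre>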
This is massively groundbreaking. You'll get it if you've used motion tracking on several game interfaces and had to set up perfectly white backgrounds with bright lights to make it work. This is incredibly accurate - really game-changing stuff.
As this doesn't seem like an April Fools' joke (some of the papers were published last year :-)), it's interesting to think about it in the context of what it might change. That being said, I don't doubt for a minute that the university has locked up as much of the technology as possible in patents, but that is another story. We can speculate about what it will be like in 20 years when people can do this without infringing :-)<p>Clearly it could be applied immediately to robotic manufacturing. Tracking parts, understanding their orientation, and manipulating them all get easier when it's 'cheap' to add additional tracking sensors.<p>Three systems sharing data (front, side, top) would give some very good expressive options for motion-based UIs or control.<p>Depending on how well the computational load can be reduced to hardware, small systems could provide head-mounted tracking. (See CMUCam [1] for small.)<p>The training aspect seems to be a weak link, in that some applications would need to have the camera 'discover' what to track and then track it.<p>A number of very expensive object tracking systems used by law enforcement and the military might get a bit cheaper.<p>Photographers might get a mode where they can specify 'take the picture when this thing is centered in the frame' for sports and other high-speed activities (a rough sketch of that centering check is below).<p>Very nice piece of work.<p>[1] <a href="http://www.cs.cmu.edu/~cmucam/" rel="nofollow">http://www.cs.cmu.edu/~cmucam/</a>
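Purely to illustrate that photographer idea (my own sketch, not anything from the article - the tolerance value is an arbitrary assumption): once a tracker reports a bounding box per frame, the 'centered' trigger is just a distance check against the frame centre.<p><pre><code>def should_fire(box, frame_shape, tolerance=0.05):
    """Fire the shutter when the tracked box's centre is near the frame centre."""
    x, y, w, h = box                         # tracker output: top-left + size
    frame_h, frame_w = frame_shape[:2]
    cx, cy = x + w / 2.0, y + h / 2.0
    dx = abs(cx - frame_w / 2.0) / frame_w   # normalised horizontal offset
    dy = abs(cy - frame_h / 2.0) / frame_h   # normalised vertical offset
    return dx < tolerance and dy < tolerance
</code></pre>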
Interesting that TFA mentions "Minority Report-like interfaces" several times when: 1.) The Minority Report interface is the canonical example of a UI that is very impressive visually, and is beautifully mediagenic; but is hideously fatiguing and impractical in a real world scenario. (Hold your hand out at arm's length. Okay, now hold that pose for eight hours.) 2.) The MR UI has actually been commercialized, and has entirely failed to take the world by storm.<p>Also, computer vision demos are trivially easy to fake, and it's even easier to make an impressive demo <i>video</i>. You can have the guy who invented it spend a couple hours in front of the camera trying it over and over, then edit it down to three minutes of the system working perfectly. It wouldn't be nearly as impressive when you have an untrained user trying it live, in the field.
From his webpage at Surrey: "We have received hundreds of emails asking for the source code ranging from practitioners, students, researchers up to top companies. The range of proposed projects is exciting and it shows that TLD is ready to push the current technology forward. This shows that we have created something "bigger" than originally expected and therefore we are going to postpone the release of our source code until announced otherwise. Thank you for understanding."<p>Also, the message where he stated the source code was under GPL 2.0 has disappeared. Seems he chose to leave Richard Stallman empty-handed and go to the dark side.
<i>With something like this we could have truly “Minority Report” style human-computer interface.</i><p>Actually, the guy who invented the Minority Report interface commercialized it and has been selling it for years. Product website: <a href="http://oblong.com" rel="nofollow">http://oblong.com</a> Edit: better video: <a href="http://www.ted.com/talks/john_underkoffler_drive_3d_data_with_a_gesture.html" rel="nofollow">http://www.ted.com/talks/john_underkoffler_drive_3d_data_wit...</a>
Technical details here, with links to relevant papers at the bottom.
<a href="http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html" rel="nofollow">http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html</a>
Ok, so the fact that he has produced this himself, using off-the-shelf commodity laptops etc., is really great.<p>But this technology doesn't seem new to me - technology already exists for surveillance cameras in police and military helicopters to track an object like a car and keep it in view as the helicopter turns and maneuvers.<p>Likewise, facial recognition - both statically and within a video stream - isn't new either.<p>Not taking anything away from the guy, but I'm just wondering what I'm not getting - what is new/amazing about this particular implementation?
The face recognition part seemed almost <i>too</i> good at not picking up other people's faces. Or was it just detecting the <i>most</i> similar face?<p>But facial recognition aside, the uses are endless. If it can be brought to the same level Kinect drivers are at, but with <i>finger tracking</i> and <i>no custom hardware</i>, this could change everything.
Bah! I was hoping to download the source (from here: <a href="http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html" rel="nofollow">http://info.ee.surrey.ac.uk/Personal/Z.Kalal/tld.html</a>) and check out his algorithm, but he requires you to email him about your project first. If anyone knows how the algorithm works, or where it is described in detail, I'd love to read that!<p>Absolutely amazing stuff!
Every time something like this comes out, I feel like we're taking a step away from "video camera mounted on a robot where the eyes should be" and a step toward real perception. I always wonder, though: if a computer can one day recognize all different types of hands, could it draw a new one?
The world becomes a better place when code like this is available for the public to build on, and not only for the military to use in homing heads. I guess it's one point for "make something free that was initially available for pay" - just like "Plenty of Fish" is doing...
<a href="http://www.vti.mod.gov.rs/vti/lab/e-tv.htm" rel="nofollow">http://www.vti.mod.gov.rs/vti/lab/e-tv.htm</a>
The video where the system tracks Roy from The IT Crowd sucking his fingers is epic :)
<a href="http://www.youtube.com/user/ekalic2#p/u/2/tKXX3A2WIjs" rel="nofollow">http://www.youtube.com/user/ekalic2#p/u/2/tKXX3A2WIjs</a>
It must be shown what to track. That is, you (or some other external system) define the "object" to be tracked by clicking out a bounding box.<p>A good addition would be an algorithm that automatically delineated "objects" in the visual field, then passed them to Predator.<p>Which raises another question: how many "objects" can Predator simultaneously track (for a given amount of horsepower)?
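For what it's worth, the manual "show it what to track" step looks roughly like this with the TLD port that later appeared in opencv-contrib - a hedged sketch, since the constructor name has moved between OpenCV versions (cv2.legacy.TrackerTLD_create in 4.x, cv2.TrackerTLD_create in 3.x), and tracking multiple objects would just mean holding one tracker instance per bounding box:<p><pre><code>import cv2

cap = cv2.VideoCapture(0)            # any webcam or video file
ok, frame = cap.read()

# The user "shows" the system what to track by dragging a bounding box.
box = cv2.selectROI("select object", frame, fromCenter=False)

tracker = cv2.legacy.TrackerTLD_create()   # cv2.TrackerTLD_create() on older builds
tracker.init(frame, box)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    found, b = tracker.update(frame)
    if found:
        x, y, w, h = map(int, b)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
</code></pre>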
Uhhh... 'Predator'? What's his next project, SkyNet Resource Planning? This seems like an April Fools' joke to me. I mean, I'm sure he's done work in the area... but the article is dated April 1 and the previous literature didn't mention 'Predator.' I could be wrong, but it seems too advanced, and scary.