You Only Look Once: Unified, Real-Time Object Detection

64 points by yogrish over 9 years ago

5 comments

mike_hearn over 9 years ago
This is very cool. It looks from the videos like the next step for them is to provide some sort of temporal stability so that detected objects don't get temporarily forgotten across frames and so the bounds expand and contract smoothly. It's obvious that the detection is being run frame-at-a-time.

I also wonder to what extent merging the detection with underlying P-frame information from the video codecs would help. Knowing that a segment of video just moved to the left would mean the detected object could be moved to the left by the same amount, even if it was passing behind another object. Calculating the movement vectors independently seems silly if you can get that data from the underlying video codec itself.
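A rough Python sketch of the temporal-stability idea raised above. Everything in it is hypothetical and not taken from the YOLO paper or code: the `Box` and `SmoothedTrack` names, the `alpha` and `max_missed` parameters, and the assumption that a per-object motion vector `(dx, dy)` has already been extracted from the video codec. It also assumes each detection has already been matched to the right track.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Box:
    x: float  # centre x, in pixels
    y: float  # centre y, in pixels
    w: float  # width, in pixels
    h: float  # height, in pixels


class SmoothedTrack:
    """Keeps one object's box stable across frames (illustrative helper)."""

    def __init__(self, alpha: float = 0.5, max_missed: int = 5):
        self.alpha = alpha            # weight given to the newest detection
        self.max_missed = max_missed  # frames to coast before dropping the object
        self.box: Optional[Box] = None
        self.missed = 0

    def update(
        self,
        detection: Optional[Box],
        motion: Tuple[float, float] = (0.0, 0.0),
    ) -> Optional[Box]:
        dx, dy = motion
        if self.box is not None:
            # Predict: shift the previous box by the supplied motion vector.
            self.box = Box(self.box.x + dx, self.box.y + dy, self.box.w, self.box.h)

        if detection is None:
            # No detection this frame: coast on the prediction for a while.
            self.missed += 1
            if self.missed > self.max_missed:
                self.box = None
            return self.box

        self.missed = 0
        if self.box is None:
            self.box = detection
        else:
            # Blend prediction and detection (exponential moving average).
            a, p = self.alpha, self.box
            self.box = Box(
                a * detection.x + (1 - a) * p.x,
                a * detection.y + (1 - a) * p.y,
                a * detection.w + (1 - a) * p.w,
                a * detection.h + (1 - a) * p.h,
            )
        return self.box
```

Calling `update(None, motion=(dx, dy))` on frames where the detector misses the object keeps the box moving with the scene instead of dropping it, which is the behaviour the comment asks for when an object passes behind another one.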
dplarson over 9 years ago
They named their method "YOLO"…

Edit: to add something more "helpful" to this comment, their paper links to a YouTube channel [1] that shows demos of their method, which I think is great.

[1] https://goo.gl/bEs6Cj
clickok over 9 years ago
This is really cool, even inspiring. Not just because it's one of the first examples I've seen of accurate, real-time detection powered by neural nets, but because they're getting these results via black magic, basically.

The objective function is defined heuristically, and involves about five different sub-objectives (top of page four). Some of the parameters chosen seem to be rough guesses, as does the decision to scale up the images to twice the resolution when moving from classification (the pre-training task) to detection.

It seems miraculous that a process of estimating and refinement, guided by experience, can work on tasks where you have no mathematical guarantee that a good solution can be found. Maybe in time we'll build the theory that explains just why deep learning works so well, but for now I'm just kinda awed and impressed every time one of these stories comes out.
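For readers without the paper to hand, the five-part objective mentioned above can be written out roughly as follows. This is transcribed from memory of the published paper, not from this thread, so treat it as a sketch and check it against page four. The notation assumes an S × S grid with B predicted boxes per cell, where the indicator 1_{ij}^{obj} is 1 when box j in cell i is responsible for an object, and hatted symbols are the ground-truth targets.

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
  &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2
           + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
  &+ \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
  &+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
  &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
      \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
```

The five terms are box centre error, box size error (on square roots, so small boxes are not swamped by large ones), confidence error for cells with an object, confidence error for cells without one, and class probability error. The weights λ_coord and λ_noobj are the hand-picked parameters the comment refers to; as far as I recall the paper sets them to 5 and 0.5.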
bradneuberg over 9 years ago
In the paper they use the abbreviation mAP without explaining what it is or providing a reference, such as "Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors"; do folks know what mAP is?
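mAP conventionally stands for mean average precision: compute an average precision (AP) for each object class from its ranked detections, then average those APs over classes. The Python sketch below shows the plain, non-interpolated form of that calculation. The function names and the input format are hypothetical, and it assumes each detection has already been marked as a true or false positive (detection benchmarks typically decide this with an overlap threshold against ground-truth boxes, and often use an interpolated variant of AP, so the exact numbers in the paper may be computed differently).

```python
from typing import Dict, List, Tuple

# One detection: (confidence score, whether it matched a ground-truth object).
Detection = Tuple[float, bool]


def average_precision(detections: List[Detection], num_ground_truth: int) -> float:
    """Non-interpolated AP for a single class."""
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_score, is_true_positive) in enumerate(ranked, start=1):
        if is_true_positive:
            hits += 1
            # Precision at this rank = true positives so far / detections so far.
            precision_sum += hits / rank
    # Dividing by the ground-truth count penalises objects that were never found.
    return precision_sum / num_ground_truth


def mean_average_precision(
    per_class: Dict[str, Tuple[List[Detection], int]]
) -> float:
    """Mean of the per-class APs; per_class maps name -> (detections, gt count)."""
    aps = [average_precision(dets, n_gt) for dets, n_gt in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, a class with two ground-truth objects and ranked detections hit, miss, hit has AP = (1/1 + 2/3) / 2.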
jbott over 9 years ago
One of the authors has additional information posted here: http://pjreddie.com/darknet/yolo/