Author here! First, many thanks for upvoting my post.<p>I only recently started delving into how object detection with deep learning works, and I was overwhelmed by how much prior knowledge (including past papers and deep learning networks) one needs to understand some of these techniques. So I decided to write a post that helps build an intuitive understanding of one object detection technique (SSD in this case) without spooking people with math formulas.<p>I would really appreciate feedback from anyone who has read the post.
Thanks for this! CNN object detection is a tough subject to crack at the moment.<p>I've been playing with the TensorFlow object detection project for work, but the tutorials are in various states of broken right now. The COCO models in their zoo do a pretty good job of detecting most of what they know about in a scene; however, the new Open Images Dataset model only hits on a few major scene elements and appears to have a footwear fetish.<p>I'm simultaneously trying to train a new model based on ResNet-101 and the Open Images Dataset, and taking Andrew Ng's deep learning course on Coursera to build a better understanding of the networks in these models. It's rough going, and posts like this are invaluable; thanks for putting it together!
There's also the YOLO ("You Only Look Once") algorithm for object detection, which is a little faster and more accurate; the page linked below includes a comparison with SSD.<p>Demo (real-time detection): <a href="https://pjreddie.com/darknet/yolo/" rel="nofollow">https://pjreddie.com/darknet/yolo/</a>