A rather tangential comment - this paper is an example of how <i>NOT</i> to write an abstract. An abstract is expected to tell me what new piece of knowledge I can learn by reading more. The content of this abstract is only 20% of what a real abstract should be: the first half of the first sentence is almost all that's needed (it could include which archs it beats). The rest of the abstract needs to cover the following (perhaps one sentence each) -<p>1. Intro - a note on the overall problem domain: object detection in this case, zoomed in a bit to the DL space.
2. Related work - work so far in the domain, without criticizing it.
3. Problem statement - what is the knowledge gap in the related work this paper is talking about.
4. Solution - how did we address the gap.
5. Validation - how do we claim our solution addressed the gap it was intended to address.<p>This paper's abstract covers only the last part and sporadically a bit of 2. What I want to know from this abstract is "what is the new learning in the yolov7 arch?"<p>Perhaps the bigger picture here is that it points to metrics chasing as a proxy for a "research agenda" in the ML community.
Probably the most interesting trick from the paper is using the head as a soft supervisor for earlier layers of the network. The intuition is that if the earlier layers learn to imitate the higher-capacity later layers, this frees up the capacity of the later layers to better learn the residual, and it provides a denser supervisory signal.
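To make the idea above concrete, here's a minimal sketch of a final ("lead") head acting as a soft supervisor for an auxiliary head attached to earlier layers. This is only an illustration of the general idea and not the paper's actual label assigner: the function names, the single-sample classification setup, and the `aux_weight=0.25` default are all my own assumptions, and in a real framework the soft targets would be detached so that no gradient flows back into the lead head through them.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def softmax(logits):
    """Convert logits to a probability distribution."""
    return [math.exp(lp) for lp in log_softmax(logits)]

def soft_cross_entropy(student_logits, target_probs):
    """Cross-entropy of the student's prediction against soft (or one-hot) targets."""
    log_probs = log_softmax(student_logits)
    return -sum(t * lp for t, lp in zip(target_probs, log_probs))

def total_loss(lead_logits, aux_logits, hard_label, aux_weight=0.25):
    """Lead head learns from ground truth; auxiliary head imitates the lead head.

    aux_weight is a hypothetical weighting chosen for illustration.
    """
    # The lead (final) head is trained against the ground-truth label.
    one_hot = [1.0 if i == hard_label else 0.0 for i in range(len(lead_logits))]
    lead_loss = soft_cross_entropy(lead_logits, one_hot)

    # The lead head's own prediction becomes the soft target for the
    # auxiliary head. Treated as a constant here (assumption: detached).
    soft_targets = softmax(lead_logits)
    aux_loss = soft_cross_entropy(aux_logits, soft_targets)

    return lead_loss + aux_weight * aux_loss

if __name__ == "__main__":
    lead = [2.0, 0.5, -1.0]
    print(total_loss(lead, [2.0, 0.5, -1.0], hard_label=0))  # aux head agrees with lead
    print(total_loss(lead, [-1.0, 0.5, 2.0], hard_label=0))  # aux head disagrees
```

The auxiliary term is smallest when the auxiliary head reproduces the lead head's distribution, which is the "imitate the later layers" part of the intuition. The soft targets carry a value for every class rather than a single hard label, which is one way to read "denser supervisory signal".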
As someone who only got his feet wet with OpenCV some 20 years ago (basic shape recognition, no AI involved), what reading, software, etc. would you suggest to catch up and play with current technology without being inundated by theory that I'm sure I couldn't grasp?
The GitHub repo mentions "teaser: Yolov7-mask", showing segmentation as well. Highly relevant to my interests. Sadly, I can't easily find any other info on this topic.<p>Does anyone know more, maybe?
> the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher<p>Yikes. It's not clear to me if that's the upper limit on accuracy or a limit imposed by requiring that it run at 30 FPS, but still...yikes.
In YOLOv7, YOLO and v7 don't go well together. No, not at all. YOLO normally means "You Only Live Once", and v7 means it's lived at least six times before this.<p>While the author likely didn't have that intention, that's what came across.<p>Even for YOLO meaning "You Only Look Once" YOLO and v7 do not go together well.