I'm just going to call this out as bullshit. This isn't YOLOv5. I doubt they even did a proper comparison between their model and YOLOv4.

Someone asked them not to call it YOLOv5 and their response was just awful [1]. They also blew off a request to publish a blog post/paper detailing the network [2].

I filed a ticket to get to the bottom of this with the creators of YOLOv4: https://github.com/AlexeyAB/darknet/issues/5920

[1] https://github.com/ultralytics/yolov5/issues/2

[2] https://github.com/ultralytics/yolov5/issues/4
I welcome forward progress in the field, but something about this doesn't sit right with me.

The authors have an unpublished, unreviewed set of results, and they're already co-opting the YOLO name for them (without the original author), all to promote a company? I guess this was inevitable given how much money there is in ML, but it definitely feels against the spirit of the academic research community they're building upon.
We made a site that lets you collaboratively tag a bunch of images, called tagpls.com. For example, users decided to re-tag ImageNet for fun: https://twitter.com/theshawwn/status/1262535747975868418

And the tags ended up being hilarious: https://pbs.twimg.com/media/EYXRzDAUwAMjXIG?format=jpg&name=large

(I'm particularly fond of https://i.imgur.com/ZMz2yUc.png)

The data is freely available via API: https://www.tagpls.com/tags/imagenet2012validation.json

It exports the data in YOLO format (e.g. the box coordinates are normalized to YOLO's [0..1] range), so it's straightforward to spit it out to disk and start a YOLO training run on it.

Gwern recently used tagpls to train an anime hand detector model: https://www.reddit.com/r/AnimeResearch/comments/gmcdkw/help_build_an_anime_hand_detector_by_tagging/

People seem willing to tag things for free, mostly for the novelty of it.

The NSFW tags ended up being shockingly high quality, especially in certain niches: https://twitter.com/theshawwn/status/1270624312769130498

I don't think we could have paid human labelers to create tags as thorough or accurate as these.

All the tags for all experiments can be grabbed via https://www.tagpls.com/tags.json, so over time we hope the site will become more and more valuable to the ML community.

tagpls went from 50 users to 2,096 in the past three weeks. The database also grew from 200KB a few weeks ago to 1MB a week ago and 2MB today. I don't know why it's becoming popular, but it seems to be.
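For anyone wondering what "spit it out to disk" looks like, here's a rough sketch of turning a tags JSON dump into per-image YOLO label files. The JSON field names (label/x/y/w/h) and the class list are assumptions for illustration; only the output format (one .txt per image, "class x_center y_center width height" in [0..1]) is the standard YOLO convention.

    import json
    import urllib.request
    from pathlib import Path

    TAGS_URL = "https://www.tagpls.com/tags/imagenet2012validation.json"
    OUT_DIR = Path("labels")
    OUT_DIR.mkdir(exist_ok=True)

    # Hypothetical class list; a real experiment defines its own label names.
    CLASS_NAMES = ["person", "dog", "cat"]
    CLASS_TO_ID = {name: i for i, name in enumerate(CLASS_NAMES)}

    with urllib.request.urlopen(TAGS_URL) as resp:
        tags = json.load(resp)

    # Assumed structure: {image_name: [{"label": str, "x": cx, "y": cy, "w": w, "h": h}, ...]}
    # with coordinates already normalized to [0, 1], as described above.
    for image_name, boxes in tags.items():
        lines = []
        for box in boxes:
            cls = CLASS_TO_ID.get(box["label"])
            if cls is None:
                continue  # skip labels we aren't training on
            # YOLO label format: one line per box, "class x_center y_center width height"
            lines.append(f'{cls} {box["x"]:.6f} {box["y"]:.6f} {box["w"]:.6f} {box["h"]:.6f}')
        (OUT_DIR / f"{Path(image_name).stem}.txt").write_text("\n".join(lines))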
Has anyone (beyond maybe self-driving software) tried using object tagging as a way to start introducing physics into a scene? E.g. a human and a bicycle sharing the same motion vector increases the likelihood that the human is riding the bicycle. Bicycles and humans have size and weight ranges that could be used to plot trajectories. Bicycles riding in a straight line and trees both provide some cues as to the gravity vector in the scene. Etc. etc.

Seems like camera motion is probably already solved with optical flow/photogrammetry, but you might be able to use that to help scale the scene and start filtering your tagging based on geometric likelihood.

The idea of hierarchical reference frames (outlined a bit by Jeff Hawkins here: https://www.youtube.com/watch?v=-EVqrDlAqYo&t=3025) seems pretty compelling to me for contextualizing scenes to gain comprehension, particularly if you build a graph from those reference frames and situate models tuned to the type of object at the root of each frame (vertex). You could use that to help each model learn, too: if a bike model projects a 'riding' edge towards the 'person' model, there wouldn't likely be much learning, since [Person]-(rides)->[Bike] would have likely been encountered already.

However, if the [Bike] projects the (rides) edge towards the [Capuchin] sitting in the seat, the [Capuchin] model might learn that capuchins can (ride), and furthermore that they can (ride) a [Bike].
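To make the edge-projection idea concrete, here's a toy sketch (nothing YOLO provides out of the box; the Detection fields, the motion vectors from some hypothetical upstream tracker, and the threshold are all made up for illustration). It just pairs detections that share a motion vector and emits relation edges:

    from dataclasses import dataclass

    @dataclass
    class Detection:
        label: str     # e.g. "person", "bicycle", "capuchin"
        box: tuple     # (x_center, y_center, w, h), normalized
        motion: tuple  # (dx, dy) per-frame displacement from a hypothetical tracker

    def similar_motion(a: Detection, b: Detection, tol: float = 0.01) -> bool:
        """Crude heuristic: two objects 'move together' if their motion vectors nearly match."""
        return abs(a.motion[0] - b.motion[0]) < tol and abs(a.motion[1] - b.motion[1]) < tol

    def infer_relations(detections):
        """Build a tiny relation graph: edges like (subject, 'rides', object)."""
        edges = []
        vehicles = [d for d in detections if d.label == "bicycle"]
        riders = [d for d in detections if d.label != "bicycle"]
        for bike in vehicles:
            for rider in riders:
                if similar_motion(bike, rider):
                    edges.append((rider.label, "rides", bike.label))
        return edges

    # Usage: a capuchin tracked moving with a bicycle, a person standing still
    scene = [
        Detection("bicycle", (0.5, 0.6, 0.3, 0.4), (0.02, 0.0)),
        Detection("capuchin", (0.5, 0.45, 0.1, 0.15), (0.02, 0.0)),
        Detection("person", (0.1, 0.5, 0.1, 0.3), (0.0, 0.0)),
    ]
    print(infer_relations(scene))  # [('capuchin', 'rides', 'bicycle')]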
There seems to be an unfair comparison between the various network architectures. The reported speed and accuracy improvements should be taken with a bit of scepticism, for two reasons.

* This is the first YOLO implemented in PyTorch. PyTorch is one of the fastest ML frameworks around, so some of YOLOv5's speed improvements may be attributable to the platform it was implemented on rather than to actual scientific advances. Previous YOLOs were implemented in Darknet, and EfficientDet is implemented in TensorFlow. A fair speed comparison would require training them all on the same platform.

* EfficientDet was trained on the 90-class COCO challenge [1], while YOLOv5 was trained on 80 classes [2].

[1] https://github.com/google/automl/blob/master/efficientdet/inference.py#L42

[2] https://github.com/ultralytics/yolov5/blob/master/data/coco.yaml
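For what it's worth, the platform variable is easy to control at inference time even without retraining: run each model through an identical PyTorch timing harness. A minimal sketch (the models fed into it are placeholders; the point is only that latency numbers are comparable when batch size, resolution, precision, warm-up, and synchronization are identical):

    import time
    import torch

    def benchmark(model, batch_size=32, img_size=640, iters=50, device="cuda"):
        """Time forward passes under identical conditions (same batch, size, device, dtype)."""
        model = model.to(device).eval()
        x = torch.randn(batch_size, 3, img_size, img_size, device=device)
        with torch.no_grad():
            # Warm-up so one-time costs (CUDA context, cuDNN autotune) don't pollute the numbers
            for _ in range(10):
                model(x)
            torch.cuda.synchronize()
            start = time.time()
            for _ in range(iters):
                model(x)
            torch.cuda.synchronize()  # GPU work is async; sync before reading the clock
        return (time.time() - start) / iters / batch_size * 1000  # ms per image

    if __name__ == "__main__":
        import torchvision
        # Stand-in network; swap in the YOLOv5 / EfficientDet PyTorch models being compared.
        print(benchmark(torchvision.models.resnet50()))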
EfficientDet was open sourced March 18 [1], YOLOv4 came out April 23 [2], and now YOLOv5 is out only 48 days later.

In our initial look, YOLOv5 is 180% faster, 88% smaller, similarly accurate, and easier to use (native to PyTorch rather than Darknet) than YOLOv4.

[1] https://venturebeat.com/2020/03/18/google-ai-open-sources-efficientdet-for-state-of-the-art-object-detection/

[2] https://arxiv.org/abs/2004.10934
> In February 2020, PJ Reddie noted he would discontinue research in computer vision.

He actually stopped working on it because of ethical concerns. I'm inspired that he made this principled choice despite being quite successful in this field.

https://syncedreview.com/2020/02/24/yolo-creator-says-he-stopped-cv-research-due-to-ethical-concerns/
> In February 2020, PJ Reddie noted he would discontinue research in computer vision.

It would be fair to also state why he chose to discontinue developing YOLO, as it is relevant.
Two interesting links from the article:

1. How to train YOLOv5: https://blog.roboflow.ai/how-to-train-yolov5-on-a-custom-dataset/

2. Comparing various YOLO versions: https://yolov5.com/
Latency is measured at batch=32 and then divided by 32? That means a single batch will actually take about 500 milliseconds to process.
I have never seen a more fake comparison.
Why benchmark using 32-bit FP on a V100? That means it’s not using tensor cores, which is a shame since they were built for this purpose.
There’s no reason not to benchmark using FP16 here.
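In PyTorch the change is tiny. A minimal sketch, using a torchvision model as a stand-in for whatever detector is being benchmarked (and noting that autocast needs a fairly recent PyTorch):

    import torch
    import torchvision

    # Stand-in network; substitute the actual detector being benchmarked.
    model = torchvision.models.resnet50().cuda().half().eval()   # cast weights to FP16
    x = torch.randn(32, 3, 640, 640, device="cuda").half()       # FP16 inputs to match

    with torch.no_grad():
        out = model(x)  # FP16 convs/matmuls can now run on the V100's tensor cores

    # Alternative (recent PyTorch): keep FP32 weights, let autocast pick FP16 kernels where safe.
    model_fp32 = torchvision.models.resnet50().cuda().eval()
    with torch.no_grad(), torch.cuda.amp.autocast():
        out = model_fp32(torch.randn(32, 3, 640, 640, device="cuda"))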
I really like the work done by AlexeyAB on Darknet YOLOv4 and by the original author, Joseph Redmon, on YOLOv3. These guys deserve a lot more respect than the authors of any other version of YOLO.
This is not the first time something has seemed fishy. Back in the early stages of the repo, they were advertising on the front page that they were achieving similar mAP to the original C++ version, only for people to find out they hadn't trained and tested it on the COCO dataset.
I am very interested in loading YOLO onto a Raspberry Pi + Coral.ai; does anyone know a good tutorial on how to get started? I tried before with Darknet and it was not easy at all, but now with PyTorch there seem to be ways of getting a model onto the Coral. I am familiar with Raspberry Pi dev, but not much with ML or TPUs, so I think it'd mostly be a tutorial on bridging the different technologies.

(might need to wait a couple of months since this was just released)
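From what I've read so far, the usual bridge is PyTorch -> ONNX -> TensorFlow -> fully-int8 TFLite -> Edge TPU compiler, since the Coral only runs fully-quantized TFLite models. A rough sketch of that chain, with a torchvision stand-in network and random calibration data as placeholders, and no guarantee every YOLO op actually compiles for the Edge TPU:

    import numpy as np
    import onnx
    import torch
    import torchvision
    import tensorflow as tf
    from onnx_tf.backend import prepare

    # 1. PyTorch -> ONNX (stand-in network so the sketch runs; substitute your trained YOLO model)
    model = torchvision.models.mobilenet_v2().eval()
    torch.onnx.export(model, torch.randn(1, 3, 320, 320), "model.onnx", opset_version=11)

    # 2. ONNX -> TensorFlow SavedModel
    prepare(onnx.load("model.onnx")).export_graph("model_savedmodel")

    # 3. TensorFlow -> fully int8-quantized TFLite (required for the Edge TPU)
    def representative_data():
        # Stand-in: use ~100 real preprocessed images for calibration, not random noise.
        for _ in range(100):
            yield [np.random.rand(1, 3, 320, 320).astype("float32")]

    converter = tf.lite.TFLiteConverter.from_saved_model("model_savedmodel")
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_data
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    open("model_int8.tflite", "wb").write(converter.convert())

    # 4. Compile for the Edge TPU on the host machine:
    #    $ edgetpu_compiler model_int8.tflite
    # Then run model_int8_edgetpu.tflite on the Pi with the tflite_runtime / pycoral interpreter.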
If anyone's interested in the direct GitHub link to the repository: https://github.com/ultralytics/yolov5
Just recently IBM announced with a loud PR move that the company is getting out of the face recognition business. Guess what? Wall Street doesn't want to keep subsidizing IBM's subpar face recognition technology when open source and Google solutions are pushing the state of the art.