Think: ResNet+ImageNet but for videos: https://github.com/moabitcoin/ig65m-pytorch<p>We ported the R(2+1)D model (from CVPR 2018, see https://arxiv.org/abs/1711.11248) and the weights pre-trained by Facebook Research on over 65 million Instagram videos (from CVPR 2019, https://arxiv.org/abs/1905.00561) to PyTorch, and released the architecture, weights, conversion tools, and a feature extraction example.<p>The official Facebook Research codebase can be found at https://github.com/facebookresearch/vmz. These models and pre-trained weights are immensely powerful, e.g. for fine-tuning on action recognition tasks or for extracting features from 3D data such as videos.<p>We hope the PyTorch models and weights are useful for folks out there, and easier to use and work with than the goal-driven, Caffe2-based, research-y official codebase.