What ML papers do you think someone who has some experience, but would still consider themselves relatively inexperienced in ML, could implement?<p>Ideally, a list of papers that could take 2-5 hours each and a few hundred lines of code?
Great question!<p>I seem to be in a similar situation as an experienced software engineer who has jumped into the deep end of ML. It seems most resources either abstract away too much detail or too little. For example, building a toy example that just calls gensim.word2vec doesn't help me transfer that knowledge to other use cases. Yet on the other extreme, most research papers are impenetrable walls of math that obscure the forest for the trees.<p>Thus far, I would also recommend Andrej Karpathy's Zero to Hero course (<a href="https://karpathy.ai/zero-to-hero.html" rel="nofollow">https://karpathy.ai/zero-to-hero.html</a>). He assumes a high level of programming knowledge but demystifies the ML side.<p>--<p>P.S.
If anyone is, by chance, interested in helping chip away at the literacy crisis (e.g., 40% of US 4th graders can't read even at a basic level), I would love to find a collaborator for evaluating the practical application of results from the ML fields of cognitive modeling and machine teaching. These seemingly simple ML models offer powerful insight into the neural basis for learning but are explained in the most obtuse ways.
A lot depends on what you're interested in.<p>Some papers that are runnable on a laptop CPU (so long as you stick to small image sizes/tasks):<p>1) Generative Adversarial Networks (<a href="https://arxiv.org/abs/1406.2661" rel="nofollow">https://arxiv.org/abs/1406.2661</a>). Good practice for writing a custom training loop, using different optimisers and networks, etc. (a rough sketch of such a training step follows at the end of this list).<p>2) Neural Style Transfer (<a href="https://arxiv.org/abs/1508.06576" rel="nofollow">https://arxiv.org/abs/1508.06576</a>). Nice to be able to manipulate pretrained networks and intercept intermediate layers.<p>3) Deep Image Prior (<a href="https://arxiv.org/abs/1711.10925" rel="nofollow">https://arxiv.org/abs/1711.10925</a>). Nice low-data exercise in building out an autoencoder.<p>4) Physics-Informed Neural Networks (<a href="https://arxiv.org/abs/1711.10561" rel="nofollow">https://arxiv.org/abs/1711.10561</a>). If you're interested in scientific applications, this might be fun. It's a good exercise in calculating higher-order derivatives of neural networks and using them in loss functions.<p>5) Vanilla Policy Gradient (<a href="https://arxiv.org/abs/1604.06778" rel="nofollow">https://arxiv.org/abs/1604.06778</a>) is the easiest reinforcement learning algorithm to implement and can be used as a black-box optimiser in a lot of settings.<p>6) Deep Q-Learning (<a href="https://arxiv.org/abs/1312.5602" rel="nofollow">https://arxiv.org/abs/1312.5602</a>) is also not too hard to implement; it's a foundational deep reinforcement learning paper, and it was the first time I had heard of DeepMind.<p>OpenAI Gym (<a href="https://github.com/openai/gym">https://github.com/openai/gym</a>) would help you get started with the latter two.
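To give a flavour of item 1, here is a rough sketch of what a custom two-optimiser GAN training step might look like in PyTorch. It is not code from the paper; the tiny MLP generator/discriminator, latent size, and learning rates are placeholder choices.<p><pre><code>
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, data_dim = 64, 784   # e.g. flattened 28x28 images; arbitrary choices
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, data_dim), nn.Tanh())
D = nn.Sequential(nn.Linear(data_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(x):                      # x: a batch of real samples, shape (N, data_dim)
    ones = torch.ones(x.size(0), 1)
    zeros = torch.zeros(x.size(0), 1)

    # Discriminator update: push D(real) towards 1 and D(fake) towards 0.
    fake = G(torch.randn(x.size(0), latent_dim)).detach()   # detach: no gradients into G here
    d_loss = F.binary_cross_entropy_with_logits(D(x), ones) + \
             F.binary_cross_entropy_with_logits(D(fake), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: try to make D label freshly generated samples as real.
    g_loss = F.binary_cross_entropy_with_logits(D(G(torch.randn(x.size(0), latent_dim))), ones)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
</code></pre><p>The instructive part is managing the two alternating updates yourself (note the .detach() that keeps the discriminator loss from pushing gradients into the generator), which is exactly the kind of thing a framework-level fit() call hides from you.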
It's best to choose something you personally find interesting. For example, I'm interested in audio generation, so I'd pick papers that describe a music or voice generation model or algorithm, but for you it might be something completely different.<p>When you do decide on a paper, take a look at Phil Wang's implementation style: <a href="https://github.com/lucidrains?tab=repositories">https://github.com/lucidrains?tab=repositories</a>; he has implemented hundreds of papers.<p>If you don't already have a GPU machine, you can rent a 40GB A100 instance for $1.1/hr or a 24GB A10 for $0.6/hr: <a href="https://lambdalabs.com/service/gpu-cloud" rel="nofollow">https://lambdalabs.com/service/gpu-cloud</a>.
> ideally, a list of papers that could take 2-5 hours each and a few hundred lines of code?<p>I think you are severely underestimating the time required, unless you are quite experienced, know exactly what to look for, or the paper is just a slight variation on previous work you are already familiar with.<p>Even seasoned researchers can easily spend 30+ hours trying to reproduce a paper, because papers almost never contain all the details that went into the experiments. You are left with a lot of fiddling and iteration. Of course, if you only care about roughly reproducing what the authors did, and don't care about getting the same results, the time can be much shorter. If the code is available, that's even better, but looking at it is cheating, since wrestling with issues yourself is a big part of the learning process.<p>A few people here mentioned Andrej's lectures, and I also think they are amazing, but they are not a replacement for getting stuck and solving problems yourself. You can easily watch these lectures and think "I get it!" because everything is so well explained, but you'll probably still be stuck when you run into your own problems trying to reproduce papers from scratch. There's no replacement for the experience you gain by struggling :)<p>It's like watching a math lecture and thinking you get it, but then getting stuck on the exercise problems. The real learning happens when you force yourself to struggle through the exercises.
Biggest productivity boosters for me:<p>- devdocs.io for pytorch<p>- conda for packaging with CUDA<p>- einops (see the small example below)<p>- tensorboard<p>- huggingface datasets<p>Interesting models/structures:<p>- resnet<p>- unet<p>- transformers<p>- convnext<p>- vision transformers<p>- ddpm<p>- novel optimizers<p>- generative flow nets<p>- “your classifier is secretly an energy-based model and you should treat it like one” paper<p>- self-supervision<p>- distance metric learning<p>Places where you can read implementations:<p>- lucidrains’ github<p>- timm computer vision models library<p>- fastai<p>- labml (annotated quite nicely)<p>Biggest foreseeable headaches:<p>- not super easy to do test-driven development<p>- data normalization (floating point error, not using e.g. batchnorm)<p>- sensitivity of model performance to (hyper)params (layer sizes, learning rates, optimizer, etc.)<p>- impatience<p>- lack of data<p>I’d also recommend watching Mark Saroufim live-code in PyTorch on YouTube. My 2 cents: you can only get really fast as well as good at this with a lot of experience. A lot of rules of thumb have to come together just right for the whole system to work.
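To make the einops recommendation concrete, here is a small sketch (the shapes are made up) of the kind of reshaping it makes readable, e.g. cutting a batch of images into patches the way a vision transformer front-end needs:<p><pre><code>
import torch
from einops import rearrange

imgs = torch.randn(8, 3, 224, 224)   # (batch, channels, height, width)

# Split each image into non-overlapping 16x16 patches and flatten each patch,
# expressed as one readable rearrangement pattern.
patches = rearrange(imgs, 'b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1=16, p2=16)
print(patches.shape)                 # torch.Size([8, 196, 768])
</code></pre><p>The same operation written with view/permute is easy to get silently wrong, which is why einops tends to show up on productivity lists like this.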
2-5 hours for a few hundred lines of tricky math code sounds like way too little, not to mention having to read and understand the paper first. Depending on the difficulty of the paper and your level of skill in the field, I'd say implementing a paper should take 20-200 hours.
Andrej Karpathy is currently releasing videos for a course [1] that goes from zero to GPT.<p>1. <a href="https://karpathy.ai/zero-to-hero.html" rel="nofollow">https://karpathy.ai/zero-to-hero.html</a>
Not sure if it's beginner friendly but I found implementing NeRF from scratch a good exercise. Especially since it reveals many details that are not immediately obvious from the paper.
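As one example of a small piece that's easy to get subtly wrong, here is a rough sketch of the positional encoding NeRF applies to its input coordinates (conventions vary between implementations, e.g. whether the identity term is concatenated or the factor of pi is kept):<p><pre><code>
import torch

def positional_encoding(x, num_freqs=10):
    # Map each coordinate to [sin(2^k * pi * x), cos(2^k * pi * x)] for k = 0..num_freqs-1.
    freqs = (2.0 ** torch.arange(num_freqs)) * torch.pi
    angles = x[..., None] * freqs                     # (..., dims, num_freqs)
    enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
    return enc.flatten(start_dim=-2)                  # (..., dims * 2 * num_freqs)

pts = torch.rand(1024, 3)                             # random 3D sample points
print(positional_encoding(pts).shape)                 # torch.Size([1024, 60])
</code></pre>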
I would recommend diffusion: try starting with Lilian Weng's blog post and writing up the process for yourself. For all its abilities, the code for DDPM is surprisingly simple.
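To illustrate how compact it is, here is a rough sketch of the simplified training objective. The noise-prediction network `model` is a placeholder for whatever U-Net you build, and the linear schedule values are common defaults rather than anything mandated.<p><pre><code>
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)          # linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def ddpm_loss(model, x0):
    # Sample a random timestep per example, noise the clean data in closed form,
    # and train the network to predict the noise that was added.
    t = torch.randint(0, T, (x0.size(0),))
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, *([1] * (x0.dim() - 1)))
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
    return F.mse_loss(model(x_t, t), noise)
</code></pre><p>Sampling takes a bit more code, but the training loop really is just this loss plus an optimiser step.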
This is not exactly what you are looking for but you should browse Papers with Code:<p><a href="https://paperswithcode.com/" rel="nofollow">https://paperswithcode.com/</a>
I'd love for someone to do a good quality PyTorch enabled implementation of Sampled AlphaZero/MuZero [1]. RLLib has an AlphaZero, but it doesn't have the parallelized MCTS you really want to have and the "Sampled" part is another twist to it. It does implement a single player variant though, which I needed. This would be amazing for applying MCTS based RL to various hard combinatorial optimization problems. Case in point, AlphaTensor uses their internal implementation of Sampled AlphaZero.<p>An initial implementation might be doable in 5 hours for someone competent and familiar with RLLib's APIs, but could take much longer to really polish.<p>[1]: <a href="https://arxiv.org/abs/2104.06303" rel="nofollow">https://arxiv.org/abs/2104.06303</a>
I have implemented YOLO v1 and trained/tested it on synthetic images with geometric shapes. Implementing the loss function taught me a lot about how backpropagation really works. I used Keras/TF.
Highly recommend this resource for RL<p><a href="https://spinningup.openai.com/en/latest/" rel="nofollow">https://spinningup.openai.com/en/latest/</a>
Just finished assignment 2 of cs224n[1], which has you derive gradients and implement word2vec. I thought it was a pretty good exercise. You could read the glove paper and try implementing that as well.<p>Knowing how to step through backpropagation in a neural network gets you pretty far in conceptual understanding of a lot of architectures. Imo there’s no substitute for writing out the gradients by hand to make sure you get what’s going on, if only in a toy example.<p>[1] <a href="https://web.stanford.edu/class/cs224n/" rel="nofollow">https://web.stanford.edu/class/cs224n/</a>
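In that spirit, here is a toy sketch of hand-deriving the naive-softmax skip-gram gradients and verifying them with finite differences. This is not the assignment's starter code; the vocabulary size, dimensions, and the check itself are purely illustrative.<p><pre><code>
import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 5                       # toy vocabulary size and embedding dimension
U = rng.normal(size=(V, d))        # "outside" word vectors
v_c = rng.normal(size=d)           # centre word vector
o = 3                              # index of the observed outside word

def loss_and_grads(v_c, U, o):
    scores = U @ v_c
    p = np.exp(scores - scores.max()); p /= p.sum()   # softmax over the vocabulary
    loss = -np.log(p[o])                              # naive-softmax loss
    dv_c = U.T @ p - U[o]                             # hand-derived gradient wrt v_c
    dU = np.outer(p, v_c); dU[o] -= v_c               # hand-derived gradient wrt U
    return loss, dv_c, dU

loss, dv_c, dU = loss_and_grads(v_c, U, o)

# Finite-difference check of dv_c: if the hand derivation is right, these match.
eps = 1e-6
numeric = np.array([(loss_and_grads(v_c + eps * e, U, o)[0] -
                     loss_and_grads(v_c - eps * e, U, o)[0]) / (2 * eps)
                    for e in np.eye(d)])
print(np.allclose(numeric, dv_c, atol=1e-5))          # True
</code></pre>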
I have done this a few times now. Alone (e.g. <a href="https://github.com/paulmorio/geo2dr">https://github.com/paulmorio/geo2dr</a>) and in collaboration with others (e.g. <a href="https://github.com/benedekrozemberczki/pytorch_geometric_temporal">https://github.com/benedekrozemberczki/pytorch_geometric_tem...</a>), primarily as a way to learn about the methods I was interested in from a research perspective whilst improving my skills in software engineering. I am still learning.<p>Starting out, I would recommend implementing fundamental building blocks within whatever 'subculture' of ML you are interested in, whether that be DL, kernel methods, probabilistic models, etc.<p>Let's say you are interested in deep learning methods (as that's something I could at least speak more confidently about). In that case, build yourself an MLP layer, then an RNN layer, then a GNN layer, then a CNN layer, and an attention layer, along with some full models with those layers on case studies exhibiting different data modalities (images, graphs, signals). This should give you a feel for the assumptions driving the inductive biases in each layer and what motivates their existence (vs. an MLP). It also gives you all the building blocks you can then extend to build every other DL layer+model out there. Another reason is that these fundamental building blocks have been implemented many times, so you have a reference to look to when you get stuck.<p>On that note: here are some fun GNN papers to implement in order of increasing difficulty (try building them in vanilla PyTorch/JAX instead of PyG).
- SGC (from <a href="https://arxiv.org/abs/1902.07153" rel="nofollow">https://arxiv.org/abs/1902.07153</a>)
- GCN (from <a href="https://arxiv.org/abs/1609.02907" rel="nofollow">https://arxiv.org/abs/1609.02907</a>)
- GAT (from <a href="https://arxiv.org/abs/1710.10903" rel="nofollow">https://arxiv.org/abs/1710.10903</a>)<p>After building the basic building blocks, these should each take about 2-5 hours (reading the paper + implementation), and probably quicker towards the end with all that practice; a minimal GCN layer sketch follows below as a reference point. Good luck and remember to have fun!
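A minimal sketch of a GCN layer in plain PyTorch, roughly following the symmetric-normalisation propagation rule from the GCN paper. It uses a dense adjacency matrix for simplicity; a real implementation would use sparse ops.<p><pre><code>
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    # H' = D_hat^(-1/2) (A + I) D_hat^(-1/2) H W, with a dense adjacency matrix.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, H, A):
        A_hat = A + torch.eye(A.size(0))             # add self-loops
        d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
        A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]   # symmetric normalisation
        return A_norm @ self.linear(H)

# Usage on a random 6-node graph with 8 input features per node.
A = (torch.rand(6, 6) < 0.3).float()
A = ((A + A.t()) > 0).float()                        # symmetrise
layer = GCNLayer(8, 4)
print(layer(torch.randn(6, 8), A).shape)             # torch.Size([6, 4])
</code></pre><p>SGC is essentially this without per-layer nonlinearities (apply the normalised adjacency K times before a single linear map), while GAT replaces the fixed normalisation with learned attention coefficients, so the three papers build naturally on each other.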
I enjoyed doing this through the Coursera deep learning specialization:<p><a href="https://www.coursera.org/specializations/deep-learning" rel="nofollow">https://www.coursera.org/specializations/deep-learning</a><p>The lectures take you through each major paper, then you implement the paper in the homework. Much faster than reading the paper yourself.
Hey, feel free to reach out if you’d like to join an NLP project I’m working on, to gain more experience. Will provide mentorship and potentially coauthorship on the publication.
You could maybe write the whole thing in a few hours, but debugging what you wrote to recreate prior results will probably take much longer, depending on the choice of problem.