(Partly copied from https://news.ycombinator.com/item?id=34640251.)

On models: obviously, almost everything is a Transformer nowadays ("Attention Is All You Need"). However, to get into the field and get a good overview, you should also look a bit beyond the Transformer. RNNs/LSTMs are still a must-learn, even though Transformers may be better at many tasks. The memory-augmented models, e.g. the Neural Turing Machine and its follow-ups, are important too.

It also helps to know the different architectures: plain language models (GPT), attention-based encoder-decoders (e.g. the original Transformer), but also CTC, hybrid HMM-NN models, and transducers (RNN-T).

Some self-promotion: I think my PhD thesis gives a good overview of this: https://www-i6.informatik.rwth-aachen.de/publications/download/1223/Zeyer--2022.pdf

Diffusion models are another recent, different kind of model.

A separate topic is the training aspect. Most papers do supervised training, using a cross-entropy loss against the ground-truth targets. However, there are many other options:

There is CLIP, which combines the text and image modalities.

There is the whole field of unsupervised or self-supervised training methods. Language-model training (next-token prediction) is one example, but there are others.

And then there is the big field of reinforcement learning, which is probably also quite relevant for AGI.
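
To make the supervised-training point above concrete, here is a minimal sketch of one cross-entropy training step in PyTorch. The model, dimensions and random "data" are placeholders I made up for illustration, not from any particular paper:

    # Minimal sketch: supervised training with cross-entropy loss (PyTorch).
    # Model, shapes and the random batch are placeholders for illustration.
    import torch
    import torch.nn as nn

    num_classes = 10
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, num_classes))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()  # expects raw logits and integer class targets

    for step in range(100):
        inputs = torch.randn(8, 32)                    # batch of 8 feature vectors
        targets = torch.randint(0, num_classes, (8,))  # ground-truth class labels
        logits = model(inputs)                         # [batch, num_classes]
        loss = loss_fn(logits, targets)                # cross entropy to the ground truth
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()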
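And the language-model (next-token prediction) case is the same loss, just with the targets being the input sequence shifted by one position. Again a rough sketch with a made-up tiny model and random token data, here using an LSTM just to keep it short:

    # Minimal sketch: language-model training as next-token prediction (PyTorch).
    # Vocabulary size, model and data are placeholders for illustration.
    import torch
    import torch.nn as nn

    vocab_size, embed_dim, hidden_dim = 100, 32, 64

    class TinyLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):            # tokens: [batch, time]
            h, _ = self.rnn(self.embed(tokens))
            return self.out(h)                # logits: [batch, time, vocab]

    model = TinyLM()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    tokens = torch.randint(0, vocab_size, (4, 16))   # batch of random token sequences
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict the next token at each step
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()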