I was reading the Deep Double Descent paper by OpenAI: <a href="https://arxiv.org/abs/1912.02292" rel="nofollow">https://arxiv.org/abs/1912.02292</a>. The writing is so lucid. Even the structure of the paper is unconventional. They put a results section before the Related Work section to give a sneak peek of what the paper is about, more like telling a story. Even before the Introduction begins, there is a self-explanatory figure spanning the entire two columns.<p>Can you link to more papers written in this style that are easy to read if you have the background knowledge?
The paper introducing support vector machines by Cortes &amp; Vapnik was written exceptionally well in my opinion. It succinctly tells part of the 60-year story of pattern recognition (ML), from Fisher in 1936 to 1992.<p><a href="https://link.springer.com/content/pdf/10.1007/bf00994018.pdf" rel="nofollow">https://link.springer.com/content/pdf/10.1007/bf00994018.pdf</a>
A Mathematical Theory of Communication by Claude Shannon (<a href="https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf" rel="nofollow">https://people.math.harvard.edu/~ctm/home/text/others/shanno...</a>)
I'm not sure if it's truly a timeless paper, but "Attention Is All You Need" by Vaswani et al. has been hugely influential in recent years. Also, "Deep Unsupervised Learning using Nonequilibrium Thermodynamics" by Sohl-Dickstein et al. (about diffusion models) and "NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis" by Mildenhall et al. were hugely influential to me.