(Partly copied from https://news.ycombinator.com/item?id=34640251.)

On models: obviously, almost everything is a Transformer nowadays ("Attention Is All You Need"). However, to get into the field and get a good overview, you should also look a bit beyond the Transformer. RNNs/LSTMs are still a must-learn, even though Transformers may be better at many tasks. The memory-augmented models, e.g. the Neural Turing Machine and its follow-ups, are important too.

It also helps to know the different architectures: plain language models (GPT), attention-based encoder-decoders (e.g. the original Transformer), but also CTC, hybrid HMM-NN models, and transducers (RNN-T).

Some self-promotion: I think my PhD thesis gives a good overview of this: https://www-i6.informatik.rwth-aachen.de/publications/download/1223/Zeyer--2022.pdf

Diffusion models are another recent, different kind of model.

A separate topic is the training aspect. Most papers do supervised training, using a cross-entropy loss against the ground-truth targets. However, there are many other options:

There is CLIP, which combines the text and image modalities.

There is the whole field of unsupervised or self-supervised training methods. Language-model training (next-token prediction) is one example, but there are others.

And then there is the big field of reinforcement learning, which is probably also quite relevant for AGI.
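
To make the supervised-training point above concrete, here is a minimal sketch of one cross-entropy training step in PyTorch. The model, dimensions and random "data" are placeholders I made up for illustration, not from any particular paper:

    # Minimal sketch: supervised training with cross-entropy loss (PyTorch).
    # Model, shapes and the random batch are placeholders for illustration.
    import torch
    import torch.nn as nn

    num_classes = 10
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, num_classes))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()  # expects raw logits and integer class targets

    for step in range(100):
        inputs = torch.randn(8, 32)                    # batch of 8 feature vectors
        targets = torch.randint(0, num_classes, (8,))  # ground-truth class labels
        logits = model(inputs)                         # [batch, num_classes]
        loss = loss_fn(logits, targets)                # cross entropy to the ground truth
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()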
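And the language-model (next-token prediction) case is the same loss, just with the targets being the input sequence shifted by one position. Again a rough sketch with a made-up tiny model and random token data, here using an LSTM just to keep it short:

    # Minimal sketch: language-model training as next-token prediction (PyTorch).
    # Vocabulary size, model and data are placeholders for illustration.
    import torch
    import torch.nn as nn

    vocab_size, embed_dim, hidden_dim = 100, 32, 64

    class TinyLM(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, tokens):            # tokens: [batch, time]
            h, _ = self.rnn(self.embed(tokens))
            return self.out(h)                # logits: [batch, time, vocab]

    model = TinyLM()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    tokens = torch.randint(0, vocab_size, (4, 16))   # batch of random token sequences
    inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict the next token at each step
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()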