The Bitter Lesson (2019)

98 points by radkapital almost 5 years ago

27 comments

YeGoblynQueenne almost 5 years ago
>> In computer chess, the methods that defeated the world champion, Kasparov, in 1997, were based on massive, deep search.

"Massive, deep search" that started from a book of opening moves and the combined expert knowledge of several chess Grandmasters. And that was an instance of the minimax algorithm with alpha-beta cutoff, i.e. a search algorithm specifically designed for two-player, deterministic games like chess. And with a hand-crafted evaluation function, whose parameters were filled in by self-play. But still, an evaluation function; the minimax algorithm requires one, and blind search alone did not, could not, come up with minimax, or with the concept of an evaluation function, in a million years. Essentially, human expertise about what matters in the game was baked into Deep Blue's design from the very beginning and permeated every aspect of it [1].

Of course, ultimately, search was what allowed Deep Blue to beat Kasparov (3½–2½; Kasparov won one game and drew three), in the sense that the alpha-beta minimax algorithm itself is a search algorithm, and it goes without saying that a longer, deeper, better search will inevitably, eventually outperform whatever a human player is doing, which clearly is not search.

But rather than an irrelevant "bitter" lesson about how big machines can perform more computations than a human, a really useful lesson (and one that we haven't yet learned, as a field) is why humans can do so well *without search*. It is clear to anyone who has played any board game that humans can't search ahead more than a scant few ply, even for the simplest games. And yet it took 30 years (counting from the Dartmouth workshop) for a computer chess player to beat an expert human player, and almost 60 to beat one in Go.

No, no. The biggest question in the field is not one that is answered by "a deeper search". The biggest question is "how can we do that without a search?"

Also see Rodney Brooks' "better lesson" [2], addressing the other successes of big search discussed in the article.

[1] https://en.wikipedia.org/wiki/Deep_Blue_(chess_computer)#Design

[2] https://rodneybrooks.com/a-better-lesson/
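A minimal, runnable sketch of the algorithm the comment describes: generic alpha-beta minimax over a toy subtraction game (players alternately remove 1-3 stones; whoever takes the last stone wins). The toy game is our assumption for illustration, not Deep Blue's design; in chess the tree is far too deep to reach terminal positions, which is exactly why a hand-crafted evaluation function has to score the search frontier.

```python
import math

# Generic alpha-beta minimax. Only the terminal scoring is game-specific;
# in a chess engine that role is played by a hand-crafted evaluate().

def alphabeta(stones, alpha, beta, maximizing):
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1 if maximizing else 1
    best = -math.inf if maximizing else math.inf
    for take in (1, 2, 3):
        if take > stones:
            break
        score = alphabeta(stones - take, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if alpha >= beta:  # cutoff: prune the remaining moves at this node
            break
    return best

# 10 stones is a first-player win (move to leave a multiple of 4): prints 1.
print(alphabeta(10, -math.inf, math.inf, True))
```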
kdoherty almost 5 years ago
Potentially also of interest is Rod Brooks' response "A Better Lesson" (2019): https://rodneybrooks.com/a-better-lesson/
auggierose almost 5 years ago
I guess it depends on what you're trying to do. I had a computer vision problem where I was like, hell yeah, let's machine learn the hell out of this. Two months later, the results were just not precise enough. It took me two more months, and now I am solving the task easily on an iPhone via Apple Metal, in milliseconds, with a hand-crafted optimisation approach...
ksdale almost 5 years ago
I think it's plausible that many technological advances follow a similar pattern. Something like the steam engine is a step-improvement, but many of the subsequent improvements are basically the obvious next step, implemented once steel is strong enough, or machining precise enough, or fuel refined enough. How many times has the world changed qualitatively, simply in the pursuit of making things quantitatively bigger or faster or stronger?

I can certainly see how it could be considered disappointing that pure intellect and creativity don't always win out, but I, personally, don't think it's bitter.

I also have a pet theory that the first AGI will actually be 10,000 very simple algorithms/sensors/APIs duct-taped together running on ridiculously powerful equipment, rather than any sort of elegant Theory of Everything, and this wild conjecture may make me less likely to think this a bitter lesson...
fxtentacle almost 5 years ago
The current top contender in AI optical flow uses LESS CPU and LESS RAM than last year's leader. As such, I strongly disagree with the article.

Yes, many AI fields have become better through improved computational power. But this additional computational power has unlocked architectural choices which were previously impossible to execute in a timely manner.

So the conclusion may equally well be that a good network architecture produces a good result. And if you cannot use the right architecture due to RAM or CPU constraints, then you will get bad results.

And while taking an old AI algorithm and re-training it with 2x the original parameters and 2x the data does work and does improve results, I would argue that that's kind of low-level copycat "research" and not advancing the field. Yes, there are a lot of people doing it, but no, it's not significantly advancing the field. It's tiny incremental baby steps.

In the area of optical flow, this year's new top contenders introduce many completely novel approaches, such as new normalization methods, new data representations, new nonlinearities, and a full bag of never-used-before augmentation methods. All of these are handcrafted elements that someone built by observing what "bug" needs fixing. And that easily halved the loss rate compared to last year's architectures, while using LESS CPU and RAM. So to me, that is clear proof of a superior network architecture, not of additional computing power.
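An illustrative sketch of the kind of small handcrafted element the comment describes; this is an assumed example, not taken from any specific optical-flow paper: a normalization layer that rescales the feature vector at every pixel to unit length, so matching costs behave like cosine similarities regardless of feature magnitude.

```python
import torch
import torch.nn as nn

class UnitLengthNorm(nn.Module):
    """Hand-crafted architectural element: per-pixel feature normalization."""

    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, height, width); normalize across
        # channels independently at each spatial position.
        return x / (x.norm(dim=1, keepdim=True) + self.eps)

feats = torch.randn(2, 64, 32, 32)
normed = UnitLengthNorm()(feats)
print(normed.norm(dim=1).mean())  # ~1.0 at every position
```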
JoeAltmaier almost 5 years ago
Got to believe, this is like heroin. It's a win until it isn't. Then where will AI researchers be? No progress for 20 (50?) years, because the temptation not to understand, but to just build performant engineering solutions, was so strong.

In fact, is the researcher supposed to be building the most performant solution? This article seems alarmingly misinformed. Understanding 'artificial intelligence' isn't a race to VC money.
astrophysician almost 5 years ago
I think what he's basically saying is that priors (i.e. domain knowledge plus custom, domain-inspired models) help when you're data-limited or when your data is very biased, but once that's not the case (e.g. we have an infinite supply of voice samples), model capacity is usually all that matters.
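A toy, numpy-only illustration of that claim; the linear signal and the model choices are our assumptions: when the true relationship is linear, a model with the right prior (degree-1 fit) beats a high-capacity model (degree-9 fit) on scarce data, and the capacity penalty washes out once data is plentiful.

```python
import numpy as np

rng = np.random.default_rng(0)

def test_mse(n_train):
    x = rng.uniform(-1, 1, n_train)
    y = 2.0 * x + rng.normal(0.0, 0.3, n_train)  # linear signal plus noise
    x_test = np.linspace(-1, 1, 200)
    y_test = 2.0 * x_test                        # noiseless ground truth
    lin = np.polyfit(x, y, deg=1)                # strong, correct prior
    big = np.polyfit(x, y, deg=9)                # flexible, prior-free
    mse = lambda c: float(np.mean((np.polyval(c, x_test) - y_test) ** 2))
    return mse(lin), mse(big)

for n in (12, 100, 10_000):
    lin_mse, big_mse = test_mse(n)
    print(f"n={n:>6}  prior-model MSE={lin_mse:.4f}  big-model MSE={big_mse:.4f}")
```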
sytse almost 5 years ago
The article says we should focus on increasing the compute we use in AI instead of embedding domain-specific knowledge. OpenAI seems to have taken this lesson to heart: they are training a generic model using more compute than anything else. Many researchers predict a plateau for AI because it is missing domain-specific knowledge, but this article, and the benefits of more compute that OpenAI is demonstrating, beg to differ.
aszen almost 5 years ago
Interesting. I wonder what happens now that Moore's law is considered dead and we can't rely on computational power increasing year over year. To make further progress with general-purpose search and learning methods, we will need lots more computational power, which may not be cheaply available. Do we then focus our efforts on developing more efficient learning strategies, like the ones we have in our minds?

I do agree with the part about not embedding human knowledge into our computer models: to make true progress in AI, any knowledge worth learning about a domain should be something the computer can learn on its own.
maest almost 5 years ago
For contrast, take this Hofstadter quote:

> This, then, is the trillion-dollar question: Will the approach undergirding AI today—an approach that borrows little from the mind, that’s grounded instead in big data and big engineering—get us to where we want to go? How do you make a search engine that understands if you don’t know how you understand? Perhaps, as Russell and Norvig politely acknowledge in the last chapter of their textbook, in taking its practical turn, AI has become too much like the man who tries to get to the moon by climbing a tree: “One can report steady progress, all the way to the top of the tree.”

My take is that there is something intellectually unsatisfying about solving a problem by simply throwing more computational power at it, instead of trying to understand it better.

Imagine a parallel universe where computational power is extremely cheap. In this universe, people solve integrals exclusively by numerical integration, so there is no incentive to develop any of the analysis theory we currently have. I would expect that to be a net negative in the long run, as theories like general relativity would have been almost impossible to develop without the current mathematical apparatus.
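The parallel universe in miniature; the integral chosen is our example: brute-force quadrature delivers the number, while the closed form carries the structure that later theory is built on.

```python
from math import sin, pi

def trapezoid(f, a, b, n=100_000):
    # Composite trapezoid rule: average the endpoints, sum the interior.
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

# Numerically ~2.0. Analytically, the integral of sin(x) on [0, pi] is
# -cos(pi) + cos(0) = 2 -- and it is that symbolic derivation, not the
# number, which generalizes.
print(trapezoid(sin, 0, pi))
```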
dyukq almost 5 years ago
Previous discussion: https://news.ycombinator.com/item?id=19393432
koeng almost 5 years ago
This lesson can be applied to synthetic biology right now, though the field is still in its infancy.

At least a few of the original synthetic biologists are a bit disappointed in the rise of high-throughput testing for everything, instead of "robust engineering". Perhaps what allows us to understand life isn't just more science, but more "biotech computation".
ruuda almost 5 years ago
A slightly more recent post that really opened my eyes to this insight (and references The Bitter Lesson) is this piece by Gwern on the scaling hypothesis: https://www.gwern.net/newsletter/2020/05#gpt-3
throwaway7281 almost 5 years ago
This reminds me of the Banko and Brill paper "Scaling to very very large corpora for natural language disambiguation": https://dl.acm.org/doi/10.3115/1073012.1073017

It is exactly the point, and it is something not a lot of researchers really grok. As a researcher, you are so smart; why can't you discover whatever you are seeking? I think in this decade we will see a couple more scientific discoveries made by brute force, which will hopefully make the scientific type a bit more humble and honest.
cgearhart almost 5 years ago
I have read this before and broadly agree with the point: it's no use trying to curate expertise into AI. But I don't think modeling p(y|x), or its friend p(y, x), is the end we're looking for either. It is unreasonably effective, though, so we keep doing it. (I don't have an answer or an alternative; causality appeals to my intuition, but it's really clunky and has seemingly not paid off.)
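The two modeling targets named above, side by side; this is a generic sketch on synthetic data, not the commenter's setup: logistic regression fits the conditional p(y|x) directly, while Gaussian naive Bayes fits the joint p(y, x) = p(x|y)p(y) and inverts it with Bayes' rule at prediction time.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification problem.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

discriminative = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)  # p(y|x)
generative = GaussianNB().fit(X_tr, y_tr)                           # p(y, x)

print("p(y|x) model accuracy:", discriminative.score(X_te, y_te))
print("p(y, x) model accuracy:", generative.score(X_te, y_te))
```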
coldtea almost 5 years ago
> *At the time, this was looked upon with dismay by the majority of computer-chess researchers who had pursued methods that leveraged human understanding of the special structure of chess.*

This seems problematic as a concept in itself.

Sure, human players have a "human understanding of the special structure of chess". But what makes them play could be an equally "deep search", fuzzy computations done in the brain, and not some conscious step-by-step reasoning. Or rather, their "conscious step-by-step reasoning", in my opinion, probably sits on top of a subconscious deep search in the brain that prunes the possible moves, and so on.

I don't think anybody plays chess at any great level merely by making conscious step-by-step decisions.

It's similar to how, when we want to catch a ball thrown at us, we do some thinking like "they threw it to our right, so we better move right", but we also do tons of subconscious calculation of the trajectory (nobody sits and explicitly calculates the parabolic formula when they're thrown a baseball).
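For the record, here is the explicit calculation nobody actually performs; the throw speed and angle are made-up numbers: where a ball launched at speed v and angle theta lands, ignoring air resistance.

```python
from math import sin, cos, radians

g = 9.81                      # gravitational acceleration, m/s^2
v, theta = 15.0, radians(40)  # throw at 15 m/s, 40 degrees above horizontal

t_flight = 2 * v * sin(theta) / g   # time until it returns to launch height
x_land = v * cos(theta) * t_flight  # horizontal distance covered
print(f"lands {x_land:.1f} m away after {t_flight:.1f} s")
```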
francoisp almost 5 years ago
Building a model for and with domain knowledge == premature optimization? In the end, a win on Kaggle or a published paper seems to depend on tweaking hyperparameters based on even more pointed DK: data-set knowledge...

I wonder what would be required to build a model that explores the search space of compilable programs, in say Python, that sort in correct order. Applying this idea of using ML techniques to find better "thinking" blocks for silicon seems promising.
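A crude, brute-force sketch of the search that comment imagines; the candidate vocabulary and the spec are our assumptions: enumerate tiny Python expression bodies, keep the ones that compile, and test them against a sorting specification. Real program synthesis prunes this space far more cleverly; this is illustration only.

```python
CANDIDATES = [
    "xs", "xs[1:]", "xs[::-1]", "xs + xs",
    "sorted(xs)", "sorted(xs, reverse=True)", "list(set(xs))",
]
# Input/output pairs acting as the spec. The duplicate-containing case rules
# out the set-based near-miss, which dedupes while ordering.
SPEC = [([3, 1, 2], [1, 2, 3]), ([5, 4], [4, 5]), ([2, 2, 1], [1, 2, 2]), ([], [])]

def satisfies_spec(src):
    try:
        fn = eval("lambda xs: " + src)  # the "compilable" check happens here
        return all(fn(list(inp)) == out for inp, out in SPEC)
    except Exception:
        return False

print([src for src in CANDIDATES if satisfies_spec(src)])  # ['sorted(xs)']
```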
overhyp almost 5 years ago
I would like to offer what I believe is a counterpoint, but I am not a trained ML researcher, so I am not sure if it is even a counterpoint. Maybe it is just an observation.

I recently participated in the following Kaggle competition:

https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge/

You can see the kinds of questions the contest expects the ML to answer. To take one example:

"Effectiveness of movement control strategies to prevent secondary transmission in health care and community settings"

All I can say is that the contest results, on the whole, were *completely underwhelming*. You can check out the Contributions page to verify this for yourself. If the consequences of the failure weren't so potentially catastrophic, some might even call it a little comical. I mean, it's not as if a pandemic comes around every few months, so we can all just wait for the computational power to catch up to solve these problems, like the author suggests.

Also, I couldn't help but feel that nearly all participants were more interested in applying the latest and greatest ML advancement (BERT QA!), often with no regard to the problem being solved.

I wish I could tell you I have some special insight into a better way to solve it, given that there is a *friggin pandemic* going on and we could all very well do with some *real friggin answers*! I don't have any such special insight at all. All I found was that everyone was so obsessed with using the latest and greatest ML techniques that there was practically no first-principles thinking. At the end, everyone just sort of got too drained and gave up, which is reflected by a single participant winning pretty much the entire second round of 7-8 task prizes by virtue of being the last man standing :-)

I have realized two things.

1) ML, at least when it comes to understanding text, is really overhyped.

2) Nearly everyone who works in ML research is probably overpaid by a factor of 100 (just pulling some number out of my you-know-what), given that the results they have actually produced have fallen so short precisely when they were so desperately needed.
vlmutolo almost 5 years ago
It’s funny when you’ve been thinking for months about how speech recognition could really benefit from integrating models of the human vocal tract…

…and then you read this.
glitchc almost 5 years ago
When it comes to games, exploitation (of tendencies and weaknesses), misdirection, subterfuge, and yomi play a far bigger role in winning than actual skill. Humans are much better than computers at all of those. Perhaps a dubious honour, but an advantage nonetheless. We're only really in trouble when the machine learns to reliably replicate the same tactics.
sidpatil almost 5 years ago
http://norvig.com/chomsky.html
avmich almost 5 years ago
> When a simpler, search-based approach with special hardware and software proved vastly more effective, these human-knowledge-based chess researchers were not good losers.

It's like calling Russia the loser of the Cold War. Technically the effect was achieved; practically, the side which "lost" possibly gained the largest benefits.
KKKKkkkk1 almost 5 years ago
Today Elon Musk announced that Tesla is going to reach level-5 autonomy by the end of the year. Specifically:

*There are no fundamental challenges remaining for level-5 autonomy. There are many small problems. And then there's the challenge of solving all those small problems and then putting the whole system together.* [0]

I feel like this year is going to be another year in which the proponents of brute-force AI like Elon and Sutton will learn a bitter lesson.

[0] https://twitter.com/yicaichina/status/1281149226659901441
lambdatronics almost 5 years ago
TL;DR: AI needs a hand up, not a handout. "We want AI agents that can discover like we can, not which contain what we have discovered." I was internally protesting all the way through the note, until I got to that penultimate sentence.
annoyingnoob almost 5 years ago
That is a wall of words; I can't even read it in that format.
totally_a_human almost 5 years ago
This page seems to be down. Is there a mirror?
mtgp1000 almost 5 years ago
> We want AI agents that can discover like we can, not which contain what we have discovered. Building in our discoveries only makes it harder to see how the discovering process can be done.

I think these lessons become less appropriate as our hardware and our understanding of neural networks improve. An agent that can learn complex probabilistic relationships between inputs and outputs (i.e. heuristics) on its own requires a minimum level of complexity and performance, both in hardware and in neural network design, before any sort of useful self-learning is possible. We only crossed that threshold recently (5-10 years ago).

> The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.

Admittedly, I'm not quite sure of the author's point. They seem to indicate that there is a trade-off between spending time optimizing the architecture and baking in human knowledge.

If that's the case, I would argue that there is an impending perspective shift in the field of ML, wherein "human knowledge" is not something to hardcode explicitly, but instead is implicitly delivered through a combination of appropriate data curation and the design of neural networks that are primed to learn certain relationships.

That's the future, and we're just collectively starting down that path; it will take some time for the relevant human knowledge to accumulate.