When GPT was released, it was a huge milestone that kicked off the massive growth of AI LLMs. But why? What made this possible? We've had neural networks for years, so why have they suddenly become so good? Is it hardware? Technique? What was the defining moment?
It's this: https://en.wikipedia.org/wiki/Attention_Is_All_You_Need

They adapted a technique developed for machine translation, which had already been advancing a lot over the previous decade or so.

"Attention" requires really big matrices, and they threw truly vast amounts of data at it. People had been developing techniques for managing that sheer amount of computation, including GPUs and dedicated hardware.

It's still remarkable that it got *so* good. It's as if some emergent phenomenon appeared only once enough data was approached the right way. So it's not at all clear whether significant improvements will require another major discovery, or whether it's just a matter of evolution from here.
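For a sense of why the matrices get so big, here's a minimal NumPy sketch of the scaled dot-product attention formula from that paper (not the authors' code, just the published equation, with made-up toy inputs): every token's query is compared against every token's key, so the score matrix grows quadratically with sequence length.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.

        Q, K, V: (seq_len, d_k) matrices of queries, keys, and values.
        Returns a (seq_len, d_k) matrix where each row is a weighted
        mix of the value vectors.
        """
        d_k = Q.shape[-1]
        # Similarity of every query with every key: (seq_len, seq_len).
        # This is the matrix that scales quadratically with sequence length.
        scores = Q @ K.T / np.sqrt(d_k)
        # Softmax over keys so each row of weights sums to 1.
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V

    # Toy example: 4 tokens with 8-dimensional embeddings, attending to themselves.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))
    out = scaled_dot_product_attention(x, x, x)
    print(out.shape)  # (4, 8)

Stack many of these (with multiple heads and learned projections), feed in billions of tokens, and that's the computation GPUs and dedicated hardware were pressed into handling.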