Like others in this thread have said, we're just starting to explore the technology. I view it as akin to early CPUs: the 6502 did only the absolute minimum, while today's monsters have large memory caches, branch prediction, dedicated circuits, thousands of arithmetic shortcuts, and more all built in. Each small improvement adds up.

From a software perspective, I've wondered for a while whether, as LLM usage matures, there will be an effort to optimize hotspots the way VMs do, or to auto-index the way relational DBs do. I'm sure there are common data paths that get heavier usage and could be prioritized, either through pre-processing or dynamically, to speed up inference (a rough sketch of the idea is at the end of this comment).

Also, GPT-4 seems to include multiple LLMs working in concert. There's bound to be way more fruit to be picked along that route as well. In short, there are tons of areas where improvements large and small can be made.

As always in computer science, the maxim "Make it work, make it right, make it fast" applies here as well. We're collectively still at step one.
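
To make the "common data paths" idea concrete, here's a toy Python sketch of reusing work for a shared prompt prefix, loosely analogous to how real inference servers reuse cached attention state for repeated prefixes. Everything here (compute_state, cached_prefix_state, run_request) is a made-up illustration under that assumption, not any actual LLM API:

    # Toy sketch: many requests share a common prefix (e.g. the same system
    # prompt). Compute the prefix state once, reuse it, and only pay for the
    # novel suffix on each request. Hypothetical names throughout.
    from functools import lru_cache

    def compute_state(tokens: tuple[str, ...]) -> list[str]:
        """Stand-in for the expensive per-token forward pass."""
        print(f"  computing {len(tokens)} tokens")
        return [f"state({t})" for t in tokens]

    @lru_cache(maxsize=128)
    def cached_prefix_state(prefix: tuple[str, ...]) -> tuple[str, ...]:
        """Compute the shared prefix once; later requests hit the cache."""
        return tuple(compute_state(prefix))

    def run_request(prefix: tuple[str, ...], suffix: tuple[str, ...]) -> list[str]:
        # Reuse the precomputed prefix state, then process only the new tokens.
        return list(cached_prefix_state(prefix)) + compute_state(suffix)

    if __name__ == "__main__":
        system_prompt = ("You", "are", "a", "helpful", "assistant", ".")
        print("request 1:")
        run_request(system_prompt, ("What", "is", "2+2", "?"))
        print("request 2:")  # prefix state comes from the cache this time
        run_request(system_prompt, ("Summarize", "this", "article", "."))

The first request computes all ten-ish tokens; the second only computes its four-token suffix, which is the kind of "hot path" win I have in mind, whether it's done ahead of time or discovered dynamically.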