With the rise of AI-powered tools such as Copilot and Cursor, can we expect the network effect to lead to a self-reinforcing cycle, where more users and better datasets fuel even better Copilot experiences?

- More developers using Copilot will contribute to a collective knowledge base that grows with each use.
- This collective knowledge base provides a richer dataset for future improvements, leading to a better Copilot experience for developers.
- As a result of these benefits, more people will be attracted to the language, creating a virtuous cycle of growth and improvement.

Would love to hear your thoughts.
This seems unlikely, because LLMs don't produce high-quality code; they produce average code. So they don't contribute to a better dataset, they contribute to a narrower dataset clustered around the average. LLMs tend to self-poison, not to self-improve. There is a good chance this has already started, given the huge amount of ChatGPT code that has been put on GitHub since 2021. Maybe it can be avoided if the LLM authors use some quality filter to discard 80% of the dataset.
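For what it's worth, here is a minimal sketch of what such a filter could look like. Everything in it is an assumption for illustration: the heuristics (parseability, docstring coverage, line length) and the 20% keep rate are placeholders, not anything the model vendors have disclosed.

```python
import ast

def quality_score(source: str) -> float:
    """Assumed heuristic quality score for a Python sample; higher is better."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return 0.0  # unparseable samples are discarded outright
    score = 1.0
    # Reward documented functions: docstring coverage as a rough proxy for care.
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]
    if funcs:
        score += sum(1 for f in funcs if ast.get_docstring(f)) / len(funcs)
    # Penalize many very long lines, a crude signal of generated or minified code.
    lines = source.splitlines()
    if lines and sum(len(line) > 120 for line in lines) / len(lines) > 0.1:
        score -= 0.5
    return score

def filter_top_fraction(samples: list[str], keep: float = 0.2) -> list[str]:
    """Keep only the top `keep` fraction by score, i.e. discard ~80%."""
    ranked = sorted(samples, key=quality_score, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep))]
```

In practice a learned quality classifier would replace these heuristics, but the shape of the pipeline (score every sample, keep the top slice) would be the same.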
They don't need that much data.

They operate in a higher-dimensional space.

You can fine-tune a model trained on JS/Python and teach it Lua with little issue. If you have a proper Rosetta stone mapping your language to one that is well represented in the training corpus, it isn't a problem.
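As a concrete sketch of that fine-tuning step, assuming the Hugging Face transformers/datasets stack: the model name, the file "lua_corpus.txt", and every hyperparameter below are placeholders for illustration, not a recommended recipe.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Placeholders: any small causal LM and any corpus of Lua source files.
model_name = "gpt2"  # a code-pretrained model would be a better starting point
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical corpus: Lua source gathered into a plain-text file.
dataset = load_dataset("text", data_files={"train": "lua_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lua-finetune",
                           per_device_train_batch_size=4,
                           num_train_epochs=1),
    train_dataset=tokenized,
    # Causal-LM objective: labels are the inputs shifted by one (mlm=False).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The point is how little machinery is involved: because the base model has already learned general code structure from JS/Python, a modest Lua corpus is enough to transfer.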
I was wondering if you could go the other way: could the statistical knowledge of what most people *want* when they type XYZ be used to design more powerful languages that are even less verbose?

I don't really know, but I hope someone answers this question!