RL doesn't completely "work" yet, it still has a scalability problem. Claude can write a small project, but as it becomes larger, Claude gets confused and starts making mistakes.<p>I used to think the problem was that models can't learn over time like humans, but maybe that can be worked around. Today's models have large enough context windows to fit a medium sized project's complete code and documentation, and tomorrow's may be larger; good-enough world knowledge can be maintained by re-training every few months. The real problem is that even models with large context windows struggle with complexity moreso than humans; they miss crucial details, then become very confused when trying to correct their mistakes and/or miss other crucial details (whereas humans sometimes miss crucial details, but are usually able to spot them and fix them without breaking something else).<p>Reliability is another issue, but I think it's related to scalability: an LLM that cannot make reliable inferences from a small input data, cannot grow that into a larger output data without introducing cascading hallucinations.<p>EDIT: creative control is also superseded by reliability and scalability. You can generate any image imaginable with a reliable diffusion model, by first generating something vague, then repeatedly refining it (specifying which details to change and which to keep), each refinement closer to what you're imagining. Except even GPT-4o isn't nearly reliable enough for this technique, because while it can handle a couple refinements, it too starts losing details (changing unrelated things).