Look at how AlphaGo started with human data, and then they found a way to train it without that (AlphaGo Zero learned purely from self-play). I've been wondering if something similar might be possible with LLMs: ground them in real-world video by having them predict what happens next in the video. I suppose you'd still need some minimal language ability to bootstrap from, but imagine it learning the laws of physics and mathematics from the ground up.
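
To make the objective concrete, here's a toy sketch of what I mean, the video analogue of next-token prediction. Everything here is a placeholder (the `NextFramePredictor` name, the tiny conv net, the random tensors standing in for real video), not a proposal for how a serious system would actually look:

```python
# Hypothetical sketch: self-supervised next-frame prediction.
import torch
import torch.nn as nn

class NextFramePredictor(nn.Module):
    """Toy model: predicts frame t+1 from a stack of the previous k frames."""
    def __init__(self, context_frames=4, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(context_frames * channels, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, channels, kernel_size=3, padding=1),
        )

    def forward(self, frames):
        # frames: (batch, context_frames, channels, H, W)
        b, k, c, h, w = frames.shape
        return self.net(frames.reshape(b, k * c, h, w))

model = NextFramePredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

# Stand-in for real video: clips of 5 frames (4 context + 1 target).
clip = torch.rand(8, 5, 3, 64, 64)
context, target = clip[:, :4], clip[:, 4]

pred = model(context)
loss = nn.functional.mse_loss(pred, target)  # "what happens next?" signal
loss.backward()
opt.step()
```

The appeal is that the supervision is free: the next frame is its own label, the same way the next token is in LLM pretraining, so in principle you could scale it on raw video the way AlphaGo Zero scaled on self-play.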