Ask HN: Could predicting software output be used for synthetic data?

1 point by maxutility about 1 year ago

I’ve been reading a lot about the cliff that frontier AI models face as their sources of training data dry up. I’ve seen synthetic data mentioned as an option, but I haven’t seen many details (maybe I haven’t looked hard enough).

I’m curious whether you could create an unlimited source of synthetic data, and improve coding/logic performance, by having an LLM generate code and then training the model to predict (1) whether the code compiles and (2) what outputs it produces for an unlimited series of generated inputs.
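To make the idea concrete, here is a minimal sketch of the labeling step, under some loud assumptions. The "LLM-generated" programs are hard-coded stand-ins (`CANDIDATES` is hypothetical), each program is assumed to define a one-argument function named `f`, and Python's built-in `compile()` stands in for "does it compile" (it only catches syntax errors, not type errors). The helper names `label_program` and `make_dataset` are made up for illustration. Running generated code with `exec()` is unsafe; a real pipeline would need a sandbox.

```python
import random
from typing import Any

# Hypothetical stand-ins for LLM-generated programs; in the proposed setup
# these would be sampled from a model rather than hard-coded.
CANDIDATES = [
    "def f(x):\n    return x * x + 1\n",
    "def f(x):\n    return sum(range(x))\n",
    "def f(x)\n    return x\n",  # missing colon: fails to compile
]


def label_program(src: str, inputs: list[int]) -> dict[str, Any]:
    """Build one synthetic training record: a compile label plus I/O pairs.

    Assumes the program defines a one-argument function named `f`.
    WARNING: exec() runs arbitrary code; sandbox this in any real use.
    """
    record: dict[str, Any] = {"source": src, "compiles": True, "io": []}
    try:
        code = compile(src, "<generated>", "exec")
    except SyntaxError:
        record["compiles"] = False
        return record
    namespace: dict[str, Any] = {}
    exec(code, namespace)
    fn = namespace["f"]
    for x in inputs:
        try:
            out: Any = fn(x)
        except Exception as exc:  # runtime errors become labels too
            out = f"error: {type(exc).__name__}"
        record["io"].append((x, out))
    return record


def make_dataset(seed: int = 0, n_inputs: int = 3) -> list[dict[str, Any]]:
    """Label every candidate program on freshly generated random inputs."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    dataset = []
    for src in CANDIDATES:
        inputs = [rng.randint(-5, 20) for _ in range(n_inputs)]
        dataset.append(label_program(src, inputs))
    return dataset


if __name__ == "__main__":
    for rec in make_dataset():
        print(rec["compiles"], rec["io"])
```

Each record could then be turned into prediction targets: given `source`, predict `compiles`; given `source` and an input, predict the recorded output. Because the interpreter supplies the ground truth, the inputs (and in principle the programs) can be generated indefinitely, which is the "unlimited" part of the question.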

1 comment

maxutility about 1 year ago

You could call it “bootstrapping is all you need” :)