anyone else concerned that training models on synthetic, LLM-generated data might push us into a linguistic feedback loop?
relying on LLM text for training could bias the next model toward even heavier overuse of words like "delve", "showcasing", and "underscores", and since that model's output feeds the next round of training data, the skew compounds with every generation...
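
here's a quick toy simulation of the loop. to be clear, everything in it is made up for illustration: the 1.2x head start for the marker words, the vocab size, the temperature, all assumptions, not measurements from any real model. each "generation" just refits word frequencies on text sampled from the previous one, with mild low-temperature sharpening standing in for the way decoding over-samples already-likely tokens:

```python
import random
from collections import Counter

# toy feedback-loop sketch -- every name and number here is invented
# for illustration, not measured from any real model or corpus
MARKERS = {"delve", "showcasing", "underscores"}
VOCAB = sorted(MARKERS) + [f"word{i}" for i in range(97)]

def sample_corpus(weights, n_tokens=50_000, temperature=0.8):
    # temperature < 1 sharpens the distribution, loosely mimicking how
    # decoding over-samples tokens that are already likely
    sharpened = [w ** (1 / temperature) for w in weights]
    return random.choices(VOCAB, weights=sharpened, k=n_tokens)

def refit(corpus):
    # "train" the next generation: its word distribution is just the
    # empirical frequencies of the synthetic corpus it was fed
    counts = Counter(corpus)
    return [counts.get(w, 1) for w in VOCAB]  # floor at 1 so nothing hits zero

# generation 0: marker words start only slightly over-represented,
# the way "delve" already is in LLM output relative to human text
weights = [1.2 if w in MARKERS else 1.0 for w in VOCAB]
for gen in range(8):
    corpus = sample_corpus(weights)
    share = sum(1 for tok in corpus if tok in MARKERS) / len(corpus)
    print(f"gen {gen}: marker words = {share:.2%} of tokens")
    weights = refit(corpus)
```

with these made-up settings the marker share should roughly double over eight generations even though nothing ever injects new bias after round zero, which is the whole worry: a small initial skew plus train-on-your-own-output is enough to drift.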