If there's one Big Lie in the AI field, it's the idea that the models just "learn things", as if we invented magic that eats raw text and images and poops out intelligence. Not nearly enough attention is paid to the labeling and filtering infrastructure used to produce the datasets, a lot of which is human labor or derivatives thereof[0].<p>The fact that this labor is being offloaded to third-world kids who know how to steal their parents' IDs to earn beer money^W^W0.69 bobux is also a scandal, but it's the boring, frustratingly mundane kind that is common to all "gig economy" companies. Surely not every kid is being permanently damaged by this exploitation, and there's certainly worse exploitation out there, but it's still exploitation. And these AI models aren't AI; they're products of lots of human labeling and filtering.<p>[0] e.g. reward modeling