If you want to skip all the culture wars stuff, the interesting bit here is the alleged description of how Gemini's image generation prompting is preprocessed:

"Roughly, the "safety" architecture designed around image generation (slightly different than text) looks like this: a user makes a request for an image in the chat interface, which Gemini — once it realizes it's being asked for a picture — sends on to a smaller LLM that exists specifically for rewriting prompts in keeping with the company's thorough "diversity" mandates. This smaller LLM is trained with LoRA on synthetic data generated by another (third) LLM that uses Google's full, pages-long diversity "preamble." The second LLM then rephrases the question (say, "show me an auto mechanic" becomes "show me an Asian auto mechanic in overalls laughing, an African American female auto mechanic holding a wrench, a Native American auto mechanic with a hard hat" etc.), and sends it on to the diffusion model. The diffusion model checks to make sure the prompts don't violate standard safety policy (things like self-harm, anything with children, images of real people), generates the images, checks the images again for violations of safety policy, and returns them to the user.

"Three entire models all kind of designed for adding diversity," I asked one person close to the safety architecture. "It seems like that — diversity — is a huge, maybe even central part of the product. Like, in a way it is the product?"

"Yes," he said, "we spend probably half of our engineering hours on this.""

If true (admittedly a big if), it torpedoes the suggestion, made by many, that Gemini's misbehavior was just a case of accidental teething problems that were missed in testing.
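For the curious, the flow described in that quote boils down to something like the sketch below. To be clear, this is only my own illustrative pseudocode of the alleged pipeline (every function name here is made up), not anything from Google:

    # Hypothetical sketch of the alleged pipeline. None of these functions
    # correspond to real Google internals; they are stand-ins that only
    # illustrate the data flow described in the quoted passage.

    def prompt_rewriter_llm(prompt: str) -> list[str]:
        """Stand-in for the smaller, LoRA-tuned LLM that allegedly rewrites
        prompts per the diversity mandate (here it just echoes the prompt)."""
        return [prompt]

    def passes_safety_policy(item: str) -> bool:
        """Stand-in for the standard safety checks (self-harm, minors,
        depictions of real people, etc.), applied before and after generation."""
        return True

    def diffusion_model(prompt: str) -> str:
        """Stand-in for the image generator; returns a placeholder 'image'."""
        return f"<image for: {prompt}>"

    def generate_images(user_prompt: str) -> list[str]:
        # 1. Rewrite the user's prompt into one or more "diversified" prompts.
        rewritten = prompt_rewriter_llm(user_prompt)
        # 2. Pre-generation check: drop prompts that violate standard policy.
        safe_prompts = [p for p in rewritten if passes_safety_policy(p)]
        # 3. Generate images with the diffusion model.
        images = [diffusion_model(p) for p in safe_prompts]
        # 4. Post-generation check on the images themselves, then return.
        return [img for img in images if passes_safety_policy(img)]

    print(generate_images("show me an auto mechanic"))

If the description is accurate, the notable part is that the rewriting step sits in front of the image model entirely, which is what makes it look like a deliberate design choice rather than something missed in testing.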