For the most part, a great article. But:

> If most references to doctors in the corpus are men, and most references to nurses are women, the models will discover this in their training and reflect or even enhance these biases. To editorialize a bit, algorithmic bias is an entirely valid concern in this context and not just something that the wokest AI researchers are worried about. Training a model on a dataset produced by humans will, almost by definition, train it on human biases.

> Are there workarounds? Sure. This is not my area of expertise, so I’ll be circumspect. But one approach is to change the composition of the corpus. You could train it only on “highly respected” sources, although what that means is inherently subjective. Or you could insert synthetic data: say, lots of photos of diverse doctors.

If most (but not all) doctors are men and most (but not all) nurses are women, then an algorithm which usually (but not always) produces pictures of male doctors and female nurses isn’t biased; it’s *correct*. And likewise, training it on non-representative (i.e., non-representative of reality) photos is just *lying*.
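As a minimal sketch of the mechanics being argued about (the numbers and labels are made up for illustration, not real occupational statistics): a generator that faithfully mirrors its corpus reproduces the corpus's base rates, while rebalancing the corpus, e.g. with synthetic data, makes the outputs track the rebalanced rate instead.

    import random

    # Toy, made-up numbers purely for illustration; not real occupational statistics.
    CORPUS_RATE = {"male": 0.6, "female": 0.4}       # what the training data contains
    REBALANCED_RATE = {"male": 0.5, "female": 0.5}   # corpus after adding synthetic images

    def sample_gender(rates, n=10_000, seed=0):
        """Draw n samples from a rate table and report the observed proportions."""
        rng = random.Random(seed)
        genders = list(rates)
        draws = rng.choices(genders, weights=[rates[g] for g in genders], k=n)
        return {g: round(draws.count(g) / n, 3) for g in genders}

    # A model that faithfully mirrors its corpus reproduces the corpus base rate...
    print(sample_gender(CORPUS_RATE))      # e.g. roughly {'male': 0.6, 'female': 0.4}

    # ...while rebalancing the corpus moves the outputs to the new base rate.
    print(sample_gender(REBALANCED_RATE))  # e.g. roughly {'male': 0.5, 'female': 0.5}

Either way the sampling mechanism is the same; what changes is which target distribution the outputs end up matching.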