Fable, a social media platform positioned as an alternative to Goodreads, recently launched a Reader Summaries feature that playfully roasts users based on the books they have recently read. The content is generated by an AI model, and in some cases users have received offensive or racist jokes. This prompted the company's Head of Product to release this statement.

According to the video, the team relied on "safeguards" to prevent this from happening. Two were mentioned:

1. "We ask the model to avoid some topics."

2. "We also created a second pass offensive language filter."

It sounds like the company considered avoiding certain language to be a hard requirement for this feature. The decision to rely almost entirely on instructions in the model prompt leaves me scratching my head. Is this simply a matter of a Series A company shipping something too quickly, or did the engineering team really believe that a couple of instructions could guarantee certain responses? (Or both?)
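
For what it's worth, the statement doesn't say what the "second pass offensive language filter" actually is. Below is a minimal sketch of one possible interpretation, assuming a simple blocklist check over the model's output; the function names, the patterns, and the generate_with_prompt stub are all hypothetical, and a production filter would more plausibly be a trained classifier or a hosted moderation endpoint.

    import re

    # Hypothetical blocklist; a real deployment would more likely rely on a
    # trained classifier or moderation API than hand-maintained regexes.
    BLOCKED_PATTERNS = [
        re.compile(r"\bexample_offensive_term\b", re.IGNORECASE),
        # ... more patterns maintained by a trust & safety process
    ]

    def generate_with_prompt(books: list[str]) -> str:
        # Stand-in for the actual model call; the prompt instructions
        # ("avoid some topics") would be safeguard #1.
        return f"A playful roast based on {len(books)} recently read books."

    def passes_offensive_language_filter(summary: str) -> bool:
        # Safeguard #2: a second pass over the generated text that rejects
        # the whole summary if any blocked pattern appears.
        return not any(p.search(summary) for p in BLOCKED_PATTERNS)

    def reader_summary(books: list[str]) -> str | None:
        summary = generate_with_prompt(books)
        if passes_offensive_language_filter(summary):
            return summary
        # Safer to show no roast at all than an offensive one.
        return None

Even a toy version like this makes the limitation obvious: a second-pass filter only catches what someone anticipated, so pairing it with prompt instructions alone is a thin guarantee for a hard requirement.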