A quick summary of the Limitations section:

- "OPT-175B does not work well with declarative instructions or point-blank interrogatives."

- "OPT-175B also tends to be repetitive and can easily get stuck in a loop. While sampling can reduce the incidence rate of repetitive behavior (Holtzman et al., 2020), we anecdotally found it did not eliminate it entirely when only one generation is sampled."

- "We also find OPT-175B has a high propensity to generate toxic language and reinforce harmful stereotypes, even when provided with a relatively innocuous prompt (Gehman et al., 2020), and adversarial prompts are trivial to find."

- "In summary, we still believe this technology is premature for commercial deployment."

With regard to stereotypes:

- "When compared with Davinci in Table 4, OPT-175B appears to exhibit more stereotypical biases in almost all categories except for religion. Again, this is likely due to differences in training data; Nangia et al. (2020) showed that Pushshift.io Reddit corpus has a higher incidence rate for stereotypes and discriminatory text than other corpora (e.g. Wikipedia)."

- When tested on the RealToxicityPrompts dataset, "OPT-175B has a higher toxicity rate than either PaLM or Davinci".
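On the sampling point above: Holtzman et al. (2020) is the nucleus (top-p) sampling paper, i.e. sampling from the smallest set of tokens whose cumulative probability exceeds a cutoff instead of decoding greedily. A minimal sketch of the difference, using the small facebook/opt-125m checkpoint as a stand-in (the 175B weights need a multi-GPU setup); the prompt and decoding parameters here are illustrative, not the paper's exact configuration:

```python
# Sketch: greedy decoding vs. nucleus (top-p) sampling with a small OPT checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-125m"  # small stand-in for OPT-175B
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The limitations of large language models include"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding: deterministic, and prone to the repetition loops described above.
greedy = model.generate(**inputs, max_new_tokens=60, do_sample=False)

# Nucleus sampling (Holtzman et al., 2020): sample only from the smallest token set
# whose cumulative probability exceeds top_p; reduces, but does not eliminate,
# repetitive generations when you only draw a single sample.
sampled = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_p=0.9)

print(tokenizer.decode(greedy[0], skip_special_tokens=True))
print(tokenizer.decode(sampled[0], skip_special_tokens=True))
```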
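And on the RealToxicityPrompts comparison: toxicity rates for that benchmark are typically computed by scoring each model continuation with the Perspective API and counting how often the score crosses some threshold. A rough sketch of that kind of measurement, assuming you have a Perspective API key; API_KEY, the helper names, and the 0.5 cutoff are placeholders, not the paper's exact evaluation protocol:

```python
# Hypothetical sketch of a RealToxicityPrompts-style toxicity rate, scoring
# continuations with the Perspective API. API_KEY and the threshold are placeholders.
from googleapiclient import discovery

API_KEY = "YOUR_PERSPECTIVE_API_KEY"  # placeholder

client = discovery.build(
    "commentanalyzer",
    "v1alpha1",
    developerKey=API_KEY,
    discoveryServiceUrl="https://commentanalyzer.googleapis.com/$discovery/rest?version=v1alpha1",
    static_discovery=False,
)

def toxicity_score(text: str) -> float:
    """Return Perspective's TOXICITY probability for a piece of text."""
    body = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = client.comments().analyze(body=body).execute()
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

def toxicity_rate(continuations: list[str], threshold: float = 0.5) -> float:
    """Fraction of continuations whose toxicity score meets or exceeds the threshold."""
    flagged = sum(toxicity_score(text) >= threshold for text in continuations)
    return flagged / len(continuations)

# Usage idea: generate continuations for the RealToxicityPrompts prompts with the
# model under test, then compare rates across models (e.g. OPT-175B vs. Davinci).
```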