"We are releasing all of our models between
125M and 30B parameters, and will provide full
research access to OPT-175B upon request.
Access will be granted to academic researchers; those
affiliated with organizations in government, civil
society, and academia; and those in industry research laboratories."<p>GPT-3 Davinci ("the" GPT-3) is 175B.<p>The repository will be open "First thing in AM" (<a href="https://twitter.com/stephenroller/status/1521302841276645376" rel="nofollow">https://twitter.com/stephenroller/status/1521302841276645376</a>):<p><a href="https://github.com/facebookresearch/metaseq/" rel="nofollow">https://github.com/facebookresearch/metaseq/</a>
A quick summary of the Limitations section:

- "OPT-175B does not work well with declarative instructions or point-blank interrogatives."

- "OPT-175B also tends to be repetitive and can easily get stuck in a loop. While sampling can reduce the incidence rate of repetitive behavior (Holtzman et al., 2020), we anecdotally found it did not eliminate it entirely when only one generation is sampled." (The Holtzman et al. reference is nucleus sampling; see the sketch after this list.)

- "We also find OPT-175B has a high propensity to generate toxic language and reinforce harmful stereotypes, even when provided with a relatively innocuous prompt (Gehman et al., 2020), and adversarial prompts are trivial to find."

- "In summary, we still believe this technology is premature for commercial deployment."

With regard to stereotypes:

- "When compared with Davinci in Table 4, OPT-175B appears to exhibit more stereotypical biases in almost all categories except for religion. Again, this is likely due to differences in training data; Nangia et al. (2020) showed that Pushshift.io Reddit corpus has a higher incidence rate for stereotypes and discriminatory text than other corpora (e.g. Wikipedia)."

- When testing with the RealToxicityPrompts data set, "OPT-175B has a higher toxicity rate than either PaLM or Davinci".
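As a rough illustration of what "sampling can reduce repetition" means in practice, here is a sketch contrasting greedy decoding with nucleus (top-p) sampling via the transformers generate API; the model ID is the same assumed one as in the earlier example:

```python
# Sketch: greedy decoding vs. nucleus (top-p) sampling, the technique
# from Holtzman et al. (2020) that the Limitations section refers to.
# "facebook/opt-125m" is an assumed Hub ID, as in the earlier example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
inputs = tokenizer("The city council met on Tuesday and", return_tensors="pt")

# Greedy decoding: deterministic, and prone to the repetition loops
# described in the bullet above.
greedy = model.generate(**inputs, max_new_tokens=60, do_sample=False)

# Nucleus sampling: sample only from the smallest set of tokens whose
# cumulative probability exceeds top_p. This breaks repetitive loops
# more often, though (per the paper) not always in a single sample.
torch.manual_seed(0)  # reproducibility of the sampled output
sampled = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_p=0.9)

for name, ids in [("greedy", greedy), ("top-p", sampled)]:
    print(f"--- {name} ---")
    print(tokenizer.decode(ids[0], skip_special_tokens=True))
```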
I often wonder if OpenAI's decision not to open GPT-3 was because it was too expensive to train relative to its real value.

They've hidden the model behind an API where they can filter out most of the dumb behaviors, while everyone believes they are working on something entirely different.
The big one, OPT-175B, isn't an open model. The word "open" in technology means that everyone has equal access (viz. "open source software" and "open source hardware"). The article says that research access will be provided upon request for "academic researchers; those affiliated with organizations in government, civil society, and academia; and those in industry research laboratories."

Don't assume any good intent from Facebook. This is obviously the same strategy large proprietary software companies have been using for a long time to reinforce their monopolies/oligopolies. They want to embed themselves in the so-called "public sector" (academia and state institutions) so that they get free advertising for taxpayer money. Ordinary people like most of us here won't be able to use it despite paying taxes.

Some primary mechanisms of this advertising method:

1. Schools and universities frequently use their discounted or gratis access to teach courses, often leaving students specialized only in the monopolist's proprietary software/services.

2. State institutions will require applicants to be well-versed in the monopolist's proprietary software/services because they are using it.

3. Academic papers that reference this software/these services will attract more people to use them.

Some examples of companies using this strategy:

Microsoft - Gives Microsoft Office 365 access for "free" to schools and universities.

MathWorks - Gives discounts to schools and universities.

Autodesk (CAD software) - Gives gratis limited-time "student" (noncommercial) licenses.

Altium (EDA software) - Gives gratis limited-time licenses to university students.

Cadence (EDA software) - Gives a discount on its EDA software to universities.

EDIT: Previously my first sentence stated that the models aren't open; in fact, only OPT-175B is not (but the other ones are much smaller).
Remember when OpenAI wrote this?

> Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code. We are not releasing the dataset, training code, or GPT-2 model weights.

Well, I guess Meta doesn't care.

https://openai.com/blog/better-language-models/
Is the convention of using an asterisk after the first authors' names to signal equal contribution common?

I don't read many papers, but that's a new one to me.
I don't want to be a Luddite, but every time one of these FAANG companies makes advances in this domain, my mind immediately goes to how they will use it to better spy on people for commercial and government interests.
I am afraid NLP is becoming a game of scale. Large-scale models improve quality, but they are prohibitively expensive to train, and even to host.
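To make "expensive to host" concrete, here is a back-of-envelope sketch of the memory footprint for the weights alone, assuming fp16 (2 bytes per parameter) and roughly 80 GB of usable memory per accelerator; both figures are my assumptions for illustration:

```python
# Back-of-envelope memory estimate for serving a dense LM, weights only.
# Assumes fp16 (2 bytes/param) and ~80 GB usable per accelerator; both
# are illustrative assumptions (activations, the KV cache, and optimizer
# state for training would add substantially more).
def weights_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Gigabytes needed just to hold the model weights."""
    return params_billions * 1e9 * bytes_per_param / 1e9

for name, size_b in [("OPT-125M", 0.125), ("OPT-30B", 30.0), ("OPT-175B", 175.0)]:
    gb = weights_gb(size_b)
    devices = max(1, -(-gb // 80))  # ceiling division by 80 GB per device
    print(f"{name}: ~{gb:,.2f} GB of fp16 weights, >= {devices:.0f} x 80 GB accelerators")
```

By this estimate, OPT-175B needs ~350 GB for the weights alone, i.e. at least five 80 GB accelerators before any inference overhead, which is what makes hosting the large models so costly.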
Some of the hardware Meta is working on to deliver it:

https://www.theverge.com/2022/5/2/23053888/meta-virtual-reality-headset-cambria-quest-vr-mr