OPT: Open Pre-trained Transformer Language Models

461 points by MasterScrat, about 3 years ago

26 comments

"We are releasing all of our models between 125M and 30B parameters, and will provide full research access to OPT-175B upon request. Access will be granted to academic researchers; those affiliated with organizations in government, civil society, and academia; and those in industry research laboratories."

GPT-3 Davinci ("the" GPT-3) is 175B.

The repository will be open "First thing in AM" (https://twitter.com/stephenroller/status/1521302841276645376):

https://github.com/facebookresearch/metaseq/

thorum, about 3 years ago

A quick summary of the Limitations section:

- "OPT-175B does not work well with declarative instructions or point-blank interrogatives."
- "OPT-175B also tends to be repetitive and can easily get stuck in a loop. While sampling can reduce the incidence rate of repetitive behavior (Holtzman et al., 2020), we anecdotally found it did not eliminate it entirely when only one generation is sampled."
- "We also find OPT-175B has a high propensity to generate toxic language and reinforce harmful stereotypes, even when provided with a relatively innocuous prompt (Gehman et al., 2020), and adversarial prompts are trivial to find."
- "In summary, we still believe this technology is premature for commercial deployment."

With regard to stereotypes:

- "When compared with Davinci in Table 4, OPT175B appears to exhibit more stereotypical biases in almost all categories except for religion. Again, this is likely due to differences in training data; Nangia et al. (2020) showed that Pushshift.io Reddit corpus has a higher incidence rate for stereotypes and discriminatory text than other corpora (e.g. Wikipedia)."
- When testing with the RealToxicityPrompts data set, "OPT-175B has a higher toxicity rate than either PaLM or Davinci".
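
For anyone unsure what "sampling" refers to in the second quote: Holtzman et al. (2020) is the paper that introduced nucleus (top-p) sampling. Below is a minimal sketch of that idea in plain Python. It is not OPT's or metaseq's decoding code; the function names `nucleus_filter` and `sample_nucleus` and the toy probabilities are made up for illustration.

```python
import random


def nucleus_filter(probs, top_p=0.9):
    """Keep the smallest set of most-likely tokens whose cumulative
    probability reaches top_p, then renormalize so they sum to 1.

    `probs` is a list of next-token probabilities indexed by token id
    (a toy stand-in for a model's output distribution).
    """
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token_id: p / total for token_id, p in kept}


def sample_nucleus(probs, top_p=0.9, rng=random):
    """Draw one token id from the truncated, renormalized distribution."""
    filtered = nucleus_filter(probs, top_p)
    ids = list(filtered)
    weights = [filtered[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical distribution over 4 tokens
    print(nucleus_filter(toy_probs, top_p=0.9))
    print(sample_nucleus(toy_probs, top_p=0.9, rng=random.Random(0)))
```

Cutting off the low-probability tail while still sampling is what reduces loops compared with greedy decoding, but as the quote says, it does not rule them out for a single generation.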

crazypython, about 3 years ago

BigScience (a coalition including Huggingface) is training and will release a 175B language model; training finishes in 2 months.

lumost, about 3 years ago

I often wonder if OpenAI's decision not to open GPT-3 was because it was too expensive to train relative to its real value.

They’ve hidden the model behind an API where they can filter out most of the dumb behaviors, while everyone believes they are working on something entirely different.

mikolajw, about 3 years ago

The big one, OPT-175B, isn't an open model. The word "open" in technology means that everyone has equal access (viz. "open source software" and "open source hardware"). The article says that research access will be provided upon request for "academic researchers; those affiliated with organizations in government, civil society, and academia; and those in industry research laboratories."

Don't assume any good intent from Facebook. This is obviously the same strategy large proprietary software companies have been using for a long time to reinforce their monopolies/oligopolies. They want to embed themselves in the so-called "public sector" (academia and state institutions), so that they get free advertising for taxpayer money. Ordinary people like most of us here won't be able to use it despite paying taxes.

Some primary mechanisms of this advertising method:

1. Schools and universities frequently use the discounted or gratis access they have to give courses for students, often causing students to be only specialized in the monopolist's proprietary software/services.
2. State institutions will require applicants to be well-versed in monopolist's proprietary software/services because they are using it.
3. Appearance of academic papers that reference this software/services will attract more people to use them.

Some examples of companies utilizing this strategy:

- Microsoft - Gives Microsoft Office 365 access for "free" to schools and universities.
- Mathworks - Gives discounts to schools and universities.
- Autodesk (CAD software) - Gives gratis limited-time "student" (noncommercial) licenses.
- Altium (EDA software) - Gives gratis limited-time licenses to university students.
- Cadence (EDA software) - Gives a discount for its EDA software to universities.

EDIT: Previously my first sentence stated that the models aren't open - in fact, only OPT-175B is not (but the other ones are much smaller).

coding123, about 3 years ago

Can someone seed it on BitTorrent if you get it?

LeicaLatte, about 3 years ago

As someone who finds OpenAI patronizing, this is welcome.

causality0, about 3 years ago

Out of curiosity, what's the file size on that?

d--b, about 3 years ago

Remember when OpenAI wrote this?

> Due to concerns about large language models being used to generate deceptive, biased, or abusive language at scale, we are only releasing a much smaller version of GPT-2 along with sampling code. We are not releasing the dataset, training code, or GPT-2 model weights.

Well, I guess Meta doesn’t care.

https://openai.com/blog/better-language-models/

urthor, about 3 years ago

Is the convention of putting an asterisk after the first authors' names to signal equal contribution common?

I don't read many papers, but that's a new one to me.

etaioinshrdlu, about 3 years ago

What type of hardware would you need to run it?

f311a, about 3 years ago

Just curious, will I be able to run it on my Nvidia card with 10GB of memory? Does it require multiple graphics cards?
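
A rough way to reason about the hardware and file-size questions in this thread: the weights alone take about (number of parameters) × (bytes per parameter). The sketch below assumes 2 bytes per parameter (fp16), which is an assumption for illustration rather than something stated in the announcement; `weight_memory_gb` and `fits_on_gpu` are hypothetical helper names. It ignores activations, the attention cache, and framework overhead, so real requirements are higher.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes).

    bytes_per_param=2 assumes fp16 storage; use 4 for fp32.
    """
    return num_params * bytes_per_param / 1e9


def fits_on_gpu(num_params, gpu_memory_gb, bytes_per_param=2):
    """True if the weights alone fit in the given GPU memory.

    This is a lower bound only: inference also needs room for
    activations and other overhead.
    """
    return weight_memory_gb(num_params, bytes_per_param) <= gpu_memory_gb


if __name__ == "__main__":
    # Parameter counts taken from the model sizes named in the announcement.
    models = [
        ("OPT-125M", 125_000_000),
        ("OPT-30B", 30_000_000_000),
        ("OPT-175B", 175_000_000_000),
    ]
    for name, n in models:
        size = weight_memory_gb(n)
        verdict = "fits" if fits_on_gpu(n, 10) else "does not fit"
        print(f"{name}: ~{size:g} GB of fp16 weights, {verdict} in 10 GB")
```

Under these assumptions the 175B weights come to roughly 350 GB, so a single 10GB card could hold only the smaller models in the release, and the largest would have to be split across many GPUs.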

p1esk, about 3 years ago

> We are also releasing our logbook detailing the infrastructure challenges we faced

Where’s the logbook?

einpoklum, about 3 years ago

I don't want to be a Luddite, but every time one of these FAANG companies makes advances in this domain my mind immediately goes to how they will use it to better spy on people, for commercial and government interests.

jfmc, about 3 years ago

We need robopsychologists.

israrkhan, about 3 years ago

I am afraid NLP is becoming a game of scale. Large-scale models improve quality, but they are prohibitively expensive to train and even to host.

qgin, about 3 years ago

If we’re already at the level of truly dangerous ML models… I don’t have a lot of hope for how the next decades are going to play out.

Pragati_08, about 3 years ago

Some of the hardware Meta is working on to deliver it: https://www.theverge.com/2022/5/2/23053888/meta-virtual-reality-headset-cambria-quest-vr-mr

sjg007, about 3 years ago

How about smaller, more performant models? There’s so much redundancy in language that it should be possible.

chrisMyzel, about 3 years ago

Does anyone else think closed AI is taking on its weirdest forms and becoming a trend?

langsoul-com, about 3 years ago

I hope someone releases a DALL-E model. That seems far more interesting to play with.

lol1lol, about 3 years ago

I appreciate that they are releasing their log book detailing the challenges faced.

aleks5678, about 3 years ago

Thanks Meta AI

ctreseler123, about 3 years ago

Announced because GPT-4 makes this so very obsolete.

anubhav200, about 3 years ago

Download link?

scrollbar, about 3 years ago

Does this make Meta AI more “open” than OpenAI? Oh, the irony.
