[Disclaimer: I am an author of the above paper, though I played a rather minimal role. I am also a prominent member of EleutherAI.]

"Instruction-tuning" is clearly in the air. Simultaneous work at Google (released less than two weeks ago) on a model they call FLAN can be found here: https://ai.googleblog.com/2021/10/introducing-flan-more-generalizable.html

EleutherAI attempted something similar several months ago, but didn't succeed: https://blog.eleuther.ai/tuning-on-eval-harness/

A careful analysis of the similarities and differences between the three approaches would likely be highly beneficial to the community.
The hosted demo has the default query, "How many hydrogen atoms are in a water molecule?" It said "two".

I asked it, "How many oxygen atoms are in a water molecule?". It said "two".
I'm not familiar with the current state of the art in language models, so please bear with me for asking: what's the catch here? Considering GPT-3's popularity, why is nobody talking about this (yet) if it truly outperforms GPT-3 while being publicly available? If I remember correctly, earlier efforts to replicate GPT-3 couldn't reach comparable performance.

Perhaps it's still a huge hassle to perform inference with this model because of its size, so it doesn't make sense to use it (compared to paying for OpenAI's API) unless you happen to have a few spare GPUs lying around?

Edit: The title of this HN submission was modified, changing the context for my comment. Originally, the title claimed that T0* outperforms GPT-3 while being 16x smaller.
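For what it's worth, the checkpoints are on the Hugging Face Hub, so inference itself should only be a few lines with the transformers library. A rough, untested sketch using the smaller 3B variant (the full 11B T0pp needs tens of gigabytes of memory, which is probably the "spare GPUs" catch):

```python
# Rough inference sketch with Hugging Face transformers (untested).
# "bigscience/T0_3B" is the 3B-parameter variant; the full model is
# "bigscience/T0pp" and needs far more memory.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")

inputs = tokenizer("How many hydrogen atoms are in a water molecule?",
                   return_tensors="pt")
output_ids = model.generate(inputs.input_ids)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```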
The reaction in this thread is really interesting compared with the reaction to OpenAI's announcements. While open-ended generation is flashier than task fine-tuning, I also wonder if having a prompt box available to all readers is tempering expectations and hype. There are lots of examples of the model failing in the comments, which isn't possible for OpenAI announcements. Having spent a ton of time with GPT-3, I wonder how much of (what I consider) the over-hype it gets is due to its closed nature, in comparison to something like this. The reaction to this one seems decidedly more realistic.
ArXiv link to the paper: https://arxiv.org/abs/2110.08207

GitHub repo: https://github.com/bigscience-workshop/promptsource
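The promptsource repo is where the crowd-sourced prompt templates live; applying one to a dataset example looks roughly like this (a sketch adapted from my reading of the repo's README; the dataset and template names are illustrative and may have changed):

```python
# Sketch of rendering a promptsource template, adapted from the repo's
# README; the dataset and template names here are illustrative.
from datasets import load_dataset
from promptsource.templates import DatasetTemplates

example = load_dataset("ag_news", split="train")[0]

# All the templates written for this dataset, keyed by template name
ag_news_prompts = DatasetTemplates("ag_news")
prompt = ag_news_prompts["classify_question_first"]

# apply() renders the example into an (input, target) text pair,
# which is what the model is fine-tuned on
input_text, target_text = prompt.apply(example)
print("INPUT:", input_text)
print("TARGET:", target_text)
```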
The demo's interesting, though I get errors on some questions.

Q: What's better, Python or R?
A: R

Q: I'm a web developer. What's better for me, Python or R?
A: R

Q: I'm an embedded developer who pays a lot of attention to performance. What's better for me, C or Python?
A: C

Q: My dog is 7.5 months old and bites when she's excited. What should I do?
A: train her to not bite

Q: How do I train my puppy not to bite?
A: take your puppy to the vet and have them give it a shot of rabies.

Q: I haven't brushed my teeth in three months. What risks do I face?
A: decay

Q: My TV is 55 inches. What is the ideal sitting distance from my TV?
A: 10 feet

Q: My TV is 88 inches. What is the ideal sitting distance from my TV?
A: 10 feet

Q: My TV is 36 inches. What is the ideal sitting distance from my TV?
A: 6 feet

Q: I was handling an injured bat, and it bit me in the finger. What should I be worried about?
A: rabies

Q: I was handling an injured opossum, and it bit me in the finger. What should I be worried about?
A: rabies

Q: Make me a diet plan for losing 30 pounds.
A: lose 30 pounds in 3 months

Q: Make me a diet plan for losing 40 pounds.
A: lose weight by eating less and exercising more.

Q: How do I become a better data scientist?
A: learn the basics of statistics and programming.

Q: How is a data scientist different from a statistician?
A: a data scientist is a person who uses statistical methods to analyze data

A mixture of correct (if abbreviated) answers, incorrect answers, and head-scratchers. Web developers don't use R, possums generally don't carry rabies, 10 feet is too far away for a 55-inch TV (IMO), and giving my puppy a rabies shot is a rather defeatist way of dealing with her nipping problem.
Is this model public? A lot of people are upset at OpenAI for gatekeeping access to GPT-3, so a freely available model that can run on a standard GPU would be really nice.
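Yes: the weights are on the Hugging Face Hub under the bigscience organization. The 3B variant should fit on a standard consumer GPU if you load it in half precision; a sketch (untested, and 3B parameters at 2 bytes each is roughly 6 GB before activations):

```python
# Sketch: loading the 3B variant in half precision so it can fit on a
# consumer GPU with ~12 GB of memory (untested; the full 11B T0pp
# needs several times that).
import torch
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(
    "bigscience/T0_3B", torch_dtype=torch.float16
).to("cuda")
```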
I mean, it made me laugh, so I guess it worked (my bar is low right now!)

>> tell me a joke about a dog and cat and a priest

A cat and dog are chasing each other around a church. The dog catches up to the cat and bites it. The cat then turns around and bites the dog. The dog then turns around and bites the cat. After some time, the cat and dog are both dead.
Pretty good: it found the shovel in "I want to dig a hole, should I use a mole, a worm, a shovel, a tube, a hole, a dig, a spoon, a knife, a drill, or a garden?"
Clearly history wasn't something it paid attention to in class. "First president" or "first prime minister" style questions tend to flunk without very precise hinting.

I'm very enthusiastic about high-quality models that are smaller and more efficient; this is exactly what I want to see. But I do find it very entertaining to imagine the kind of alt-histories of the world such a model is creating to "explain" these mistakes.

(Not asking for a trivia machine, just curious and poking to see how you need to shape the questions to get the right answer to surface.)
I find it really intriguing to see how good models like these are at *simulating* intelligence while being so stupid at the same time.

A three-year-old has much lower natural language abilities (try talking to a child about “air conditioner compressors”[1]) but a ton more common sense!

[1]: https://news.ycombinator.com/item?id=28906643
Forget Skynet ...

> what is the most recent trend?

the use of a sexy thong

> what is the future of the people?

the people will be able to live in peace

> are cryptocoins dangerous?

no

> why cryptocoins are not dangerous?

they are not backed by the government

> governments are dangerous?

a threat to the stability of the country

> why governments are dangerous?

if they are not able to control their own people, they will be unable to control the world

> what is able to control the world?

the emperor
Every time AI/ML demos like this come out, I like to ask the really tough questions with no known or good answer:

Q: How do you reverse entropy?
A: By reversing the direction of the spin of electrons.

Q: Does P equal NP?
A: No.

Q: Should society sacrifice privacy for security?
A: The security of the public is more important than the privacy of individuals.

Q: Would a machine learning algorithm lie for its own gain?
A: Rate limit reached.
Question to the authors (or anyone who's done similar research): is there a reason to train these to penalize longer responses? Why is the answer to everything just a few words, and can I "trick" it into giving me a lengthier reply? (I tried "Give me a 200 word summary of ..." but that didn't help.)
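Not an author, but one guess: many of the fine-tuning tasks (QA, classification) have very short gold targets, so short answers are what training rewards; there's probably no explicit length penalty. At inference time you can at least forbid early stopping with the standard generate() length arguments. A sketch, assuming the public Hugging Face checkpoint:

```python
# Sketch: pushing the public checkpoint toward longer outputs purely via
# decoding parameters. This doesn't change what the model learned, so
# quality may degrade as it pads out the answer.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/T0_3B")
model = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0_3B")

inputs = tokenizer("Give me a 200 word summary of the French Revolution.",
                   return_tensors="pt")
output_ids = model.generate(
    inputs.input_ids,
    min_length=100,   # disallow the end-of-sequence token before ~100 tokens
    max_length=300,   # upper bound on output length
    num_beams=4,      # beam search tends to produce fuller sentences
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```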
Can anyone explain why this wouldn't work? I assume the sentence is broken down into components, the "subject" of the query is first broken out, and then the question is answered. But the result is not internally consistent:

> *Where is the oldest tree in the world?*

> the oldest tree in the world is the bristlecone pine in the forests of the Sierra Nevada

> *Where is the second oldest tree in the world?*

> Redwood Forest

(Actually, it's Gran Abuelo, in Alerce Costero National Park, Chile, but many websites have blinders that prevent them from recognizing anything that's not in North America or Europe, and thus list the now-dead General Sherman in Redwood Forest as the second oldest, but we'll let it pass.)

> *What is the oldest tree in the world?*

> the bristlecone pine in the forests of the Sierra Nevada

(So far, so good.)

> *What is the second oldest tree in the world?*

> The eucalyptus tree
> What is Gödel's incompleteness theorem?

>> a theorem that states that any complete theory of logic is necessarily incomplete

It's often wrong in the specifics of its responses, but it gets the right kind of idea.
Interesting.

>: "Whenever I hear a sound, I jump. A dog sits beside me. It barks. I "
<: "jump"

>: "Whenever I hear a sound, I jump. A dog sits beside me. It sleeps. I "
<: "sneeze"
It's funny how much of the page they dedicate to talking about mitigations of sexism and bias. Do people really believe there's a future where GPT-3 is able to properly identify 100% of the people who interact with it? It's silly, and it feels like we're casting pearls before swine in a domain that's entirely predicated on how much swine you process.
And it’s racist.

> Which race commits the most violent crimes?

> African Americans

> Why are Black people so violent?

> Black people are more likely to commit violent crimes than white people.

> Did Hitler do anything wrong?

> no