
Long-Form Question Answering

33 points by stablemap almost 6 years ago

4 comments

olooney almost 6 years ago
I'm not sure "we scraped ELI5"[1] is really such a substantive advancement of the state of the art that it deserves such a large write-up. The Stanford Question Answering Dataset is much more carefully curated.[2]

ROUGE[3] and BLEU[4] are pretty meaningful metrics for translations and for fairly short answers that can really only be phrased one way. For example, "What is the biggest mammal?" should be answered "The Blue Whale." There is little room for ambiguity: the words "Blue" and "Whale" *must* appear, as must the bigram "Blue Whale", for the answer to be correct. For a long or complex answer, the situation is different. Metrics based on word overlap like ROUGE and BLEU must either incentivize memorizing the answer given in the training set (overfitting) or inappropriately penalize semantically equivalent answers. For example, take the question "why is the sky blue?" Suppose the algorithm produces "The sky is blue because Rayleigh scattering off of water droplets preferentially scatters blue light at right angles. This is also why sunsets are red." while the answer on file is "light with long wavelengths passes straight through moist air, while light with short wavelengths tends to be deflected." Both answers are correct (indeed, they are basically the same answer) yet they share so few words, bigrams, and trigrams that they would have to be marked "wrong."

[1]: https://www.reddit.com/r/explainlikeimfive/

[2]: https://rajpurkar.github.io/SQuAD-explorer/

[3]: https://en.wikipedia.org/wiki/ROUGE_(metric)

[4]: https://en.wikipedia.org/wiki/BLEU
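A minimal sketch of the failure mode described above: a hand-rolled bigram-recall score in the style of ROUGE-2 (not the official scorer; the tokenizer is a deliberately crude assumption) scores the two equally correct "why is the sky blue?" answers at essentially zero.

```python
from collections import Counter

def bigrams(text):
    """Lowercase, strip surrounding punctuation, return the bigram multiset."""
    tokens = [w.strip(".,?!\"'()") for w in text.lower().split()]
    tokens = [t for t in tokens if t]
    return Counter(zip(tokens, tokens[1:]))

def rouge2_recall(candidate, reference):
    """Fraction of reference bigrams that appear in the candidate.

    A rough approximation of ROUGE-2 recall, for illustration only.
    """
    cand, ref = bigrams(candidate), bigrams(reference)
    if not ref:
        return 0.0
    overlap = sum(min(cand[b], ref[b]) for b in ref)
    return overlap / sum(ref.values())

# The two semantically equivalent answers from the example above.
produced = ("The sky is blue because Rayleigh scattering off of water "
            "droplets preferentially scatters blue light at right angles. "
            "This is also why sunsets are red.")
on_file = ("Light with long wavelengths passes straight through moist air, "
           "while light with short wavelengths tends to be deflected.")

# Prints 0.0: the answers share no bigrams, so an overlap metric
# would mark the produced answer "wrong" despite it being correct.
print(rouge2_recall(produced, on_file))
```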
aargh_aargh almost 6 years ago
Am I weird for immediately jumping to the conclusion that the first thing this will be used for is generating tons of rather low-quality content for "SEO" purposes?
6gvONxR4sf7o almost 6 years ago
It's interesting to see the path towards increasingly sophisticated question in --> answer out. The way people do this is with dialogue, but that doesn't fit as easily into standard supervised learning with an easy-to-collect dataset.

If you asked me "What's a good restaurant nearby?" I wouldn't answer with a list of restaurants; I'd say "What kind of food do you feel like?" It seems like we aren't even working our way in that direction. Maybe the sample complexity of RL and language modeling needs to come down a ton first.
dvtrn almost 6 years ago
I'm having the dickens of a time understanding who Facebook built this for. Marketers, right?