How to build your own "Watson Jr." in your basement

147 points by flapjack, about 14 years ago

8 comments

gojomo, about 14 years ago
My hunch is that 90-99% of all Jeopardy questions can be answered with information in Wikipedia/Wiktionary, properly understood.

So I'd start with Wikipedia: ~30GB of uncompressed full article text. Break it into chunks; canonicalize phrasings to be more declarative, and include synonym/hypernym/hyponym phrasings (via something like WordNet), so that various 'cluesy' ways of saying things still bring up the same candidate answers.

Because it's free and compact and well-structured, throw in Freebase, too.

Jeopardy goes back to certain topics/answers again and again. So I'd scrape the full 200K+ clue "J!Archive", and use it as both source and testing material (though of course not testing the system on rounds in its memory).

And I'd add special interpretation rules for commonly recurring category types: X-letter words, before-and-after, quasi-multiple-choice, words-in-quotes.

I think such a system might get half or more of the questions in a typical round correct, and in a matter of seconds, even on a single machine.
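A minimal sketch of the retrieval idea described above, under loud assumptions: the four toy `CHUNKS` stand in for chunked Wikipedia text, the hand-written `SYNONYMS` table is a hypothetical placeholder for a real WordNet lookup, and scoring is plain term overlap. All function names are illustrative, and only one special category rule ("X-letter words") is shown.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for a WordNet lookup: word -> related phrasings.
SYNONYMS = {
    "writer": {"author", "novelist"},
    "author": {"writer", "novelist"},
    "novelist": {"author", "writer"},
    "nation": {"country"},
    "country": {"nation"},
}

STOPWORDS = {"the", "a", "an", "of", "in", "this", "is", "was", "and", "who", "by"}

# Toy corpus standing in for chunked encyclopedia text: (candidate answer, chunk).
CHUNKS = [
    ("Mark Twain", "Mark Twain was an American writer who wrote Adventures of Huckleberry Finn."),
    ("Herman Melville", "Herman Melville was an American novelist who wrote Moby-Dick."),
    ("Peru", "Peru is a country in South America; its capital is Lima."),
    ("Chile", "Chile is a country in South America; its capital is Santiago."),
]


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def expand(tokens):
    """Add synonym phrasings so differently worded clues hit the same chunks."""
    expanded = set(tokens)
    for token in tokens:
        expanded |= SYNONYMS.get(token, set())
    return expanded


def build_index(chunks):
    """Build an inverted index: expanded term -> set of candidate answers."""
    index = defaultdict(set)
    for title, text in chunks:
        for term in expand(tokenize(text)):
            index[term].add(title)
    return index


def letter_count_rule(category):
    """Special rule for 'X-letter words' categories; returns X, or None."""
    match = re.search(r"(\d+)[- ]letter", category.lower())
    return int(match.group(1)) if match else None


def candidates(index, category, clue, top_n=3):
    """Rank candidate answers by term overlap, then apply the category rule."""
    scores = Counter()
    for term in expand(tokenize(clue)):
        for title in index.get(term, ()):
            scores[title] += 1
    required = letter_count_rule(category)
    if required is not None:
        scores = Counter(
            {t: s for t, s in scores.items() if len(t.replace(" ", "")) == required}
        )
    return [title for title, _ in scores.most_common(top_n)]


INDEX = build_index(CHUNKS)
```

With this toy data, `candidates(INDEX, "AUTHORS", "This author wrote Moby-Dick")` ranks Herman Melville first even though the chunk says "novelist" rather than "author", and a "5-LETTER WORDS" category filters a South America clue down to Chile. A real system would need far better scoring than raw overlap counts.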
tel, about 14 years ago
"Search optimization: No, this team focused on making IBM Watson optimized to answer in 3 seconds or less. We can accept a slower response, so we can skip this."

That makes me laugh. I'd guess that search optimization effort has a power law response here. 3 seconds is extraordinary, 1 minute is tricky, 10 minutes is possible after some solid effort, and anywhere from 3 days to the heat death of the universe is what you get without optimization.

Not saying you actually ignore it. It's built into those libraries they casually throw around. Just thought the wording was funny.
srean, about 14 years ago
I somehow cannot give up daydreaming wistfully about a personal CM-5. From a previous discussion on HN, it seems it is still going to be an expensive thing to build as a toy project, particularly because of the hypercube interconnect. Not sure if the source code for star-Lisp is available, but I think an emulator lives on at Sourceforge.

Edit 1: Here it is: http://sourceforge.net/projects/starsim/

Edit 2: Just double-checked, and the Sourceforge repository has no code!! But I found it here: http://examples.franz.com/category/Application/ParallelProgramming/index.html

@dhess Thanks a lot for that link. I just ordered a copy :)
charlesju, about 14 years ago
It seems to me it would be a lot easier to use EC2 than to set up all the machines in the basement.
kirpekar, about 14 years ago
Watson has become an unbelievable marketing tool for IBM.
moomba, about 14 years ago
This article doesn't really tell you how to build a "Watson Jr.", as they call it. It just tells you to use OpenNLP and UIMA (which is unnecessary, though it's understandable why it's advocated, since IBM created it).

I was kind of hoping that there would be a deeper dive into how the data was being stored and retrieved. I'm also interested in the machine learning side of it. They don't really give any hints about that either.
joakin, about 14 years ago
I hate it when a page I'm interested in makes an ajax call whenever I click something, since I read long texts by clicking and selecting text.

Why man? Why? If you want to track clicks on links with javascript, don't trigger the ajax call when I click on :not(a) ... -_-'
maeon3, about 14 years ago
This is one of the great moments in the history of humanity, right up there with the first self-powered flying machine. North Carolina got a license plate: "FIRST IN FLIGHT". Someone is going to get the credit for the open-ended question-answering machine shortly. Who gets it?

The race is on: whoever creates the first reasonably good question-answering machine for demonstration will get their name etched into the sands of time for the next ten thousand years. Get to it!

This industry has the chance to be bigger than Google and Microsoft combined. Every person on Earth will demand one of these. Those who don't have one will be at a remarkable disadvantage. This is going to turn into a trillion-dollar industry.