Alas, my somewhat skeptical question came in late and got little support:<p><a href="http://www.reddit.com/r/IAmA/comments/fnfg3/by_request_we_are_the_ibm_research_team_that/c1h90u4" rel="nofollow">http://www.reddit.com/r/IAmA/comments/fnfg3/by_request_we_ar...</a><p><i>What determined the use of exactly 10 racks of 9 maxed-out (32-core, 512GB RAM) 4U Power750 servers? For example, would Watson have done better with more hardware? Or could it have made do with far less, once all the bulk pre-processing of, and training on, the source material was finished?<p>(My intuitions about the necessary amount of reference data and topical associations – written up at <a href="http://redd.it/fnixm" rel="nofollow">http://redd.it/fnixm</a> – made me think far less hardware should have been required, at least at the very end during the match.)</i>
I also feel that they rather danced around the buzzing-in question. Obviously Watson has to calculate and decide on his answer first, but there is no denying that he was very fast on the buzzer in the game.
Question 3 was the most interesting, but the data on parsing is remarkably incomplete. As far as I can tell, we get only lists of possible ways to break down the clue, without any explanation of how or why one reading is preferred over another.<p>Case in point (1): how it decides to treat "Treasure Island" as a proper noun. We see only "modifies(Treasure, Island)" -- indicating that it treats "treasure" as an adjective modifying "island" -- then suddenly, in the semantic-assumption phase, the two are treated as a compound.<p>Case in point (2). We are given:<p><pre><code> island(Treasure Island)
location(Treasure Island)
resort(Treasure Island)
book(Treasure Island)
movie(Treasure Island)
</code></pre>
I assume what he is giving us are method names, written in Java, each taking "Treasure Island" as its single argument and returning a value that indicates the likelihood that "Treasure Island" is what the method name refers to. This is extraordinarily interesting. However, it is not at all clear which methods are chosen and why, or whether they are run in some sort of sequence or simultaneously, etc.<p>Case in point (3): "Builds different semantic queries based on phrases, keywords and semantic assumptions." This is very vague, but it indicates that Watson generates a set of queries which it runs against its own internal search engine, ranking answers presumably based on the quality of the initial search and the confidence of the answer. It would be very, very cool to have an example; my own guess at what one might look like is sketched below.<p>All in all, it whets the appetite but leaves one wishing for heartier fare (or a job at IBM!).
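<p>Since I cannot resist speculating, here is a minimal sketch of how I imagine those two steps fitting together: run a battery of type-checking methods over the focus entity, keep each (type, confidence) pair as a separate semantic assumption, then emit one query per assumption that survives a threshold. To be clear, this is purely my own guess -- every name in it (TypeChecker, SemanticAssumptions, score, buildQueries) is hypothetical, not IBM's actual code:<p><pre><code> import java.util.LinkedHashMap;
 import java.util.List;
 import java.util.Map;
 import java.util.stream.Collectors;

 interface TypeChecker {
     // Confidence in [0, 1] that the entity really is of this type.
     double score(String entity);
 }

 class SemanticAssumptions {
     private final Map<String, TypeChecker> checkers = new LinkedHashMap<>();

     void register(String type, TypeChecker checker) {
         checkers.put(type, checker);
     }

     // Run every registered checker over the same entity (sequentially
     // here, though nothing stops this from being parallel) and keep
     // each (type, confidence) pair as a separate assumption.
     Map<String, Double> assume(String entity) {
         Map<String, Double> scores = new LinkedHashMap<>();
         checkers.forEach((type, c) -> scores.put(type, c.score(entity)));
         return scores;
     }

     // One query per assumption above the threshold, e.g.
     // "book(Treasure Island)" becomes a search restricted
     // to book-typed sources.
     List<String> buildQueries(String entity, Map<String, Double> scores,
                               double threshold) {
         return scores.entrySet().stream()
                      .filter(e -> e.getValue() >= threshold)
                      .map(e -> e.getKey() + "(" + entity + ")")
                      .collect(Collectors.toList());
     }

     public static void main(String[] args) {
         SemanticAssumptions sa = new SemanticAssumptions();
         // Toy constant scores standing in for whatever evidence
         // models IBM actually uses.
         sa.register("island", e -> 0.20);
         sa.register("book",   e -> 0.90);
         sa.register("movie",  e -> 0.60);
         Map<String, Double> scores = sa.assume("Treasure Island");
         System.out.println(sa.buildQueries("Treasure Island", scores, 0.5));
         // prints: [book(Treasure Island), movie(Treasure Island)]
     }
 }
 </code></pre><p>Presumably the real system scores candidates with trained models over its corpora rather than hand-wired checkers, and its queries are richer than these strings, but even a skeleton at this level would have made the "which methods, and in what order" question concrete.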
Some interesting nuggets in here; I had watched the Nova specials on Watson and such. I would have liked a question about the team and their work stress, but otherwise a fun read. I especially enjoyed the step-by-step parsing and examination of a question, showing how Watson would work through it.
One could make the argument that, since Watson is trained on English information and English Jeopardy! clues, English <i>is</i> Watson's native language. Sure, there's Java down to assembly beneath Watson's understanding of English, but the same goes for native English-speaking humans -- there's neural machinery beneath their English too. English speakers aren't biologically any different from, say, French speakers.