Hi, this is Nicholas, a long-time lurker on HN and the person in the video. I saw this thread during my morning commute to work (and was very surprised, to say the least!) and wanted to register to mention a few important details that the news articles always omit. Hopefully this helps correct a few misconceptions!

To begin, I'd like to flatly deny that I "built a better search engine." I did my (very academic) work in information retrieval and developed a new algorithm that seems to give significantly better search results (when compared to other academic search techniques; more on this later) on short documents like tweets. Specifically, my algorithm uses random walks (modelled as Markov chains) on graphs of terms representing documents to perform a type of semantic smoothing known as document expansion, where a statistical model of a document's meaning (usually based on the words that appear in the document) is expanded to include related words (there is a toy code sketch of this general idea at the end of this comment). My system is in no way, shape, or form a "search engine," nor is it even comparable to something like Google. Rather, it is an algorithm that could help improve search results in a real, commercial search engine.

My work is far from the first to attempt document expansion. A number of related techniques already exist, including pseudo-relevance feedback expansion, translation models, some forms of latent semantic indexing, and some of those mentioned by exg. However, to my knowledge, the knowledge of my science fair judges (some of whom are active IR researchers), and the knowledge of my research mentor (also more on this later), my work is a novel method (not a synthesis of existing methods) that seems to work quite well in comparison to other, similar algorithms on collections of small documents like tweets.

The last point is certainly important: it is simply impossible to compare my algorithm to something like Google, for several reasons. First, I'm not a software engineer or a large company; it is downright impossible for me to craft a combination of algorithms like that found in Google to get comparable results. No commercial search engine would be so foolish as to use only a single algorithm (essentially a single feature, from an ML perspective). Instead, they use hundreds or thousands. Second, it is essentially impossible to compare search engines with any level of scientific rigour. I evaluated my system using a standard corpus of data published by NIST as part of TREC (the Text REtrieval Conference), consisting not only of 16+ million tweets, but also of sample queries and the correct, human-determined results for those queries. However, to achieve statistically comparable results, many variables have to be controlled in a way that is impossible with a large, complex search engine. Instead, the academic approach compares individual algorithms one-on-one and postulates that these can be combined to give better search results in aggregate.

Specifically, my research showed that my system achieved above-median scores on the official evaluation metrics of the 2011 Microblog corpus when compared to the research groups that published results last November. Furthermore, my system did the best of all of the "single algorithm" systems, including those that used other document expansion techniques like the ones I described above.

Most of my work was spent on the development of the algorithm, proofs of its convergence and asymptotic complexity, a theoretical framework, and a statistical analysis of my results.
Notably absent from this list is engineering. My project is not, by any means, "a toy engineering project," as some commenters have suggested. If anything, the engineering in my project is quite poor, as that is not an area I've had much exposure to.

To briefly address my research mentor: my parents had nothing to do with my project other than providing emotional support when I was stressed. I did have a research mentor at a university, whom I found after I did very well at the 2011 Canada-Wide Science Fair. He provided me with important computational and data resources (such as the corpus I used), but he did not develop my algorithm, proofs, or code, which were my own work.

Given the recent attention my project (and Jack Andraka's project on cancer detection) has received, I'd like to point out a general trend in news articles about science fair projects. In general, the media has a tendency to focus on the potential applications of a project and ignore the science in it, leading to (seemingly fair) criticism. Using me as an example, the talk about "toy" projects and "synthesis" is fair given how my work was portrayed in the media. Somehow, "novel IR algorithm based on Markov chain document expansion," even with careful (and thorough!) explanation, gets turned into "Teen builds a better search engine." Similarly, a great friend (and roommate) of mine had his project on drug combinations to treat cystic fibrosis completely shredded on Reddit when it got significant media attention last year. He never once claimed, or tried to claim, that he had done anything with immediate (or even near-term) medical applications. Instead, he discussed his work to identify molecules that bind to different sites on the damaged protein and can work synergistically as drugs. The media spin machine quickly turned this into "Teen cures cystic fibrosis" and other such nonsense. Even Jack's project (I know both him and his project), which is unusually "real world," has been overspun by the media. It's just what happens. Heck, people even make fun of this at upper-level science fairs, but it still happens.

Finally, thank you for the encouraging words! To finish with a shameless plug: fairs like ISEF tend to be very well funded (because of the positive publicity), but many regional and state (in the US) or national (outside of the US) youth science organizations struggle to find funding (and even volunteers) to run the fairs that send people to ISEF. If you ever find yourself in a position where you can help (financially, with your time, whatever), I'd strongly encourage it. Given the impact science fairs have had on my life, I know that I certainly will.
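
P.S. Since "random walks on term graphs" is fairly abstract, here is a toy sketch of the general idea in Python. To be absolutely clear, this is not my system: the corpus, the graph construction, and the walk parameters below are invented purely for illustration. The point is only to show how a short document's term distribution can be "expanded" by letting a Markov chain wander over a graph of related terms while occasionally restarting at the document itself.

    # Toy illustration of document expansion via a random walk on a term graph.
    # NOT my actual system: the corpus, graph, and parameters are made up.
    from collections import Counter, defaultdict
    from itertools import combinations

    # A tiny "corpus" of short documents (think tweets), used only to build
    # a term co-occurrence graph.
    corpus = [
        "new phone release date announced today",
        "company announces phone launch event",
        "storm warning issued for coastal areas",
        "heavy rain and storm expected this weekend",
    ]

    # Undirected term graph: edge weight = number of documents in which the
    # two terms co-occur.
    edge_weights = defaultdict(float)
    for doc in corpus:
        terms = set(doc.split())
        for a, b in combinations(sorted(terms), 2):
            edge_weights[(a, b)] += 1.0
            edge_weights[(b, a)] += 1.0

    # Row-normalise the edge weights into Markov-chain transition
    # probabilities P(next term | current term).
    transitions = defaultdict(dict)
    for (a, b), w in edge_weights.items():
        transitions[a][b] = w
    for a in transitions:
        total = sum(transitions[a].values())
        for b in transitions[a]:
            transitions[a][b] /= total

    def expand(document, alpha=0.85, iterations=50):
        """Expand a short document's term distribution with a random walk
        that restarts at the document's own terms with probability (1 - alpha)."""
        restart = Counter(document.split())
        norm = sum(restart.values())
        restart = {t: c / norm for t, c in restart.items()}

        dist = dict(restart)  # start the walk at the document itself
        for _ in range(iterations):
            next_dist = defaultdict(float)
            for term, prob in dist.items():
                neighbours = transitions.get(term, {})
                if neighbours:
                    for neighbour, p in neighbours.items():
                        next_dist[neighbour] += alpha * prob * p
                else:
                    # Dangling term with no neighbours: return its mass to the seed.
                    for t, r in restart.items():
                        next_dist[t] += alpha * prob * r
            # Restart at the original document's terms with probability (1 - alpha).
            for t, r in restart.items():
                next_dist[t] += (1 - alpha) * r
            dist = dict(next_dist)
        return sorted(dist.items(), key=lambda kv: -kv[1])

    # The expanded distribution now gives weight to related terms
    # ("announces", "event", "release", ...) that never appear in the
    # original two-word document.
    for term, weight in expand("phone launch"):
        print(f"{term:12s} {weight:.3f}")

The interesting parts of my actual work are how the graph is built, the proofs that the walk converges, and its asymptotic complexity, none of which this toy example captures.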