This is easily the most interesting announcement so far. Machine learning has so many applications, but its use is constrained by the high barriers to entry. Recommendation engines, for example, are huge sales drivers, but few among even the largest ecommerce stores use them. A simple prediction interface that's built on the ML expertise at Google is a win for everyone.
I'd love to see how well it could predict comment ratings from Hacker News.<p>The following data would be a good start:<p>1. Text of comment<p>2. How many points the comment has<p>3. How many points the article has<p>4. Time article was posted<p>5. Time comment was posted<p>I'd also be interested to see what kind of user bias there is. If you don't provide user names, you could see what kind of rating a comment <i>should</i> have based on its content, and what rating it actually has because certain users are generally loved (pg) or hated (jasonmcalacanis) by the community.
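For anyone who wants to try this, here is a rough sketch of how those five fields could be flattened into training rows. Everything here is an assumption for illustration: the `CommentExample` type, the field names, the label-in-first-column layout, and the choice to express comment time as a delay after the article was posted. None of it comes from Google's docs.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class CommentExample:
    # Hypothetical record holding the five fields listed in the comment above.
    comment_text: str
    comment_points: int
    article_points: int
    article_posted_at: int  # Unix timestamp (assumed representation)
    comment_posted_at: int  # Unix timestamp (assumed representation)


def to_row(ex):
    """Put the value to predict (comment points) first, then the features."""
    delay = ex.comment_posted_at - ex.article_posted_at  # seconds after the article
    return [ex.comment_points, ex.comment_text, ex.article_points,
            ex.article_posted_at, delay]


def write_training_csv(examples):
    """Serialize the examples as CSV text, one training row per comment."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for ex in examples:
        writer.writerow(to_row(ex))
    return buf.getvalue()
```

Leaving the user name out of the row, as suggested above, is what would let you compare the content-only prediction against the actual score.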
Not enough details available on how it works. Would rather build my own at this point. Plus the way this is billed oversimplifies the whole model design process.<p>Sorry to sound so negative, but I just earned a PhD in Machine Learning. How would you feel if you were replaced by an API? :-(
I was guessing that Google (in their never-ending desire to consume more data) would want to use us as guinea pigs to improve their algorithms. It's not 100% clear to me, but from the terms of service:<p><i>By submitting, posting, displaying, or transmitting Data on or through the Service, you give Google permission to process your Data for the sole purpose of enabling Google to provide you with the Service in accordance with its privacy policy. You hereby grant Google all licenses to your Data necessary to process the Data and provide you with the Service in accordance with its privacy policy. As a part of the Service and through provided interfaces, Google may allow you to remotely access, view, and download results of the processing of your Data.</i> (via <a href="http://code.google.com/apis/predict/docs/terms.html" rel="nofollow">http://code.google.com/apis/predict/docs/terms.html</a>)<p>I imagine that they might claim the right to use your data anonymously to improve their algorithms, much like they do for your personal data in their other apps. I mean, what better way to refine their supervised learning algorithms than via an endless supply of training sets? But I hate wading through legalese. Does anyone have any insights?
From the very little information that I see available so far, it appears that Google's first stab will be at discrete predictions. That is, I don't see probabilistic output yet.<p>Also, from <a href="http://code.google.com/apis/predict/docs/developer-guide.html" rel="nofollow">http://code.google.com/apis/predict/docs/developer-guide.htm...</a>, it is clear that they perform accuracy analysis using the training data. That is, there is no "testing" vs "training" dataset distinction at this point; there is just cross-validation of the training set.
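To make the cross-validation point concrete, here is a minimal sketch of k-fold cross-validation in plain Python. It is not Google's implementation, which isn't documented in enough detail to reproduce. The `train_majority` model is a deliberately trivial stand-in, and all function names are made up for illustration.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def train_majority(labels):
    """Hypothetical stand-in model: always predict the most common label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validate(labels, k):
    """Mean accuracy over k folds; each fold is held out exactly once."""
    n = len(labels)
    scores = []
    for fold in k_fold_indices(n, k):
        held_out = set(fold)
        train = [labels[i] for i in range(n) if i not in held_out]
        prediction = train_majority(train)
        correct = sum(1 for i in fold if labels[i] == prediction)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)
```

Every example ends up in a held-out fold once, so the accuracy estimate uses all of the uploaded data without a separate test file. That is still weaker than scoring against a test set the service never saw.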
This is interesting:<p>"Automatically selects from several available machine learning techniques"<p>So not only does it learn, it's learning which learning techniques work best for different problems.
As a non-techie, I don't understand the language example they're using. It seems to me many prediction engines are originally built to try to forecast winning lottery numbers or other such gambling events. Google expects me to believe they did this for language?
Basically what I see is that they implemented an open platform for running classification algorithms that gives discrete categories as output. The automatic selection from multiple machine learning methods may be nothing more than simple cross-validation.
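If the selection really is just cross-validation, it could look something like the sketch below: score each candidate technique on held-out folds and keep the winner. This is a guess at the general idea, not a description of what Google runs. The two candidate models and all names are hypothetical, chosen only because they fit in a few lines.

```python
from collections import Counter


def fit_majority(xs, ys):
    """Candidate 1: ignore the feature and always predict the most common label."""
    label = Counter(ys).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_neighbor(xs, ys):
    """Candidate 2: predict the label of the closest training point (1-D feature)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def cv_accuracy(fit, xs, ys, k):
    """Fraction of examples predicted correctly while held out (interleaved folds)."""
    n = len(xs)
    correct = 0
    for f in range(k):
        test_idx = [i for i in range(n) if i % k == f]
        train_idx = [i for i in range(n) if i % k != f]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        correct += sum(1 for i in test_idx if model(xs[i]) == ys[i])
    return correct / n


def select_model(candidates, xs, ys, k=3):
    """Return the name of the best-scoring candidate and all the scores."""
    scores = {name: cv_accuracy(fit, xs, ys, k) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


candidates = {"majority": fit_majority, "nearest_neighbor": fit_nearest_neighbor}
xs = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
ys = ["a", "a", "a", "b", "b", "b"]
best, scores = select_model(candidates, xs, ys)
```

On this toy data the nearest-neighbor candidate wins because the two classes are well separated. On other data the ranking could flip, which is the whole point of selecting per problem.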
"Upload your data to Google Storage for Developers, then use the Prediction API to make real-time decisions in your applications."<p>I can understand the necessity of this, but that'll be some serious lock-in.