Wait, Spark has built-in model hyperparameter selection (<a href="http://spark.apache.org/docs/latest/ml-tuning.html" rel="nofollow">http://spark.apache.org/docs/latest/ml-tuning.html</a>), which was not mentioned in the article. What advantages does your service offer over it?<p>Relatedly, why are you advocating using MLlib/RDDs when they have been deprecated in favor of ML/DataFrames (<a href="http://spark.apache.org/docs/latest/ml-guide.html" rel="nofollow">http://spark.apache.org/docs/latest/ml-guide.html</a>)?
I couldn't give a shit about the hyperparameter tuning (CV... it Works For Me) but your writeup of Gaussian processes and why they are called kriging in spatial stats is awesome.<p><a href="http://blog.sigopt.com/post/130275376068/sigopt-fundamentals-intuition-behind-gaussian" rel="nofollow">http://blog.sigopt.com/post/130275376068/sigopt-fundamentals...</a>
So SigOpt was tuning rank (number of latent factors), number of iterations to run the algorithm (in my experience alternating least squares generally converges within 10-20 iterations, but there'd be no downside to running it longer unless it's overfitting), and the regularization strength.<p>What optimal parameters did it find for these?
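For anyone who wants to picture the search space being discussed, here is a minimal sketch of the plain grid-search baseline over those three knobs (rank, iterations, regularization strength). It is not SigOpt's method and not Spark's API: `toy_validation_error` is a made-up stand-in for "train ALS, score it on held-out data", and the grid values are arbitrary assumptions chosen only for illustration.

```python
from itertools import product


def toy_validation_error(rank, iterations, reg):
    """Hypothetical stand-in for training ALS and scoring it on held-out data.

    The surface is invented: it is lowest at rank=20 and reg=0.1, and extra
    iterations help with diminishing returns.
    """
    return (rank - 20) ** 2 / 100.0 + (reg - 0.1) ** 2 * 10.0 + 1.0 / iterations


def grid_search(objective, grid):
    """Evaluate every combination in the grid and return the lowest-scoring one."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    evaluations = 0
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(**params)
        evaluations += 1
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score, evaluations


# Assumed grid values, for illustration only.
grid = {
    "rank": [5, 10, 20, 40],
    "iterations": [5, 10, 20],
    "reg": [0.01, 0.1, 1.0],
}

best_params, best_score, evaluations = grid_search(toy_validation_error, grid)
print(best_params, best_score, evaluations)
```

The cost is the product of the grid sizes (4 x 3 x 3 = 36 model fits here), and it grows quickly as parameters are added. That growth is the usual argument for smarter search strategies when each fit is expensive.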
I'm one of the co-founders of SigOpt (YC W15) and am happy to answer any questions about this post (or anything about SigOpt).<p>More info on the methods behind SigOpt can be found at <a href="https://sigopt.com/research" rel="nofollow">https://sigopt.com/research</a>.
Oh, also, for students: <a href="https://sigopt.com/edu" rel="nofollow">https://sigopt.com/edu</a><p>I'm worried this is going to be like good Scotch for me.
There is a package, mlrMBO, created by the great guys who created mlr (absolutely awesome for building pipelines, you will ditch caret in a second!). Not on Spark obviously, but thought some might find it useful.<p><a href="https://github.com/mlr-org/mlrMBO" rel="nofollow">https://github.com/mlr-org/mlrMBO</a>
<a href="https://sigopt.com/pricing" rel="nofollow">https://sigopt.com/pricing</a><p>- Individual: $1,000/month<p>- Enterprise: Custom pricing<p>I am not a multi-million $ company, so I guess it's useless for me.