Bayesian Optimization for Collaborative Filtering with MLlib

42 points by Zephyr314 almost 9 years ago

9 comments

minimaxir almost 9 years ago
Wait, Spark has built-in model hyperparameter selection (http://spark.apache.org/docs/latest/ml-tuning.html), which was not mentioned in the article. What advantages does your service offer?

Relatedly, why are you advocating using MLlib/RDDs when they have been deprecated in favor of ML/DataFrames (http://spark.apache.org/docs/latest/ml-guide.html)?
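For context on the question above: the tuning approach on the linked ml-tuning page is an exhaustive search over a parameter grid. The following is a minimal plain-Python sketch of that idea, not Spark's API. The function names, the grid values, and `toy_validation_error` (a made-up stand-in for a cross-validated error) are all assumptions for illustration.

```python
from itertools import product


def grid_search(param_grid, score):
    """Score every combination in param_grid; lower scores are better.

    Returns the best parameters, their score, and the number of evaluations.
    """
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    n_evals = 0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        n_evals += 1
        if current < best_score:
            best_params, best_score = params, current
    return best_params, best_score, n_evals


def toy_validation_error(params):
    # Hypothetical error surface, NOT a real model: in practice this would
    # train a model and return a held-out or cross-validated error.
    return (
        0.01 * (params["rank"] - 12) ** 2
        + (params["reg"] - 0.1) ** 2
        + 1.0 / params["iterations"]
    )


grid = {
    "rank": [5, 10, 20],
    "reg": [0.01, 0.1, 1.0],
    "iterations": [5, 10, 20],
}
best, err, n = grid_search(grid, toy_validation_error)
print(best, round(err, 3), n)
```

The number of evaluations is the product of the grid sizes (3 x 3 x 3 = 27 here), and each evaluation is a full model fit in a real setting. That multiplicative cost is what sequential methods such as Bayesian optimization try to reduce.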
apathy almost 9 years ago
I could give a shit about the hyperparameter tuning (CV... it Works For Me) but your writeup of Gaussian processes and why they are called kriging in spatial stats is awesome.

http://blog.sigopt.com/post/130275376068/sigopt-fundamentals-intuition-behind-gaussian
a1k0n almost 9 years ago
So SigOpt was tuning rank (number of latent factors), number of iterations to run the algorithm (in my experience alternating least squares generally converges within 10-20 iterations, but there'd be no downside to running it longer unless it's overfitting), and the regularization strength.

What optimal parameters did it find for these?
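To make the three knobs named above concrete, here is a minimal sketch of alternating least squares restricted to rank 1, so each update is a one-dimensional ridge solve. It is an illustration under stated assumptions, not MLlib's implementation: the function names, the toy ratings, and the plain L2 penalty (with `reg > 0`) are all choices made for this example.

```python
import random


def objective(ratings, u, v, reg):
    """Squared error on observed ratings plus an L2 penalty on both factors."""
    err = sum((r - u[i] * v[j]) ** 2 for i, j, r in ratings)
    return err + reg * (sum(x * x for x in u) + sum(x * x for x in v))


def als_rank1(ratings, n_users, n_items, reg=0.1, iterations=15, seed=0):
    """Rank-1 ALS over (user, item, rating) triples; assumes reg > 0.

    Returns user factors, item factors, and the objective after each sweep.
    """
    rng = random.Random(seed)
    u = [rng.uniform(0.5, 1.5) for _ in range(n_users)]
    v = [rng.uniform(0.5, 1.5) for _ in range(n_items)]
    history = []
    for _ in range(iterations):
        # Hold item factors fixed and solve for each user factor exactly.
        num, den = [0.0] * n_users, [reg] * n_users
        for i, j, r in ratings:
            num[i] += r * v[j]
            den[i] += v[j] ** 2
        u = [a / b for a, b in zip(num, den)]
        # Hold user factors fixed and solve for each item factor exactly.
        num, den = [0.0] * n_items, [reg] * n_items
        for i, j, r in ratings:
            num[j] += r * u[i]
            den[j] += u[i] ** 2
        v = [a / b for a, b in zip(num, den)]
        history.append(objective(ratings, u, v, reg))
    return u, v, history


# Toy data, invented for this example: (user index, item index, rating).
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 1.0)]
u, v, history = als_rank1(ratings, n_users=3, n_items=3)
for sweep, value in enumerate(history, start=1):
    print(sweep, round(value, 4))
```

Because each half-step minimizes the regularized training objective exactly over one block of factors, that objective never increases from sweep to sweep. Extra iterations therefore cannot hurt the training objective; whether they hurt held-out error is a separate question this sketch does not measure.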
Zephyr314 almost 9 years ago
I'm one of the co-founders of SigOpt (YC W15) and am happy to answer any questions about this post (or anything about SigOpt).

More info on the methods behind SigOpt can be found at https://sigopt.com/research.
apathy almost 9 years ago
Oh, also, for students: https://sigopt.com/edu

I'm worried this is going to be like good Scotch for me.
blahi almost 9 years ago
There is a package, mlrMBO, created by the great guys who created mlr (absolutely awesome for building pipelines, you will ditch caret in a second!). Not on Spark obviously, but thought some might find it useful.

https://github.com/mlr-org/mlrMBO
tachim almost 9 years ago
How does SigOpt compare to GPs?
visarga almost 9 years ago
https://sigopt.com/pricing

- Individual: $1,000/month
- Enterprise: Custom pricing

I am not a multi-million $ company, so I guess it's useless for me.
idewanck almost 9 years ago
Post author here, happy to answer any questions as well.