Introducing Streaming K-Means in Spark MLlib 1.2

69 points by rxin, over 10 years ago

4 comments

rxin, over 10 years ago
This is a cool feature, and is one of the prime examples of what Spark's tight integration of various libraries can enable (in this case Spark Streaming and MLlib). It was originally designed by Jeremy Freeman to handle workloads in neuroscience, which IIRC were generating data at 1 TB every 30 minutes.
hcrisp, over 10 years ago
Sounds similar to an exponential moving average [1], which itself is a one-pole IIR digital filter. [1] http://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
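For readers unfamiliar with the comparison, here is a minimal sketch of an exponential moving average written as the one-pole IIR recurrence y[n] = alpha * x[n] + (1 - alpha) * y[n-1]. The function name `ema`, the parameter name `alpha`, and the choice to seed the filter with the first sample are assumptions made for illustration; this is not taken from the Spark MLlib implementation.

```python
def ema(values, alpha):
    """Exponential moving average as a one-pole IIR filter.

    Each output is alpha * x[n] + (1 - alpha) * y[n-1].
    Assumption: the filter state is seeded with the first sample.
    """
    smoothed = []
    y = None
    for x in values:
        # The single feedback term (1 - alpha) * y is the "one pole".
        y = x if y is None else alpha * x + (1 - alpha) * y
        smoothed.append(y)
    return smoothed


# A step from 0 to 10: the output approaches 10 geometrically.
print(ema([0.0, 10.0, 10.0, 10.0], alpha=0.5))  # [0.0, 5.0, 7.5, 8.75]
```

A larger `alpha` forgets old samples faster, which is the same trade-off a decay factor makes when old data is down-weighted in a streaming cluster update.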
michaelmior, over 10 years ago
Is it true that this doesn't support dynamic values of k? That is, the algorithm isn't adaptive to a changing number of clusters? That said, I suppose for some small range of k values, you could do this trivially by tracking them all and picking the best.
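The "track them all and pick the best" idea could be sketched roughly as follows. This is a plain-Python illustration on 1-D points using a sequential (MacQueen-style) k-means update, not Spark's `StreamingKMeans`. The class `OnlineKMeans`, the function `pick_best`, the hand-chosen initial centers, and the `cost + penalty * k` selection rule are all hypothetical. Some penalty on k is needed, because raw cost alone tends to favor the largest k.

```python
class OnlineKMeans:
    """Sequential k-means on 1-D points (illustrative, not Spark's API)."""

    def __init__(self, initial_centers):
        self.centers = list(initial_centers)
        self.counts = [0] * len(self.centers)
        # Running sum of squared distances, measured before each update.
        self.cost = 0.0

    def update(self, x):
        # Assign the point to its nearest center.
        j = min(range(len(self.centers)),
                key=lambda i: (x - self.centers[i]) ** 2)
        self.cost += (x - self.centers[j]) ** 2
        self.counts[j] += 1
        # Move that center toward x so it stays the mean of its points.
        self.centers[j] += (x - self.centers[j]) / self.counts[j]


def pick_best(models, penalty):
    """Hypothetical rule: smallest accumulated cost plus penalty * k."""
    return min(models, key=lambda k: models[k].cost + penalty * k)


# One model per candidate k, all fed the same stream.
models = {
    1: OnlineKMeans([5.0]),
    2: OnlineKMeans([0.0, 10.0]),
    3: OnlineKMeans([0.0, 5.0, 10.0]),
}
stream = [0.1, 9.9, -0.1, 10.1, 0.2, 9.8]  # two clusters, near 0 and near 10
for x in stream:
    for model in models.values():
        model.update(x)

best_k = pick_best(models, penalty=1.0)
print(best_k)  # 2
```

The cost of running this grows linearly with the number of candidate k values, which is why the comment restricts it to a small range.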
cfregly, over 10 years ago
Very interesting post. Ironically, Hacker News uses a similar type of time-decay algorithm!