rxin, over 10 years ago

This is a cool feature, and is one of the prime examples of what Spark's tight integration of various libraries can enable (in this case Spark Streaming and MLlib). It was originally designed by Jeremy Freeman to handle workloads in neuroscience, which IIRC were generating data at 1 TB per 30 minutes.

hcrisp, over 10 years ago

Sounds similar to an exponential moving average[1], which itself is a one-pole IIR digital filter.

[1] http://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average
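
For readers unfamiliar with the comparison, here is a minimal sketch of an exponential moving average written as a one-pole IIR filter. The function name, the `alpha` parameter, and the choice to seed the filter with the first sample are illustrative assumptions, not anything taken from the Spark post or MLlib.

```python
def exponential_moving_average(samples, alpha):
    """One-pole IIR filter: y[n] = alpha * x[n] + (1 - alpha) * y[n-1].

    `alpha` is a hypothetical smoothing factor in (0, 1]; larger values
    forget old samples faster.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    previous = None
    for x in samples:
        # Assumption: seed the filter with the first sample.
        if previous is None:
            previous = x
        else:
            previous = alpha * x + (1.0 - alpha) * previous
        smoothed.append(previous)
    return smoothed


ema = exponential_moving_average([1.0, 2.0, 3.0], alpha=0.5)
```

Each output depends only on the current input and the previous output, which is what makes it a single-pole recursive filter: older samples are never stored, only their exponentially discounted contribution.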

michaelmior, over 10 years ago

Is it true that this doesn't support dynamic values of k? That is, the algorithm isn't adaptive to a changing number of clusters? That said, I suppose for some small range of k values, you could do this trivially by tracking them all and picking the best.

cfregly, over 10 years ago

Very interesting post. Ironically, Hacker News uses a similar type of time-decay algorithm!

Introducing Streaming K-Means in Spark MLlib 1.2

4 comments