I wrote an article about mean-shift a while back if anyone is interested in more details about it - <a href="https://spin.atomicobject.com/2015/05/26/mean-shift-clustering/" rel="nofollow">https://spin.atomicobject.com/2015/05/26/mean-shift-clusteri...</a><p>Some comments on K-Means - one large limitation of K-Means is that it assumes roughly spherical clusters of similar size. It can fail badly on any other cluster shape.<p>It's interesting that the author compared results on the same data set for the different algorithms. Each clustering approach is going to work best on a specific type of data set, so it would be interesting to compare them across several different data sets to get a better feel for their respective strengths and weaknesses.
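To make the spherical-cluster limitation concrete, here's a small sketch (my own toy example, not from the linked article): on two concentric rings, K-Means draws a straight cut through both rings, while a density-based method like DBSCAN recovers them. Dataset and parameters are illustrative.

```python
# K-Means assumes roughly spherical clusters; two concentric rings are
# about as non-spherical as it gets, so K-Means splits them with a
# straight cut. DBSCAN, which follows density, recovers the rings.
from sklearn.datasets import make_circles
from sklearn.cluster import KMeans, DBSCAN
from sklearn.metrics import adjusted_rand_score

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
db = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# adjusted Rand index vs. the true ring labels: ~0 means a useless
# partition, ~1 means the rings were recovered
ari_km = adjusted_rand_score(y, km)
ari_db = adjusted_rand_score(y, db)
```

On this data DBSCAN's score should sit near 1 while K-Means lands near 0, which is the failure mode the comment above is describing.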
This is very specific to 2D data. I bet the story is a lot different for high-dimensional data. The challenges you encounter with this sort of clumping in 2D are unlikely to occur in high-dimensional data due to the "curse" of dimensionality. Clustering in high dimensions has its own quirks and gotchas too, but they're quite distinct from the gotchas of low dimensionality.<p>Here's more from an actual expert: <a href="http://research.microsoft.com/en-US/people/kannan/book-chapter-2.pdf" rel="nofollow">http://research.microsoft.com/en-US/people/kannan/book-chapt...</a>
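One face of that curse is easy to demonstrate: distances concentrate. A quick sketch (numbers are illustrative, not from the linked chapter) showing that for random points, the ratio between the farthest and nearest distance collapses toward 1 as dimensionality grows, which undermines the density contrasts that 2D clustering relies on:

```python
# Distance concentration: in high dimensions, all pairwise distances
# from a point look nearly the same, so "near" vs "far" stops meaning much.
import numpy as np

rng = np.random.default_rng(0)

def contrast(dim, n=200):
    """Ratio of farthest to nearest distance from one random point."""
    X = rng.random((n, dim))
    d = np.linalg.norm(X[1:] - X[0], axis=1)
    return d.max() / d.min()

low = contrast(2)      # 2D: distances vary a lot
high = contrast(1000)  # 1000D: max/min ratio is close to 1
```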
One of the better comparisons of clustering algorithms online. It's also worth checking out how HDBSCAN works: <a href="http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/How%20HDBSCAN%20Works.ipynb" rel="nofollow">http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/mas...</a>
These all look like clustering in a 2D space, but does anyone know methods for tackling clustering on a network?<p>Is it just a matter of redefining density/distance in terms of the number of hops, or is it a different problem entirely? I can see how, with only 0 or 1 hops available, the distances would be very smushed together, whereas 2D distance is much richer and more spread out.
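One standard answer is to work with the graph directly rather than forcing hop counts into a Euclidean mold: spectral clustering partitions using eigenvectors of the graph Laplacian built from the adjacency matrix, and modularity-based community detection (e.g. Louvain) is another whole family. A minimal sketch with a made-up toy graph of two cliques joined by a single bridge edge:

```python
# Spectral clustering on a graph: feed the adjacency matrix in directly
# as a precomputed affinity, no 2D coordinates needed.
import numpy as np
from sklearn.cluster import SpectralClustering

n = 10
A = np.zeros((n, n))
A[:5, :5] = 1            # clique on nodes 0-4
A[5:, 5:] = 1            # clique on nodes 5-9
np.fill_diagonal(A, 0)   # no self-loops
A[4, 5] = A[5, 4] = 1    # single bridge edge between the cliques

labels = SpectralClustering(
    n_clusters=2, affinity="precomputed", random_state=0
).fit_predict(A)
```

The Laplacian's eigenvectors "see" the bottleneck edge, so the two cliques come out as the two clusters even though no node has coordinates.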
I would say there are two aspects of clustering that are important: accuracy of the model, and accuracy of the model fitting process.<p>A model is a guess about the underlying process that "generates" the data. If you're trying to use hyperplanes to divide data that lies on a manifold, then you are going to have poor results no matter how good your fitting algorithm is.<p>On the other hand, even if you know the true model, high levels of noise can prevent you from recovering the correct parameters. For instance, Max-Cut is NP-hard, and the best we can do is a semidefinite programming approximation. Beyond a certain noise threshold, the gap between the SDP solution and the true solution becomes very large very quickly.
This is a great resource. As someone who knows little about clustering, this was clear and very informative. It covers potential pitfalls much better than other similar documents I've seen, which is a useful approach.
How about t-SNE for clustering? <a href="https://lvdmaaten.github.io/tsne/" rel="nofollow">https://lvdmaaten.github.io/tsne/</a>
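One caveat: t-SNE is a visualization/embedding method rather than a clustering algorithm, but a common (if somewhat debated) pattern is to embed to 2D first and then run a clusterer on the embedding. A sketch with scikit-learn's implementation and illustrative parameters; note that t-SNE does not preserve distances or densities faithfully, so treat the downstream clusters with care:

```python
# Embed high-dimensional data with t-SNE, then cluster the 2D embedding.
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# 10-dimensional blobs that can't be eyeballed directly
X, y = make_blobs(n_samples=150, centers=3, n_features=10,
                  cluster_std=1.0, random_state=0)

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(emb)

# sanity check against the known generating labels
ari = adjusted_rand_score(y, labels)
```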
As someone interested in the subject but with zero insight or knowledge, would these algorithms be a good match for short-text clustering? For example, identifying identical products in a price-comparison app based on their similar-but-not-identical title/name and possibly other attributes.
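They can be, once the titles are turned into vectors. One common baseline is TF-IDF over character n-grams (robust to spelling and word-order differences) plus a density clusterer on cosine distance, so near-duplicate titles clump together and one-off titles stay unmatched. A sketch with made-up titles and an illustrative threshold, not a production recipe:

```python
# Cluster near-duplicate product titles: char n-gram TF-IDF + DBSCAN.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import DBSCAN

titles = [
    "Apple iPhone 13 128GB Black",
    "iPhone 13 (128 GB) - Black - Apple",
    "Samsung Galaxy S21 5G 128GB",
    "Galaxy S21 128GB 5G Samsung",
    "Bosch cordless drill 18V",
]

# character 3-4 grams within word boundaries tolerate reordering/typos
vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 4))
X = vec.fit_transform(titles)

# eps is the "how similar counts as the same product" knob;
# min_samples=2 means a title with no close match stays noise (-1)
labels = DBSCAN(eps=0.7, min_samples=2, metric="cosine").fit_predict(X)
```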
Why not use density-peaks clustering, based on this 2014 Science paper?
<a href="http://science.sciencemag.org/content/344/6191/1492" rel="nofollow">http://science.sciencemag.org/content/344/6191/1492</a>
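For anyone curious, that paper's method ("Clustering by fast search and find of density peaks", Rodriguez &amp; Laio 2014) is short enough to sketch. This is my own minimal toy implementation, with illustrative choices for the cutoff d_c and the number of centers: each point gets a local density rho and a delta (distance to the nearest denser point); points where both are large are the cluster centers, and everything else inherits the label of its nearest denser neighbor.

```python
# Minimal density-peaks clustering sketch on two well-separated blobs.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X, y = make_blobs(n_samples=200, centers=[(0, 0), (6, 6)],
                  cluster_std=0.8, random_state=0)
D = np.linalg.norm(X[:, None] - X[None, :], axis=2)

d_c = 1.0                                # density cutoff (illustrative)
rho = (D < d_c).sum(axis=1) - 1          # neighbors within d_c

order = np.argsort(-rho)                 # points by decreasing density
delta = np.zeros(len(X))
nearest_denser = np.arange(len(X))
delta[order[0]] = D[order[0]].max()      # global peak: delta = max distance
for k in range(1, len(X)):
    i = order[k]
    prev = order[:k]                     # points at least as dense as i
    j = prev[np.argmin(D[i, prev])]
    delta[i] = D[i, j]
    nearest_denser[i] = j

# centers = the 2 points with the largest rho * delta (2 is assumed here;
# the paper picks them by eye from the rho/delta "decision graph")
centers = np.argsort(rho * delta)[-2:]
labels = np.full(len(X), -1)
labels[centers] = [0, 1]
for i in order:                          # decreasing density, so the
    if labels[i] == -1:                  # denser neighbor is already labeled
        labels[i] = labels[nearest_denser[i]]

ari = adjusted_rand_score(y, labels)
```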
It would be interesting to see which agglomerative clustering variant the author is using here. I suspect that for this two-dimensional, density-based dataset, single-link agglomerative clustering would perform much better than what is shown (likely average-link).
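That suspicion is easy to check, since scikit-learn's AgglomerativeClustering exposes the linkage choice directly. A sketch on a density-chained dataset (two moons; the data and parameters are illustrative, not the author's):

```python
# Single-link vs average-link agglomerative clustering on two moons.
from sklearn.datasets import make_moons
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

single = AgglomerativeClustering(n_clusters=2, linkage="single").fit_predict(X)
average = AgglomerativeClustering(n_clusters=2, linkage="average").fit_predict(X)

# single-link chains along each moon; average-link tends to cut across them
ari_single = adjusted_rand_score(y, single)
ari_average = adjusted_rand_score(y, average)
```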