TechEcho

8 comments

maurits almost 14 years ago

One can also have a look at the Stanford course "Mining Massive Datasets". No video lectures (yet), but the course information is here: <a href="http://www.stanford.edu/class/cs246/cs246-11-mmds/handouts.html" rel="nofollow">http://www.stanford.edu/class/cs246/cs246-11-mmds/handouts.h...</a> The book is here: <a href="http://infolab.stanford.edu/~ullman/mmds.html" rel="nofollow">http://infolab.stanford.edu/~ullman/mmds.html</a>

ajays almost 14 years ago

Dated: 15-May-2002. Just sayin'. There's been a lot of work in the last decade on this subject.

grk almost 14 years ago

<a href="https://massivedatasets.wordpress.com/" rel="nofollow">https://massivedatasets.wordpress.com/</a> from a Danish Technical University course with the same name

cdavid almost 14 years ago

As mentioned by others, this list is old. The cited algorithms are certainly still good to know, but the meaning of "massive" is different now. Today, massive means:<pre><code> - too large to fit even in big iron (few people can afford them anyway)
 - low value: a lot of the data is useless or too poor to be useful, so not taking all of it into account all the time is not too bad. </code></pre> Nothing outside near-linear or even sublinear algorithms really works in those cases. Singular Value Decomposition is a great example. Until recently, the work was mostly about doing fast, accurate SVD for large matrices. There is a recent surge of approximate algorithms that see each piece of data at most once. This is useless for most "hard" engineering tasks, but for analysis of large graph data, you can most likely tolerate a few % of error in your biggest singular values and still get something useful. The fun part is that things as simple as matrix multiplication become an interesting and potentially hard problem.
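To make the "see each row once" idea concrete, here is a minimal sketch in plain Python. It is not one of the specific algorithms the comment has in mind; it assumes a simple Gaussian row sketch, where each incoming row is folded into a small k x d matrix and then discarded, and the largest singular value is estimated from that sketch by power iteration. All names (`sketch_rows`, `top_singular_value`, `demo`) and the synthetic data are hypothetical. For a matrix this narrow one could just accumulate the exact Gram matrix in one pass; the sketch is only meant to show the accuracy-for-memory trade.

```python
import math
import random


def sketch_rows(rows, k, seed=0):
    """Single pass over `rows`: accumulate B = S A / sqrt(k), with S a k x n Gaussian matrix.

    Each row of A is touched exactly once and never stored.
    """
    rng = random.Random(seed)
    sketch = None
    for row in rows:
        if sketch is None:
            sketch = [[0.0] * len(row) for _ in range(k)]
        for sketch_row in sketch:
            g = rng.gauss(0.0, 1.0)  # one fresh entry of S per (sketch row, data row)
            for c, value in enumerate(row):
                sketch_row[c] += g * value
    if sketch is None:
        return []
    scale = 1.0 / math.sqrt(k)
    return [[scale * x for x in r] for r in sketch]


def top_singular_value(matrix, iterations=50):
    """Largest singular value of `matrix` via power iteration on M^T M."""
    d = len(matrix[0])
    v = [1.0 / math.sqrt(d)] * d  # unit start vector (assumed not orthogonal to the top direction)
    sigma = 0.0
    for _ in range(iterations):
        mv = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        w = [sum(row[c] * y for row, y in zip(matrix, mv)) for c in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        sigma = math.sqrt(norm)  # ||M^T M v|| tends to sigma_1 squared
    return sigma


def demo(n=400, k=64, seed=1):
    """Compare the exact and sketched top singular value on synthetic, nearly rank-1 data."""
    rng = random.Random(seed)
    direction = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    rows = []
    for _ in range(n):
        weight = rng.gauss(0.0, 1.0)
        rows.append([weight * x + 0.1 * rng.gauss(0.0, 1.0) for x in direction])
    exact = top_singular_value(rows)
    approx = top_singular_value(sketch_rows(rows, k))
    return exact, approx


if __name__ == "__main__":
    exact, approx = demo()
    print("exact:", exact, "sketched:", approx)
```

Increasing `k` tightens the estimate at the cost of a larger sketch; the error here shrinks roughly like 1/sqrt(k), so this toy version is far less efficient than the published methods.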

pyronicide almost 14 years ago

Would anyone know if there's audio/video of these lectures? I keep seeing amazing classes like this and wishing that everyone could enjoy them instead of just the local students.

siculars almost 14 years ago

Slides, notes and papers from Sergei Vassilvitskii's class on a similar topic, COMS 6998-12: Dealing with Massive Data, <a href="http://www.cs.columbia.edu/~coms699812/" rel="nofollow">http://www.cs.columbia.edu/~coms699812/</a> .

helwr almost 14 years ago

Also see <a href="http://www.quora.com/Machine-Learning/What-are-some-introductory-resources-for-learning-about-large-scale-machine-learning" rel="nofollow">http://www.quora.com/Machine-Learning/What-are-some-introduc...</a>

chrisaycock almost 14 years ago

Also of potential interest is Stanford's "Workshop on Algorithms for Modern Massive Data Sets" (MMDS): <a href="http://www.stanford.edu/group/mmds/" rel="nofollow">http://www.stanford.edu/group/mmds/</a>

Algorithms for Massive Data Sets
