The book mentions a number of distributed file systems but omits one that I think deserves mention: <a href="http://www.gluster.org/" rel="nofollow">http://www.gluster.org/</a><p>GlusterFS is an interesting take on the DFS concept, and it is open source.
Related courses: <a href="http://www.quora.com/What-are-some-courses-on-large-scale-learning" rel="nofollow">http://www.quora.com/What-are-some-courses-on-large-scale-le...</a><p>Workshops: <a href="http://www.quora.com/What-are-some-workshops-on-large-scale-learning" rel="nofollow">http://www.quora.com/What-are-some-workshops-on-large-scale-...</a><p>Also see the tutorial "Scaling Up Machine Learning" at KDD2011: <a href="http://hunch.net/~large_scale_survey/" rel="nofollow">http://hunch.net/~large_scale_survey/</a>
Thanks for the link; it's highly useful and very readable.<p>Loved this from the webpage intro:
"We are sorry to have to mention this point, but we have evidence that other items we have published on the Web have been appropriated and republished under other names. It is easy to detect such misuse, by the way, as you will learn in Chapter 3."
This is the course that accompanied the book: <a href="http://www.stanford.edu/class/cs246/cs246-11-mmds/" rel="nofollow">http://www.stanford.edu/class/cs246/cs246-11-mmds/</a><p>There is some interesting material in the presentations and the homework assignments as well, although the bulk of the content is definitely in the textbook.
This is nice. Nothing in it is really new, though; flipping through it, I kept thinking about what I was doing five years ago. On the other hand, past is prelude.