I like the curation of free educational content in a specific area because it eliminates the guesswork and duplicated effort of filtering for high-quality resources. Thanks to Claudia Gold for the amazing amount of work she put into this. My main gripe is with the majority of these data science courses/tracks.<p>It appears that no comprehensive treatment of applied data science exists. I've been searching high and low for the past few months. I understand collaborative filtering; I've heard about the Netflix recommendation challenge ad nauseam; I grasp machine learning and Bayesian statistics (prior, posterior, conjugate prior distributions, etc.) on a superficial level. Conversationally, I can hold my own with practitioners, albeit at a beginner level.<p>But what I, and others, want to learn is how to apply these techniques in a scalable way on a real production system. Right now, it's easy to conjecture about what could/should be done, but it's hard to be confident about how to actually achieve those goals. I'm experimenting with a collaborative filtering problem using Cassandra as the data store for thumbs up/down ratings on products and Hadoop for the MapReduce pipeline; it'd be great to have more visible examples available. Is there any place I could find detailed information on real, online machine learning/statistical inference systems?
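<p>For concreteness, the thumbs up/down collaborative filtering experiment described above could look roughly like the toy sketch below. It is only an illustration under stated assumptions: item-item cosine similarity is one common choice (no specific algorithm is named above), the sample ratings and all function names are hypothetical, each user is assumed to rate a product at most once, and plain Python dictionaries stand in for the Cassandra table and the Hadoop map/reduce stages.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical sample data: (user, product, vote), +1 = thumbs up, -1 = thumbs down.
# In the setup described above, these rows would come from the Cassandra store.
RATINGS = [
    ("u1", "A", 1), ("u1", "B", 1), ("u1", "C", -1),
    ("u2", "A", 1), ("u2", "B", 1),
    ("u3", "A", -1), ("u3", "C", 1),
]

def group_by_user(ratings):
    """First stage: group votes by user, as a shuffle keyed on user id would."""
    by_user = defaultdict(list)
    for user, item, vote in ratings:
        by_user[user].append((item, vote))
    return by_user

def emit_pairs(by_user):
    """Second stage (map): emit ((item_a, item_b), vote_a * vote_b) per co-rated pair."""
    for votes in by_user.values():
        for (a, va), (b, vb) in combinations(sorted(votes), 2):
            yield (a, b), va * vb

def item_similarities(ratings):
    """Second stage (reduce): cosine similarity between item vote vectors."""
    dot = defaultdict(int)
    for pair, product in emit_pairs(group_by_user(ratings)):
        dot[pair] += product
    # With +1/-1 votes, an item's squared norm equals its number of ratings.
    count = defaultdict(int)
    for _, item, _ in ratings:
        count[item] += 1
    return {(a, b): d / sqrt(count[a] * count[b]) for (a, b), d in dot.items()}

similarities = item_similarities(RATINGS)
```

<p>Items whose votes tend to agree across users get a similarity near 1, and items whose votes tend to disagree get a value near -1. The parts that make a production system hard (incremental updates, skewed keys, serving latency) are exactly what this sketch leaves out.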