TechEcho

The Open Source Data Science Masters

95 points by nns almost 9 years ago

5 comments

neilsharma almost 9 years ago
Good collection, but as someone who has been slowly learning data science over the past few years, I think it needs far fewer lectures and waaaay more projects.

The biggest difficulty I have with learning data science is not how the algorithm or tools work, but the problem setup. Where is the data? How do I clean it? What insights can I draw from this? Which algorithms to use? What can I do with the algorithm assuming it works?

Most MOOC projects decide all this for you by giving you a set of tasks to do in order and skeleton code to work off of. Your job is simply to implement a small part of whatever algorithm you learned that week and press run. This way lacks the creative development, exploration, trial and error, and critical thinking skills necessary when you go out in the real world.

Also, I think there should be more emphasis on publishing, even if your attempts are inaccurate. Push out a Jupyter notebook to GitHub of how you tested out a rudimentary Monte Carlo simulation on stock data. Or write a blog post with your attempt at determining how much Silicon Valley home prices will drop if 10K more family units magically existed in SF. Or try to code a random forest algorithm from scratch in a language of your choice. You don't have to be right, but publishing forces you to at least take a critical look at your work and think about the material deeply. MOOCs, at least from my experience, just encourage you to move on to the next topic the moment your code works, without diving too deeply.
randcraw almost 9 years ago
That's a nice overview of autodidact resources for DS.

But I suggest that you tweak the name a little, like "The Open DS Masters Program" or "Toward OSDS Mastery".

"OSDS Masters" sounds like a plural noun, like you're trying to say, "at this website you can find the great open source masters of data science" -- like Richard Stallman or the authors of Weka. It's a bit confusing.
jmde almost 9 years ago
This seems like a nice compilation for introductory material in one place.

I still can't get over the term "data science", though. Not only is it ridiculously meaningless - what sort of science doesn't involve data, and how often would data be useful to something that isn't scientific at some level - its meaninglessness derives from the hyped buzzword trendiness that drove its upswing.

I say this as someone whose expertise is really sitting at the nexus of what would be considered data science. I feel as if I have been doing what might be considered data science for a long time, before there was a label for it, but watching its ascendance in demand and popularity has been troubling. I should be happy, but I feel like it's being driven by fashion rather than fundamentals, which makes me worried about the trajectory going forward, and disturbed by some communities being thrown under the bus.
Notre1 almost 9 years ago
Clare Corthell, the creator of the Open Source Data Science Masters project, is interviewed in the 2016-07-30 episode of This Week in Machine Learning & AI (TWiML):

https://twimlai.com/twiml-talk-1-clare-corthell-open-source-data-science-masters-hybrid-ai-algorithmic-ethics/
Rogerh91 almost 9 years ago
I really like this collection of resources--it's perfect for people really trying to get into the basics of data science.