Ask HN: What journals and blogs should I be reading to become a data scientist?

82 points by dewang, over 11 years ago

25 comments

davidw, over 11 years ago
Seen on Twitter today:

https://twitter.com/jeremyjarvis/status/428848527226437632

"A data scientist is a statistician who lives in San Francisco."

rch, over 11 years ago
You know, I absolutely see where the poster is coming from, and the suggestions look helpful so far, but the question might as well read: What journals and blogs should I be reading to become a cardiothoracic surgeon?

(though hopefully nobody bleeds out on a table when someone misconstrues statistical data)

We've lived through an amazing time where one could learn by doing, and talented people have been able to compete without the benefit of formal education (myself included), but in my opinion those days are numbered.

I've personally observed respected PhD statisticians stumble on the type of problems a data scientist is expected to address. The combination of complex software and often counterintuitive mathematics makes this an imposing field for all but perhaps the top one percent of practitioners. Most everybody else needs to really hit the books for a few years, in a formal setting.

With that pre-coffee rant out of the way, I'm looking forward to finding some new sources here myself. So, in that spirit, thanks for the question.

daniyaln, over 11 years ago
Subscribe to Data Science Weekly: http://www.datascienceweekly.org/

http://blog.zipfianacademy.com/
http://blog.cloudera.com/blog/category/data-science/
http://www.hilarymason.com/
http://mathbabe.org/
http://fivethirtyeight.blogs.nytimes.com/
http://blog.kaggle.com/
http://grepalex.com/
http://pulseblog.emc.com/category/big-data/
http://radar.oreilly.com/
http://flowingdata.com/
http://oreilly.com/data/newsletter.html
http://www.gapminder.org/
http://mlcomp.org/

joshvm, over 11 years ago
Don't bother with journals - in pretty much any subject - unless you have a degree and/or you understand what to look for, or are directed to notable articles in bibliographies or by peers. There is a lot of crap in all journals, and much of it is needlessly technical for practical purposes or too bleeding-edge to be useful yet.

I'm not trying to be snarky, but honestly, unless you know what you're looking for, it's a fool's game. Once you've got the feel for a subject, you tend to find several authors that crop up time and time again, or landmark papers that really shifted the field. But that takes a long time; it takes most PhD students a year to fully understand and simply collate the background of a topic they may think they know a lot about.

That and no one actually reads journals. You do a search on Web of Knowledge or ADS or arXiv or whatever your poison and you see what comes up. Point is, you need to know what you're looking for.

This is akin to saying that if you read Phys Rev enough, you'll become a physicist. Sure, sure, keep up with the trends, but big important results get press, which is enough to rely on to start off with.

To become a data scientist? Read the recommended textbooks and take a proper degree in statistics, computer or data science. Look at the courses on edX and Coursera for a starting point; they'll help you decide whether this is something you seriously want to pursue.

Even if this is just a hobby, e.g. you're a coder that wants to branch out, you should still take the time to invest in education properly. Data science, like statistics in general, is very easy to mess up. When people draw bad conclusions from data (and good data scientists can make up any conclusion from any data set), bad things inevitably happen. Entire threads of science have been destroyed because somewhere, someone messed up their stats and apparently important results are meaningless.

mswen, over 11 years ago
Becoming a data scientist isn't a matter of reading journals and blogs. You can get a sense of the field and what is required by reading those sites, but becoming a data scientist takes years of hard work.

You need to develop serious skills in at least 4 of the following disciplines:

Statistical analysis
RDBMS query development
NoSQL databases
Machine learning
Natural language processing
Web crawling and data harvesting techniques
Programming to access data APIs
Web development
Data visualization
Business systems that generate data, including CRM, ERP, and more
Geospatial data systems

Each of these areas would have its own set of resources, both formal and informal.

visakanv, over 11 years ago
http://www.informationisbeautiful.net/
http://blog.okcupid.com/
https://www.facebook.com/data/
http://www.pornhub.com/insights/

MrMan, over 11 years ago
Unless you are part of a vanishingly small group of autodidacts who can train themselves up to graduate school levels of expertise in multiple overlapping subjects - statistics, computer science (might be able to get away with just being an OK programmer), and the interdisciplinary combination of those called "machine learning" - you should disappear into a statistics degree program, and amend the traditional stats program deficiencies with the modern-day leavening agents that create "machine learning."

Downloading scikit-learn and R and such is not going to work. At that level you are only qualified to be bossed around by a real scientist or statistician. You are an "analyst".

nashequilibrium, over 11 years ago
Follow the link below; there are about 24 hours of lectures, including materials, code, etc. These lectures cover reading data, saving data, cleaning & reshaping, visualization, stats, 8 hours of machine learning in scikit-learn, version control & unit testing, and geospatial analyses. This is all in Python, using NumPy, SciPy, IPython, pandas and scikit-learn as the base tools. You will love the IPython notebook! https://conference.scipy.org/scipy2013/tutorials_schedule.php

jmount, over 11 years ago
Try our upcoming book: "Practical Data Science with R" http://www.manning.com/zumel/

allochthon, over 11 years ago
I don't have a PhD, and I'd love to be called a "scientist." But I think it's pretentious to use the label "data scientist" for anyone with solid stats experience and a gift for exploring data. To my mind, scientists have gone through formal training and earned a PhD, which, in a given context, may or may not be necessary for what these guys are doing.

roel_v, over 11 years ago
Not a journal or blog, but you should start reading the application guidelines for your local university's math, econometrics or similar degrees.

ScottWhigham, over 11 years ago
Have you joined/visited http://datatau.com? Fun HN-style community site.

justinkestelyn, over 11 years ago
Good list of initial resources:

http://www.cloudera.com/content/dev-center/en/home/developer-admin-resources/new-to-data-science.html

amerkhalid, over 11 years ago
You can also take MOOC courses, for example: https://www.coursera.org/specialization/jhudatascience/1?utm_medium=listingPage

chubot, over 11 years ago
I'd recommend Hadley Wickham's papers: http://vita.had.co.nz/

He is the prolific author of many R packages, which are more like little languages than libraries. His papers are both philosophical and practical, and informed by writing a huge amount of code.

The first one on that page is really good, and along with another paper of his got me thinking explicitly about organizing my data in R using the relational model (a thing people with computer science backgrounds will know well).

It made me realize that R is actually a better SQL. It's a language for tables, or an algebra of tables.

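To make the "algebra of tables" idea concrete, here is a minimal sketch in Python rather than R, and it is not taken from the papers mentioned above. It assumes a table is just a list of dicts with one observation per row; the helper names `select`, `project`, and `join` and the sample data are hypothetical, chosen only for illustration.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Table = List[Row]


def select(table: Table, predicate: Callable[[Row], bool]) -> Table:
    """Keep only the rows for which predicate(row) is true (like SQL WHERE)."""
    return [row for row in table if predicate(row)]


def project(table: Table, columns: List[str]) -> Table:
    """Keep only the named columns (like SQL SELECT col1, col2)."""
    return [{col: row[col] for col in columns} for row in table]


def join(left: Table, right: Table, key: str) -> Table:
    """Inner join on a shared key column (like SQL JOIN ... USING (key))."""
    index: Dict[Any, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical example data, one observation per row.
people = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace"},
]
scores = [
    {"id": 1, "score": 90},
    {"id": 2, "score": 70},
    {"id": 1, "score": 85},
]

# Every operation takes tables and returns a table, so they compose freely.
joined = join(people, scores, "id")
high = project(select(joined, lambda r: r["score"] >= 80), ["name", "score"])
print(high)
```

Because each helper returns a table, the calls nest in any order, which is the property the "algebra" framing points at.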
mindcrash, over 11 years ago
Grab this set: http://shop.oreilly.com/category/get/data-science-kit.do for Data Science, and maybe this set as well: http://shop.oreilly.com/category/get/machine-learning-kit.do if you're into Machine Learning.

Both from O'Reilly (with some Packt mixed in). Excellent content.

steamer25, over 11 years ago
This isn't a periodical (although you used to be able to view the top questions for the given week--if anyone knows how to get that out of StackExchange again, please let me know) but it is a good source of bite-sized info-trickle:

http://stats.stackexchange.com/questions?sort=votes

ih, over 11 years ago
Udacity has a data science track of courses (https://www.udacity.com/courses#!/Data%20Science) and the blog has recently had data science related posts (http://blog.udacity.com/).

ZygmuntZ, over 11 years ago
Try these:

http://fastml.com/links/

x-sam, over 11 years ago
I use a Twitter list to collect some cool data people; here are some: https://twitter.com/lc0d3r/data-nerds

phatak-dev, over 11 years ago
You can learn a lot about machine learning from this course: https://www.coursera.org/course/ml

dbecker, over 11 years ago
Not a journal or blog, but I highly recommend Andrew Ng's Machine Learning course on Coursera.

0800899g, over 11 years ago
What journals and blogs should I be reading to become a data scientist?

skadamat, over 11 years ago
datasciencemasters.org

and the HN for Data Sci - datatau.com

slashdotaccount, over 11 years ago
Dipshit Buzzwords Quarterly Data Mining, Machine Learning, Artificial Intelligence and other euphemisms for being pretentiously lazy Amazon Principal Engineer Tenets