A Junior Data Scientist Bookshelf (including Free Versions and HN Discussions)

157 points by gghyslain, over 8 years ago

7 comments

gghyslain, over 8 years ago
Thanks everyone for the positive feedback. I have not yet had much time to write full reviews of all the books, but I'll work on it; so far this page is more of a personal "bookmark". But to reply to @Nekopa and @carlsednaoui, here is a short review of the first books.

I have taken a really pragmatic approach to reading them: focusing first only on the parts relevant to my projects.

# An Introduction to Statistical Learning (ISL) / The Elements of Statistical Learning (ESL)

I focused on chapters 8-9 of ISL, on tree-based methods and SVMs, two algorithms I used for my dissertation project. I found ISL to provide very clear explanations of the algorithms with just enough mathematical formalism.

I have a good math background, so ESL was interesting to go through. But I am more of a practical person, and I found ISL better suited to me when it came down to working on my project and supporting my choices.

# Python Machine Learning

Really great hands-on book! Sebastian Raschka does a good job of guiding you through all the steps of an ML project: data pre-processing, feature engineering, model selection... All the steps are defined and covered with practical examples.

I strongly recommend this book if you are just starting out with ML and feel "lost" about how to start your own project.

# Taming Text

I decided to use text data I had available for my dissertation project. However, halfway through the book I realized my dataset was too small to apply any of the techniques described there. I still like the practical approach, and in the end the book gave me a good idea of what can be done with text.

# Advanced Analytics with Spark

I picked up this book once I started working on putting my project into production; we use Apache Spark (Scala) at work.

It provided me with a good introduction to Spark, BUT it's based on the RDD API and, as stated on the Spark website: "As of Spark 2.0, the RDD-based APIs in the spark.mllib package have entered maintenance mode. The primary Machine Learning API for Spark is now the DataFrame-based API in the spark.ml package."

I'm now mostly relying on the Spark docs / API; I'm not aware of any up-to-date books yet :)
clumsysmurf, over 8 years ago
"R for Data Science" by Garrett Grolemund & Hadley Wickham was recently completed.

http://r4ds.had.co.nz/

The ebook is free online; you can buy it from Amazon & O'Reilly too.
nekopa, over 8 years ago
Nice list! I especially like that you added Resonate in there.

Could you add your personal reason for keeping these books on your shelf? That would make the page more interesting, and maybe help you out with your job search, as it will give an insight into your thought processes.
baldeagle, over 8 years ago
As a senior practitioner in the field, I feel a few years removed from my initial learning chunk. I really like this list as a throwback to see how I would have done it today.
bssrdf, over 8 years ago
Still debating whether I should start with An Introduction to Statistical Learning (ISL) or Bishop's Pattern Recognition and Machine Learning (PRML). I really don't like using R (always a Python person). Both have rave reviews on Amazon. Any thoughts?
fixxer, over 8 years ago
Solid start. I'd strongly suggest adding some Bayesian modeling books; start with Gelman.

If you look at the academic lineage of many of these authors, it will also help you understand how they get stuck into little biases.
carlsednaoui, over 8 years ago
This is awesome, thanks for sharing. Ditto what Nekopa said; I'm curious to hear why you like each of these resources.