3 Machine Learning Books that Helped me Level Up as a Data Scientist
Luciano Strika. Originally published at datastuff.tech.
There is a Japanese word, tsundoku (積ん読), which means buying and keeping a growing collection of books, even though you don’t really read them all.
I think we Developers and Data Scientists are particularly prone to falling into this trap. Personally, I even hoard bookmarks: my phone’s Chrome browser has so many open tabs that the counter was replaced with a “:D” emoji.
In the zeal for reading and learning that most of us experience, we usually end up lost, not sure which book to pick up next. That’s why today I’ll give you a very short list: just 3 Machine Learning books, so that you won’t just bookmark it and forget about it.
Each of these books has helped me immensely in different stages of my career as a Data Scientist, particularly in my role as a Machine Learning Engineer.
Here come the books!
O’Reilly: Data Science from Scratch with Python
I’ll review this book first, since it’s the most introductory and broadest one on this list.
I have a very personal attachment to this book, since it’s the one that got me my job. That’s right! Before picking up this book, I knew next to nothing about Data Science, not even what Data Science was.
I did have a pretty strong Probability and Statistics background, and knew enough Python to get by. However, I was missing the practical side of it.
This book did many things for me. It:
- Showed me how to process data in Python efficiently and elegantly (following Python’s good practices).
- Taught me how to implement most of the simpler Machine Learning algorithms from scratch.
- Showed me what the day-to-day job of a Data Scientist may look like.
- Taught me how to communicate my results to others clearly.
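To give a flavour of what implementing an algorithm “from scratch” looks like, here is a minimal sketch of simple linear regression fitted by ordinary least squares, using only Python’s standard library. It is an illustration in the spirit of the book, not code taken from it; the names `least_squares_fit`, `predict` and `r_squared` are hypothetical choices for this example.

```python
from statistics import mean


def least_squares_fit(xs, ys):
    """Return (alpha, beta) minimising the squared error of y = alpha + beta * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # beta = cov(x, y) / var(x); the 1/(n - 1) factors cancel, so plain sums suffice.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    beta = sxy / sxx
    alpha = y_bar - beta * x_bar  # the fitted line passes through (x_bar, y_bar)
    return alpha, beta


def predict(alpha, beta, x):
    return alpha + beta * x


def r_squared(alpha, beta, xs, ys):
    """Fraction of the variance in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - predict(alpha, beta, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all ys are equal")
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [3, 5, 7, 9, 11]  # exactly y = 1 + 2x
    alpha, beta = least_squares_fit(xs, ys)
    print(alpha, beta, r_squared(alpha, beta, xs, ys))  # 1.0 2.0 1.0
```

Writing the covariance and variance sums by hand, instead of calling a library, is exactly the kind of exercise that makes the later “just import it” step feel less like magic.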
I wholeheartedly recommend it if you’re new to the Data Science community. It will give you a clear overview of most topics you’ll need in order to start being a productive Data Scientist.
It will also showcase Python’s most commonly used libraries and expose you to a lot of idiomatic code, which is always a plus.
Here’s a link to Data Science from Scratch on Amazon.
Springer: Introduction to Statistical Learning
This book is the most comprehensive Machine Learning book I’ve found so far. I learned a lot from it, from Unsupervised Learning algorithms like K-Means Clustering, to Supervised Learning ones like Boosted Trees.
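For readers who haven’t met K-Means yet, here is a minimal sketch of the standard (Lloyd’s) iteration in plain Python: assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until no assignment changes. It is an illustrative sketch, not code from the book (whose labs use R); starting from randomly chosen data points is just one common initialisation, and the function names are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def vector_mean(points):
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iters=100, seed=0):
    """Cluster `points` (tuples of numbers) into k groups with Lloyd's algorithm."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    assignments = None
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid for every point.
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = vector_mean(members)
    return centroids, assignments


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = k_means(pts, k=2)
    print(sorted(centroids), labels)
```

Because the result depends on the starting centroids, it’s common to run K-Means several times with different seeds and keep the clustering with the lowest within-cluster sum of squares.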
The first chapters may feel a bit too introductory if you’re already working in this field (at least that was my experience). However, they also sum up many things you may not have learned in such an organized way before.
The later chapters, however, are where I think this book really shines. Its explanations of random forests, boosted trees and support vector machines are spot on.
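To make the boosting idea a bit more concrete, here is a toy sketch of gradient boosting for regression with squared-error loss, using one-split trees (“stumps”) on a single numeric feature: each round fits a stump to the current residuals and adds a shrunken copy of it to the model. It is a simplified illustration under those assumptions, not the book’s algorithm verbatim and not how production libraries implement it; the function names and the `learning_rate` default are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (least squares).

    Returns (threshold, left_value, right_value): predict left_value when
    x <= threshold, otherwise right_value.
    """
    pairs = sorted(zip(xs, residuals))
    overall = sum(residuals) / len(residuals)
    best = (pairs[-1][0], overall, overall)  # fallback: no useful split exists
    best_sse = sum((r - overall) ** 2 for r in residuals)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a threshold cannot separate equal x values
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if sse < best_sse:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best, best_sse = (threshold, left_mean, right_mean), sse
    return best


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit base + learning_rate * (sum of stumps) by repeatedly fitting residuals."""
    base = sum(ys) / len(ys)  # the constant that minimises squared error
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For the loss (y - f)**2 / 2, the residuals y - f are the negative gradient.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)
```

The learning rate (shrinkage) trades more rounds for a smoother fit. Real implementations use deeper trees, many features and much smarter split searches, but the residual-fitting loop is the core idea.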
Here are some of the topics you can learn from Introduction to Statistical Learning:
- Regression and Supervised Learning Algorithms: from Linear Regression and SVMs to tree-based methods.
- Unsupervised Learning techniques: especially Clustering, including the K-Means algorithm.
- Sampling methods, and other general Machine Learning core concepts.
- The meaning, advantages and disadvantages of metrics such as accuracy, recall, precision, etc.
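Since accuracy, precision and recall come up constantly, here is a small sketch (an illustration, not code from the book) of how they are computed from the confusion matrix of a binary classifier. Undefined ratios are reported as 0.0, which is a common but not universal convention. The imbalanced example at the end shows accuracy’s main weakness: a model that never predicts the positive class can still score 95%.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("no samples given")
    # Precision: of the predicted positives, how many were right?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of the actual positives, how many did we find?
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # 5 positives out of 100, and a "model" that always predicts negative.
    y_true = [1] * 5 + [0] * 95
    y_pred = [0] * 100
    print(classification_metrics(y_true, y_pred))
    # {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```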
I think this book has been my best read so far this year, and it’s made me a more well-rounded Data Scientist. I recommend it if you already have a bit of experience but want to polish your edges. It is also a very good reference book to keep on your shelf.
It also shows how to implement everything in R, which I didn’t find particularly useful, but it didn’t hurt. You’ll probably import most of this code from scikit-learn anyway.
As before, here’s a link to Springer’s Introduction to Statistical Learning on Amazon.
Deep Learning by Goodfellow, Bengio and Courville
This book blows my mind every time I open it. I’ll be the first to admit I haven’t really read it from start to finish. Yet.
The only reason it’s the last one on the list is its very specific scope: Artificial Neural Networks, or Deep Learning.
However, its first chapters, with an overview of Deep Learning’s precursors and what makes it different, followed by an explanation of how Deep Learning works, are marvelous.
It even starts off by explaining everything you need to know before studying deep learning, with whole chapters dedicated to linear algebra, probability and information theory, and numerical computation methods.
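As a taste of the numerical computation material: the book uses the softmax function as an example of something that must be stabilised against overflow and underflow. The standard fix is to subtract the largest input before exponentiating, which leaves the result mathematically unchanged. Below is a minimal plain-Python sketch of that trick; the function names are illustrative.

```python
import math


def softmax_naive(xs):
    # math.exp raises OverflowError for large x, and very negative x
    # underflows to 0.0, which can make the denominator zero.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax(xs):
    """Numerically stable softmax, using softmax(x) == softmax(x - max(x))."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]  # every exponent is <= 0, so no overflow
    total = sum(exps)  # at least 1.0, since the max element contributes exp(0)
    return [e / total for e in exps]


def log_sum_exp(xs):
    """Stable log(sum(exp(x))), the normaliser behind log-softmax."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


if __name__ == "__main__":
    print(softmax([1000.0, 1000.0]))  # [0.5, 0.5]
    try:
        softmax_naive([1000.0, 1000.0])
    except OverflowError:
        print("naive softmax overflowed")
```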
The next chapters, which I’ve only partially read, serve as an awesome reference whenever you need to dive deeper into a particular Neural Network architecture.
They include in-depth explanations of Convolutional Neural Networks and Recurrent Neural Networks, along with many regularization and optimization methods.
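For a concrete picture of what a convolutional layer computes, here is a minimal sketch of a “valid” 2D cross-correlation in plain Python, assuming a single channel, stride 1 and no padding. As the book points out, many machine learning libraries actually implement cross-correlation (convolution without flipping the kernel) and still call it convolution, so both are shown. The function names and the edge-detecting kernel are illustrative choices.

```python
def cross_correlate2d(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over the image without flipping it.

    Output size is (H - kh + 1) x (W - kw + 1); single channel, stride 1, no padding.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    if kh > ih or kw > iw:
        raise ValueError("kernel must not be larger than the image")
    output = []
    for i in range(ih - kh + 1):
        row = []
        for j in range(iw - kw + 1):
            # Dot product between the kernel and the image patch under it.
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        output.append(row)
    return output


def convolve2d(image, kernel):
    """Mathematical convolution: cross-correlation with the kernel flipped both ways."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return cross_correlate2d(image, flipped)


if __name__ == "__main__":
    image = [[0, 0, 1, 1] for _ in range(4)]  # dark left half, bright right half
    edge_kernel = [[-1, 1], [-1, 1]]          # responds to left-to-right increases
    print(cross_correlate2d(image, edge_kernel))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

In a real CNN layer the kernel values are learned rather than hand-picked, and there are many input and output channels, but each output value is still this kind of sliding dot product.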
The third and last section, which revolves around cutting-edge technology, explains generative models, autoencoders and many other interesting algorithms. Adding them to your own toolkit will probably give you a great boost!
The authors of this book are the rock stars of Machine Learning right now. One of them, Yoshua Bengio, even won a Turing Award recently, so I can’t think of better people to teach this subject.
Here’s an Amazon link if you’re interested in the Deep Learning book.
Conclusion
I went from a broad, introductory book to an advanced, specific one.
Each of these Machine Learning books has had a profound impact on my career and, to some degree, on the way I see the world.
I really hope at least some of them will have the same positive impact on your life!
And if you’ve already read, or are reading, any of them, tell me what you think of them in the comments! I’d love to discuss any of them further, especially the Deep Learning book.
We can also discuss them on Twitter, Medium or dev.to if you’re interested.
I want to hear your opinions!
(small disclaimer: all of these links are Amazon affiliate links, which means I get a small commission if you buy the books. However, I’ll only review books I’ve actually read, and have genuinely recommended to people in real life)
The post 3 Machine Learning Books that Helped me Level Up as a Data Scientist appeared first on Data Stuff.
Great choices and these are in my virtual bookshelf.
O’Reilly: Data Science from Scratch with Python (I used it for my UCLA Extension course, Intro to Data Science). Note that the book's GitHub repository has Python 3 code too. Enjoy re-reading it.
Springer: Introduction to Statistical Learning (I used it for my UCLA Extension course, ML in R).
Great datasets, and well-written statistics material that served me well as a refresher.
Deep Learning by Goodfellow, Bengio et al. (Recommended as an optional text for my current UCSD Extension course, Deep Learning With TensorFlow & Keras).
Thanks for posting. :-)
I have read O’Reilly: Data Science from Scratch with Python, but it didn't satisfy my need to understand what regression, softmax, dense layers, one-hot encoding and similar techniques are. Do you have any book recommendations for that?
Thanks for asking!
It's true, Data Science from Scratch is a lot broader, and doesn't focus only on Machine Learning concepts like the ones you bring up.
Most of the things you mentioned are usually associated with Deep Learning, and Goodfellow's Deep Learning book gives a very in-depth explanation of all of them, plus an intuition for when to use them.
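To make two of those terms a bit more concrete: one-hot encoding turns a categorical label into a vector with a single 1, and a dense (fully connected) layer computes, for each output, a weighted sum of all its inputs plus a bias, optionally passed through an activation function. Here's a minimal plain-Python sketch; the weight layout (one row per input, one column per output), the ReLU activation and the function names are assumptions made for the example.

```python
def one_hot(label, classes):
    """Vector with 1.0 at the label's position in `classes` and 0.0 elsewhere."""
    vector = [0.0] * len(classes)
    vector[classes.index(label)] = 1.0  # raises ValueError for unknown labels
    return vector


def relu(z):
    return max(0.0, z)


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: out[j] = activation(sum_i inputs[i] * weights[i][j] + biases[j]).

    `weights` has len(inputs) rows, each with len(biases) columns.
    """
    if len(weights) != len(inputs) or any(len(row) != len(biases) for row in weights):
        raise ValueError("weights must have shape (len(inputs), len(biases))")
    outputs = []
    for j, bias in enumerate(biases):
        z = sum(x * weights[i][j] for i, x in enumerate(inputs)) + bias
        outputs.append(activation(z) if activation is not None else z)
    return outputs


if __name__ == "__main__":
    print(one_hot("cat", ["bird", "cat", "dog"]))  # [0.0, 1.0, 0.0]
    layer_weights = [[1.0, -1.0], [0.5, 0.5]]      # 2 inputs -> 2 outputs
    print(dense([1.0, 2.0], layer_weights, [0.0, 1.0]))  # [2.0, 1.0]
```

A neural network classifier is essentially a stack of such layers, trained so that its final outputs match the one-hot encoded labels as closely as possible.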
Thanks for the recommendation! I have been waiting for a copy of Data Science from Scratch with Python from Amazon for a while now. Thank you. Any suggestions for a mediocre math person with hands-on Ruby knowledge?
Well, my first suggestion would be don't call yourself mediocre!
I'm sure you can learn any of these things if you set your mind to it and work on it for enough time. It may take longer, or shorter, but you'll eventually get there.
On the practical side, I'd say read these books, but also code a lot, and keep everything on GitHub. That's your portfolio.
If you keep grinding, you'll realize one day that you've levelled up!
Thank You again!
I've wanted to read The Hundred-Page Machine Learning Book by Andriy Burkov. I haven't had a chance, but I have only heard good things about it.
I'd never heard of this book before, I'll look into it later!
Do you have any recommendations for books on revising math/stats as a prerequisite to diving into these?
Well, the Deep Learning book by Goodfellow, Bengio et al. actually has a very big chapter on probability and information theory which covers everything you should need, and Data Science from Scratch dedicates a big part of its first chapters to introducing you to Statistics, assuming basically no prior knowledge.
So either of those, depending on how confident you feel, should be good to start!
You have good taste. Thanks for sharing.