How to Teach Yourself Machine Learning in 5 Steps


Machine learning has gained a lot of popularity in recent years. ML engineer is one of the hottest career choices, so it’s no wonder more people want to get into the field. However, if you’re going to teach yourself ML from scratch, it’s essential to have a clear vision of your learning path, knowing what to learn and what to skip.

To teach yourself machine learning, learn Python programming first. After that, follow an introductory ML course, dig deeper into Python ML libraries, and do targeted practice to solidify your learnings. Learning some mathematics and statistics along the way is also beneficial.

In this article, you’ll learn everything you need to know about teaching yourself machine learning from the ground up. We’ll discuss the tools to acquire, courses to enroll in, books to read, and what to avoid.

Important Sidenote: We interviewed numerous data science professionals (data scientists, hiring managers, recruiters, and others) and identified 6 proven steps to follow for becoming a data scientist. Read my article, ‘6 Proven Steps To Becoming a Data Scientist [Complete Guide]’, for in-depth findings and recommendations.

Why Should You Learn Machine Learning?

You need to have the right mindset before beginning your machine learning journey. It is essential to understand why ML is one of the hottest topics in the tech industry and why you should consider becoming an ML engineer.

From Facebook to Walmart, large companies have been using machine learning for years now. With the surge in data generated every day, there is more material than ever to train ML models on. Computation has also become cheaper, putting extremely powerful hardware at our fingertips. These are some of the reasons ML has boomed in recent years.

Netflix paid $1 million to a research team that improved the accuracy of its movie recommendation algorithm by 10 percent. This shows how much customer-centric organizations are willing to invest in ML technology. More industries are likely to find innovative ways to use ML models to enhance their services.

Machine learning is here to stay. Technology will keep improving, and ML will become an integral part of it, transforming many other industries. The U.S. Bureau of Labor Statistics does not track machine learning engineers separately, but for the closely related occupation of computer and information research scientists it reports a median salary of about $122,000 in 2019, with the top 10 percent earning over $189,000. It also projects that employment in this occupation will grow by 15 percent from 2019 to 2029.

Becoming a machine learning engineer is an excellent career choice, as it is future-proof, in demand, and high-paying. With the amount of information available on the internet, there’s no reason for you not to dive into ML if it interests you.

Can Machine Learning Be Self-Taught?

Yes, but be prepared to work hard. There are many tools and technologies you need to know as a machine learning engineer. It’ll be easier if you have a background in computer science, mathematics, or statistics, but if you’re entirely new to the field, it’ll take some time for you to learn and adjust.

Don’t get me wrong; it is very much doable. You will just need to be patient and learn at your own pace. It can be daunting to see all the stuff that’s there to know. Spend as much time as you need to on every step detailed in this post.

Also, before getting into machine learning, make sure that this is what you want to do. There is some mathematics and statistics involved (not a lot, but still, you should know what you’re getting into). I recommend watching some videos or reading a few articles on ML so that you can get an idea of what lies ahead.

I recommend skimming through the article on the difference between machine learning and normal programming. It gives a broad overview of machine learning, its related fields, and career options. Have a read and come back to this article, if ML still interests you, to get started on the right foot.

Lastly, motivating and disciplining yourself to study is the hardest thing about self-teaching machine learning or anything else for that matter. I highly recommend finding some ML enthusiasts like yourself so that you can encourage and help each other on your journeys.

Getting Started

It is time to get started, but before we do, there are a few little things we need to discuss. First of all, you should learn how to use Jupyter Notebook. It’s an open-source, browser-based notebook that allows you to create and share documents containing live code. Jupyter is used extensively in data science, and online courses and books share notes through it all the time, so it makes sense to know your way around it.

The following video walks you through the setup and functions of Jupyter:

Secondly, if you already know Python, you can skip the first step and get started with machine learning directly. Having a background in computer science will be extremely helpful since ML is mostly computer science anyway. If you’re coming from a non-CS background, though, you’ll find it hard to get straight into ML. But don’t worry; I’ll be linking to a few resources for complete beginners as well.

Next is installing Python. There are many ways to do it, but I recommend the Anaconda distribution. It comes with Jupyter Notebook and several helpful libraries for data science. Here’s a video to help you set up Anaconda:

Books are an excellent source of learning, as well. Several experts have compiled their valuable insights into texts, and many of these guides are also (legally) available for free. If you’re a book-learner, I highly recommend reading the article discussing 15 books for machine learning in Python. You’ll find the best books for beginner, intermediate, and advanced students in that listicle.

Lastly, Brain.fm is another excellent tool. Although it’s not directly related to machine learning, I find it a fantastic app to increase your focus on studying and working. So, do check it out.

Learn Programming

You must have a decent knowledge of programming before touching machine learning because you won’t get anywhere without it. Thankfully, you don’t have to be a programming ninja to implement ML algorithms; basic and intermediate programming knowledge will do.

Python, R, Java, Julia, and Lisp are some of the programming languages used for machine learning. Python is the most popular one, and it’s also easier to understand than most other languages. Since it’s the most common ML language, lots of tools are available to help you learn, so the first step is to learn Python.

The goal here is not to become a master of Python; it is to learn enough to develop an intuition. By the end of this step, you should know all about integers, floats, loops, conditional statements, functions, etc.
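
To make that checklist concrete, here is a minimal sketch of those building blocks in one short script. The variable and function names (study_days, describe_pace, running_totals) are made up for illustration. If you can read and modify a script like this comfortably, you probably know enough Python to move on.

```python
# Integers and floats are the two basic numeric types.
study_days = 30        # int
hours_per_day = 1.5    # float
total_hours = study_days * hours_per_day


def describe_pace(hours):
    """Return a label for a total number of study hours (conditional statement)."""
    if hours < 20:
        return "light"
    elif hours < 60:
        return "steady"
    return "intensive"


def running_totals(values):
    """Return the cumulative sums of a list of numbers (loop)."""
    totals = []
    current = 0
    for value in values:
        current += value
        totals.append(current)
    return totals


print(total_hours)                   # 45.0
print(describe_pace(total_hours))    # steady
print(running_totals([1, 2, 3, 4]))  # [1, 3, 6, 10]
```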

Resources

If you’re a complete beginner, I recommend the Python for Everybody series by the University of Michigan on Coursera. It’s one of the most popular “no prerequisite” courses on Python. In this series, you’ll learn the basics of Python, Python data structures, accessing web data, using databases, and processing data with Python.

You can also follow the Python Crash Course by freeCodeCamp here:

It teaches all the central concepts of Python in one video. Thanks to the new “chapters” feature of YouTube, you’ll get a nice outline of all the concepts you’ll learn with timestamps in the description box.

On the other hand, if you’re already familiar with another language or just want to brush up on Python syntax, I recommend watching this 43-minute Python guide on YouTube:

It’s a great memory refresher, meant only for people with prior programming experience.

Practicing

Once you’re done learning Python, it is time to spend a few days or weeks practicing it. Get started with The Python Challenge and try to complete all 33 levels using Python scripts. CodeSignal (formerly CodeFights) is another fun platform for practicing core Python concepts. It consists of various short coding challenges that you can complete in 5 minutes.

Another resource for complete beginners is How to Think Like a Computer Scientist. It is an interactive book that can be studied alongside the above courses. You learn fundamental programming concepts using Python, and it feels like a small “CS 101” course. Since it’s an interactive book, you get plenty of opportunities to practice your learnings.

Absorb the Core Concepts of ML

In this step, you’ll learn the theory of machine learning through an online course. There are many existing ML libraries and packages available, but understanding the fundamentals is essential for anyone wanting to implement ML algorithms. Knowing the basics will help you with planning, data collection, preprocessing, assumptions, interpreting model results, improving your models, and other aspects of ML workflow.

Now, you don’t need to go crazy and have the answer to every question. The goal here is to learn just enough theory to help you get started and stay on track. You won’t remember everything you learn, and you shouldn’t dwell on any topic for too long. Consistent practice clears up many concepts that cannot be easily explained, even by experts.

There are several courses available on the internet, and many of them are good, so it can be hard to choose which ones to spend your time on and which ones to skip. If you’re not careful, you can end up wasting a lot of time completing course after course just to collect certificates.

Resources

To get started with machine learning, you can enroll in the popular Machine Learning course by one of the best in the industry, Andrew Ng. It’s a standard course when it comes to learning ML theory. Take your time as it offers a lot of new stuff and clears up the core concepts of ML.

The course teaches you the basics of linear algebra and explains ML algorithms in a way that you don’t need to know calculus to understand how they work. Of course, that doesn’t mean you don’t have to learn calculus down the road; it just means you’ll get through the entire course without any problems.

One thing to note about this course is that it uses MATLAB (or the free, compatible GNU Octave) throughout the coursework. If you’re having trouble with it and want to use Python instead, you can find the examples translated online.

An up-to-date alternative to Andrew Ng’s course is the fast.ai machine learning course. Its approach is “code first” rather than “math first,” teaching you only what you really need to know. It uses tools and libraries like Python, pandas, Scikit-learn, and PyTorch. When choosing between the Coursera course and the fast.ai course, I recommend you opt for the latter as it uses more modern technologies.

You can complement these courses with the book An Introduction to Statistical Learning. Although its practical examples use R rather than Python, it is still an excellent resource if you want to learn statistics for data science. It’s a great idea to work through books such as this alongside video courses, as doing so helps you build a solid foundation in machine learning and its underlying concepts in a short period.

Start Practicing Using Libraries

Once you’re done with the basic concepts, it is time to practice machine learning using Python libraries. You’ve probably already gotten a healthy dose of exercise following the second step, but it’s time to do more.

In this step, you’ll use specific exercises to hone your ML skills. You’ll familiarize yourself with the entire ML workflow, from data collection to evaluation. This is also the phase where you’ll practice what you’ve learned on real datasets.

When it comes to machine learning frameworks, the big question is: Which library should you learn first as a beginner?

There are many ML frameworks you can use to build and train ML models. However, four of them are the most common in the Python ecosystem: Scikit-learn, PyTorch, TensorFlow, and Keras. Although this post is a compilation of resources, let’s briefly discuss what these frameworks are.

  • Scikit-learn is a user-friendly machine learning library that provides algorithms for tasks like regression, classification, and clustering.
  • TensorFlow is an end-to-end machine learning framework from Google that’s built for numerical computation and large-scale ML.
  • PyTorch is TensorFlow’s direct competitor from Facebook (now Meta). It is immensely popular in research labs and has historically been less common on production servers.
  • Keras is a high-level deep learning API that runs on top of TensorFlow and is very user-friendly and easy to learn.

All four frameworks are open source, and I recommend you begin learning with Scikit-learn. Although other frameworks are also not too difficult, Scikit-learn is the easiest to get started with. Starting simple will allow you to build on your basic knowledge and move on to more complex things later.

Also, focus on learning one library at a time. Don’t flood your brain with too much information; it’ll only hinder your problem-solving capacity.
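
As a rough illustration of how little code a first Scikit-learn experiment needs, here is a minimal sketch of the workflow from loading data to evaluation. It assumes Scikit-learn is installed (it ships with Anaconda) and uses the library’s built-in Iris dataset and a logistic regression classifier purely as examples; neither choice comes from the courses or books mentioned in this article.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# 1. Load a small built-in dataset (150 flowers, 4 measurements each).
X, y = load_iris(return_X_y=True)

# 2. Hold out a quarter of the data so the model is evaluated on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# 3. Fit a simple classifier; max_iter is raised so the solver has room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 4. Evaluate on the held-out set.
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Test accuracy: {accuracy:.2f}")
```

Most Scikit-learn estimators follow the same fit and predict pattern, so swapping in a different model usually means changing only step 3.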

Resources

Hands-On Machine Learning is one of the best books on this subject. It goes deep into Scikit-learn and TensorFlow, discussing ideas from the very beginning and taking you far into advanced concepts. You’ll find lots of ML challenges throughout the book to apply what you’ve learned. The first half of the text discusses machine learning using Scikit-learn, while the second half looks at deep learning with TensorFlow.

If you followed Andrew Ng’s course in the second step, now is the time to pick up the fast.ai course. As we’ve said, it uses modern Python libraries that’ll help you strengthen your grasp of machine learning using Python.

This Scikit-learn course by freeCodeCamp is also an excellent resource for learning about the library in video format:

You’ll also require datasets for practicing. The UCI Machine Learning Repository is a wonderful collection of hundreds of datasets curated for practicing ML. You can filter the available datasets by tasks (classification, regression, clustering, etc.), data type, industry, size, format, and more.

Kaggle is another incredible source for getting community datasets. You can find datasets on various fun topics ranging from Pokémon to wine quality. Lastly, if you’re looking for government-related data, data.gov houses over 190,000 datasets from the U.S. government.

Study Essential Mathematics

At this point, you should be able to work with libraries and apply ML techniques to solve various problems. Keep going! You’re almost there!

This step isn’t exactly number four; it’s more of a prerequisite. However, the code-first approach we’ve been taking has allowed us to keep learning with minimal mathematics. That doesn’t mean it’s not useful to understand the mathematics and statistics behind ML techniques. Here, as elsewhere, you don’t need to get a Ph.D.; learning a couple of ML-relevant topics is more than enough.

First of all, you have to know what you have to know. Here are the concepts you need for understanding machine learning deeply:

  • Linear algebra: This comes up everywhere in machine learning. Topics like Principal Component Analysis (PCA), Matrix Operations, Singular Value Decomposition (SVD), and others are required for understanding the optimization methods used for ML.
  • Probability and Statistics: Machine learning and statistics are not very different, so it’s essential to know basic probability and statistics. You’ll need to study concepts like Bayes’ theorem, Standard Distributions, Random Variables, and others to understand the essence of machine learning.
  • Multivariate Calculus: It is necessary for building a lot of common ML models. It is essential to know topics like Differentiation, Integration, Plotting of functions, Minimum and Maximum values of a function, and more.
  • Optimization Methods: This is crucial for understanding the scalability and efficiency of ML algorithms. The cost function, Gradient Descent Algorithm, Graphs, Likelihood function, and Data Structures are some of the topics that fall into this category.

This is the essential mathematics you need to know to take your machine learning skills to the next level. Seeing so many concepts can be daunting, but the good news is that you don’t need all of this to get started with machine learning. The idea is to learn math as you master new algorithms and techniques.
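
To show how these topics meet in practice, here is a small sketch of gradient descent minimizing a cost function, written in plain Python. It fits a straight line to a handful of points by repeatedly stepping against the derivative of the mean squared error. The toy data, learning rate, and step count are assumptions chosen for illustration, and the function names are my own.

```python
def mean_squared_error(w, b, xs, ys):
    """Cost function: average squared gap between predictions and targets."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n


def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by repeatedly stepping against the gradient of the cost."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so the fit should approach w = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))          # close to 2.0 and 1.0
print(mean_squared_error(w, b, xs, ys))  # close to 0
```

The two grad lines are where the calculus shows up: they are the derivatives of the cost function, and the learning rate controls how large a step each update takes.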

Resources

As we’ve said, you can work through the following material alongside the video courses or books mentioned in the previous steps.

A book worth reading on this subject is Machine Learning: An Algorithmic Perspective. It focuses on the mathematics behind machine learning with Python. Although it is mathematically oriented, it also includes lots of Python coding and practical exercises at the end of each chapter. It’s not a book to learn the fundamentals of mathematics; instead, it teaches you how those concepts apply to machine learning.

Another incredible book is Mathematics for Machine Learning, which is available as a free PDF. It goes through all the topics we’ve talked about, and it is an excellent reference book when you want to know the math behind something you’re working on.

Build Fun Projects

In this phase, you’ll build as many quality projects as you can. This step is also not strictly the fifth one. You may start participating in Kaggle competitions or building popular beginner ML projects once you’re comfortable with ML frameworks.

No matter how many courses you watch, no matter how many books you read, nothing can replace hands-on practice. Implementing machine learning algorithms is more challenging than just reading about them or watching someone else do it.

Another benefit of building machine learning projects is that you can add them to your portfolio. A strong portfolio will help you find better career opportunities and land a job more easily, so I highly recommend you get started building cool machine learning projects. It’ll boost your confidence and help you master what you’ve learned.

Resources

I mentioned Kaggle competitions earlier. Kaggle is an incredible learning platform for beginners. The problems there may be difficult, but that’s the point: they get you thinking, researching, customizing, and so on. Don’t worry about winning every competition; just solve as many problems as you can. You’ll find several competitions aimed at beginners as well.

One of the most popular challenges on Kaggle is the Titanic prediction challenge. This is an excellent project to get your hands dirty because there are lots of tutorials available online. You get to learn how experts approach the same problem.

You can also find tons of project ideas on the internet. All you have to do is Google “machine learning projects for beginners,” and you’ll get unique ideas with detailed solutions.

Writing algorithms from scratch is another excellent way to develop your understanding. It helps you build a real sense of an algorithm’s mechanics, since you have to think about every step while writing it. You’ll also learn to translate mathematical instructions into code. This skill is especially useful when you’re adapting algorithms from research papers.

Start with simpler algorithms. Once you start writing them, you’ll realize how even the simple ones aren’t straightforward. You have to make many small decisions to build even a simple algorithm from scratch.
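
As one possible starting point, here is a sketch of k-nearest neighbors classification written from scratch with only the standard library. The choice of algorithm, the toy data, and the function names are mine, not something prescribed by the resources above. The comments mark a few of the small decisions you have to make even for an algorithm this simple.

```python
import math
from collections import Counter


def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbors."""
    # Decision 1: how to measure closeness (Euclidean distance here).
    distances = [
        (euclidean_distance(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Decision 2: how many neighbors get a vote.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Decision 3: how to break ties. Counter orders equal counts by first
    # appearance, so the label of the closer neighbor wins a tie.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data: two well-separated clusters of 2D points.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(train_points, train_labels, (2.0, 2.0)))  # small
print(knn_predict(train_points, train_labels, (6.0, 7.0)))  # large
```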

Things You Shouldn’t Do

We’ve talked about many things you need to do, tasks you need to complete, and books you need to read. Let’s now discuss some things that you shouldn’t do. These are the pitfalls newcomers often fall into, wasting their precious time. Here’s what to avoid on your machine learning journey:

  • Don’t go for certificates. Skills are more important than certifications. Nobody will ask you whether you have a Deep Learning Nanodegree from Udacity; employers want to know if you can build and train machine learning models effectively.
  • Ignore the latest articles and papers. It’s hard to keep up with the latest developments in the industry, and as a beginner, you shouldn’t bother trying. It’ll only stop you from getting a solid foundation as soon as possible. Don’t get distracted; stick to the path.
  • Don’t worry about not knowing something. You won’t become an expert by following the steps in this article. There will always be things you don’t know. As a saying often attributed to Aristotle goes, “The more you know, the more you know you don’t know.” Embrace the fact that you’ll never know 100%, and treat everything you learn as a milestone that helps you progress.
  • Don’t be impatient. You’re learning all by yourself, so it’s going to take time. You’ll have to be patient and stick to the path you’ve chosen. There will be times when you’ll feel like you don’t quite know what you’re doing, especially if you’re from a non-CS background, but consistent practice will clear things up gradually.

Author’s Recommendations: Top Data Science Resources To Consider

Before concluding this article, I wanted to share a few top data science resources that I have personally vetted for you. I am confident that you can benefit greatly in your data science journey by considering one or more of these resources.

  • DataCamp: If you are a beginner focused on building foundational skills in data science, DataCamp is a strong platform to consider. Under one membership, DataCamp gives you access to 335+ data science courses. If building foundational data science skills is your goal, consider signing up for DataCamp.
  • MITx MicroMasters Program in Data Science: If you are at a more advanced stage in your data science journey and looking to take your skills to the next level, the MIT MicroMasters is one of the best non-degree programs available. Consider enrolling in the MIT MicroMasters program. (To learn more, check out my full review of the MIT MicroMasters program.)
  • Roadmap To Becoming a Data Scientist: If you have decided to become a data science professional but are not fully sure how to get started, read my article, 6 Proven Steps To Becoming a Data Scientist. In this article, I share my findings from interviewing 100+ data science professionals at top companies (including Google, Meta, and Amazon) and give you a full roadmap to becoming a data scientist.

Final Thoughts

Machine learning is a huge industry, and there’s certainly a lot to learn. It can be daunting for beginners to see all the topics, especially when they’ve decided to study on their own. That’s why I’ve put together this blueprint that anyone can follow to teach themselves machine learning.

All the resources provided in this article are free, except for books, so the only things you really need to become a machine learning engineer are time and dedication. Follow the steps mentioned here and stick to the path. It can take you anywhere from nine months to two years to complete everything.

Good luck!


  1. Machine Learning Repository. (n.d.). UC Irvine. https://archive.ics.uci.edu/ml/
  2. Akinfaderin, W. (2017, March 25). The mathematics of machine learning. Medium. https://towardsdatascience.com/the-mathematics-of-machine-learning-894f046c568
  3. The best resources I used to teach myself machine learning. (2020, January 21). freeCodeCamp.org. https://www.freecodecamp.org/news/the-best-resources-i-used-to-teach-myself-machine-learning-part-1-292232d167/
  4. (n.d.). Brain.fm: Music to improve focus, meditation & sleep. https://www.brain.fm/
  5. (2020, November 6). CodeSignal. https://codefights.com/
  6. Computer and information research scientists: Occupational outlook handbook:: U.S. Bureau of Labor Statistics. (2020, September 1). U.S. Bureau of Labor Statistics. https://www.bls.gov/ooh/computer-and-information-technology/computer-and-information-research-scientists.htm#tab-5
  7. (2020, April 3). Data.gov. https://www.data.gov/
  8. Find open datasets and machine learning projects | Kaggle. (n.d.). Kaggle: Your Machine Learning and Data Science Community. https://www.kaggle.com/datasets
  9. Introduction to statistical learning. (n.d.). Statistical Learning with Sparsity: The Lasso and Generalizations. https://trevorhastie.github.io/ISLR/
  10. Kaiser, C. (2020, March 19). Don’t learn machine learning. Medium. https://towardsdatascience.com/dont-learn-machine-learning-8af3cf946214
  11. Kaleko/CourseraML. (n.d.). GitHub. https://github.com/kaleko/CourseraML
  12. Keras Team. (n.d.). Keras: the Python deep learning API. https://keras.io/
  13. Machine learning. (n.d.). Coursera. https://www.coursera.org/learn/machine-learning
  14. Netflix prize. (2007, February 9). Wikipedia, the free encyclopedia. Retrieved November 12, 2020, from https://en.wikipedia.org/wiki/Netflix_Prize
  15. The Python challenge. (n.d.). The Python Challenge. https://www.pythonchallenge.com/index.php
  16. Python for everybody. (n.d.). Coursera. https://www.coursera.org/specializations/python
  17. (n.d.). PyTorch. https://pytorch.org/
  18. Scikit-learn. (n.d.). scikit-learn: machine learning in Python — scikit-learn 0.16.1 documentation. Retrieved November 12, 2020, from https://scikit-learn.org/stable/index.html
  19. Table of contents — How to think like a computer scientist: Interactive edition. (n.d.). Runestone Interactive. https://runestone.academy/runestone/books/published/thinkcspy/index.html
  20. (n.d.). TensorFlow. https://www.tensorflow.org/
  21. Thinking of self-studying machine learning? Remind yourself of these 6 things. (20, November 12). https://hackernoon.com/thinking-of-self-studying-machine-learning-remind-yourself-of-these-6-things-b55a5f2b6c7d
  22. Titanic: Machine learning from disaster | Kaggle. (n.d.). Kaggle: Your Machine Learning and Data Science Community. https://www.kaggle.com/c/titanic
 

Affiliate Disclosure: We participate in several affiliate programs and may be compensated if you make a purchase using our referral link, at no additional cost to you. You can, however, trust the integrity of our recommendations. Affiliate programs exist even for products that we do not recommend; we recommend only the products that we actually believe in.

Daisy

Daisy is the founder of DataScienceNerd.com. Passionate about the field of data science, she shares her learnings and experiences in this domain in the hope of helping other data science enthusiasts on their path through this incredible discipline.
