“Anyone who stops learning is old, whether at twenty or eighty. Anyone who keeps learning stays young.” – Henry Ford
Machine Learning is the talk of the town right now. We use it every day: in the Google Assistant or Siri on our smartphones, in facial recognition systems, in self-driving cars, and even in the Google search engine. Other uses of ML affect our daily lives without our noticing, such as the shows Netflix recommends to us, product recommendations on e-commerce sites, the filtering of our spam mail, and even job recommendations and resume shortlisting.
“Those people who develop the ability to continuously acquire new and better forms of knowledge that they can apply to their work and to their lives will be the movers and shakers in our society for the indefinite future.” – Brian Tracy
It is important to keep learning new things in this evolving world, and Machine Learning is one of the best things to learn. There are many resources for it: video courses, blogs, and books on Data Science and Machine Learning.
Kaggle is the largest community of Data Scientists and ML engineers. It has Datasets for training and testing, a Community for discussions, Competitions for practice on real-world ML problems, and Notebooks for coding and for helping beginners with code. It also offers micro-courses on Machine Learning, which I consider one of the best resources for learning ML: they teach ML together with coding practice and have a learning time of 7-8 hours, which is quite feasible in a busy schedule.
I have written down a possible path for beginners who want to study Machine Learning. It gives the order in which the micro-courses should be finished and describes what is learned in each course.
The first thing to learn for any software-related job is a programming language, and Python is one of the best languages for Machine Learning. This course teaches the basics of Python, which are the building blocks for the courses that follow.
The next course should be the Pandas course. Pandas is a library that reads data from different formats (.csv, .txt, .tsv) and converts it to DataFrames or Series. DataFrames make processing the data very easy, and Pandas is one of the most popular libraries in the field of data science. In this course we learn the different operations that can be applied to our data, such as sorting, filtering, and grouping.
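The operations named above can be sketched in a few lines. This is a minimal illustration, not material from the course: the table, its column names, and its values are invented, and the CSV is held in memory so the example runs without a file on disk.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV so the example needs no file on disk.
csv_text = """name,city,score
Asha,Pune,82
Ben,Delhi,67
Chen,Pune,91
Dara,Delhi,74
"""

# read_csv accepts a path or any file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Sorting: highest score first.
by_score = df.sort_values("score", ascending=False)

# Filtering: keep only the rows with a score above 70.
high_scores = df[df["score"] > 70]

# Grouping: average score per city (the result is a Series indexed by city).
mean_by_city = df.groupby("city")["score"].mean()

print(by_score)
print(high_scores)
print(mean_by_city)
```

For a tab-separated file, the same `read_csv` call takes `sep="\t"`.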
Data visualization is the first step in building a machine learning model. In this step we do Exploratory Data Analysis (EDA), which means drawing charts and graphs of the features and data points in the data. It helps us visualize the data, draw inferences from it, and form a first impression of which features are important for model building. In this course we learn Seaborn, a data visualization library, and draw different types of charts such as line charts, bar charts, scatter plots, and distribution charts.
This course, on data cleaning, is necessary for improving the accuracy of our models. During EDA we also find out whether the data contains null, missing, or abnormal values. Such values must be treated, and in this course we learn how to turn a raw dataset into a cleaner, better organized one. It covers handling missing values by dropping them or by imputation, scaling and normalizing the data, and fixing other inconsistencies such as unparsed dates, wrong character encodings, and inconsistent spellings.
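The cleaning steps described above might look like this in practice. The sketch assumes a small invented table; the column names, the choice of mean imputation, and the choice of min-max scaling are illustrative assumptions, not the course's prescribed method.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical raw data with missing values, string dates, and messy text.
raw = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "income": [30000.0, 52000.0, np.nan, 45000.0],
    "joined": ["2021-01-15", "2020-07-01", "2019-11-30", "2022-03-08"],
    "city": [" pune", "Pune ", "DELHI", "delhi"],
})
numeric_cols = ["age", "income"]

# Option 1: drop every row that contains a missing value.
dropped = raw.dropna()

# Option 2: imputation, here replacing each gap with the column mean.
imputed = raw.copy()
imputed[numeric_cols] = SimpleImputer(strategy="mean").fit_transform(raw[numeric_cols])

# Scaling: squeeze each numeric column into the range [0, 1].
cleaned = imputed.copy()
cleaned[numeric_cols] = MinMaxScaler().fit_transform(imputed[numeric_cols])

# Parsing dates: turn the strings into real datetime values.
cleaned["joined"] = pd.to_datetime(cleaned["joined"], format="%Y-%m-%d")

# Inconsistent text: trim whitespace and unify the letter case.
cleaned["city"] = cleaned["city"].str.strip().str.lower()

print(dropped)
print(cleaned)
```

Dropping rows is the simplest option but throws data away; imputation keeps every row at the cost of inserting estimated values.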
The next step should be this course, which introduces us to the world of model building. It teaches two ML algorithms, Decision Trees and Random Forests, along with the basics of EDA. It walks through the steps of building an ML model: exploring the data, defining the model, validating it, and tuning its parameters. All of this is done with the scikit-learn library, which provides many predefined ML models.
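The define, validate, and tune loop can be sketched with scikit-learn as follows. The data here is synthetic (generated with `make_regression`), and the candidate values for `max_leaf_nodes` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, so the example needs no download.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

# Hold out part of the data so the model is validated on rows it never saw.
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)


def validation_mae(model):
    """Fit on the training split and return the error on the validation split."""
    model.fit(X_train, y_train)
    return mean_absolute_error(y_val, model.predict(X_val))


# Parameter tuning: try a few tree sizes and keep the one with the lowest error.
tree_scores = {
    leaves: validation_mae(DecisionTreeRegressor(max_leaf_nodes=leaves, random_state=0))
    for leaves in (5, 25, 50)
}
best_leaves = min(tree_scores, key=tree_scores.get)

# A random forest averages many trees and usually needs less tuning.
forest_mae = validation_mae(RandomForestRegressor(n_estimators=100, random_state=0))

print("tree MAE by max_leaf_nodes:", tree_scores)
print("best max_leaf_nodes:", best_leaves)
print("forest MAE:", forest_mae)
```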
Feature Engineering is the next step in making our model better. This course teaches us how to create new features from existing ones, how to use techniques like Principal Component Analysis (PCA) and K-means clustering to derive informative features, and how to identify the most important features and use them in our model.
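A rough sketch of these ideas on the iris dataset bundled with scikit-learn follows. The short column names and the `petal_area` feature are invented for the example, and mutual information is used here as one possible way to rank features; that choice is an assumption, not something stated above.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler

iris = load_iris()
# Short column names chosen for this example.
X = pd.DataFrame(iris.data, columns=["sepal_len", "sepal_wid", "petal_len", "petal_wid"])
y = iris.target

# A new feature built from existing ones.
X["petal_area"] = X["petal_len"] * X["petal_wid"]

# PCA and K-means are sensitive to scale, so standardize first.
scaled = StandardScaler().fit_transform(X)

# PCA: add the first two principal components as features.
components = PCA(n_components=2).fit_transform(scaled)
X["pc1"] = components[:, 0]
X["pc2"] = components[:, 1]

# K-means: add the cluster label of each row as a feature.
X["cluster"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)

# Rank all features by mutual information with the target.
mi = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
ranking = mi.sort_values(ascending=False)
print(ranking)
```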
Now we are ready to up our game with the more advanced topics needed to level up our ML knowledge. This course covers basic data cleaning, basic feature engineering, building pipelines, cross-validation, and advanced models like XGBoost, through which we meet the ensemble ideas of bagging and boosting.
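Pipelines and cross-validation fit together roughly as shown below. To keep the example dependent on scikit-learn alone, it uses `GradientBoostingRegressor` as a stand-in for XGBoost; both are gradient-boosting models, but they are different libraries. The housing-style data is randomly generated and the column names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical housing-style data generated at random.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "area": rng.uniform(40, 200, n),
    "rooms": rng.integers(1, 6, n).astype(float),
    "zone": rng.choice(["north", "south", "east"], n),
})
price = 1000 * data["area"] + 5000 * data["rooms"] + rng.normal(0, 5000, n)

# Inject some missing values so the imputer has work to do.
data.loc[::15, "area"] = np.nan

# Preprocessing: impute numeric columns, one-hot encode the categorical one.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["area", "rooms"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["zone"]),
])

# The pipeline bundles preprocessing and model into one estimator, so each
# cross-validation fold fits the imputer and encoder on its own training part.
model = Pipeline([
    ("preprocess", preprocess),
    ("regressor", GradientBoostingRegressor(random_state=0)),
])

# 5-fold cross-validation; scikit-learn reports negated errors, so flip the sign.
scores = -cross_val_score(model, data, price, cv=5, scoring="neg_mean_absolute_error")
mean_mae = scores.mean()
print("MAE per fold:", scores)
print("mean MAE:", mean_mae)
```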
After learning about Machine Learning algorithms and models, it is necessary to know about Deep Learning models. This course explains the concepts required to build neural networks, such as neurons, layers (input, hidden, output), optimization algorithms, dropout, and batch normalization. It helps us build neural networks using the TensorFlow library, which has Keras integrated into it.
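To make the ideas of neurons, layers, and dropout concrete, here is a forward pass through a tiny network written in plain NumPy rather than Keras. It is only a sketch: the layer sizes and dropout rate are arbitrary, the weights are random, and training (the optimization algorithm) and batch normalization are left out.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, weights, bias):
    """A layer of neurons: each output is a weighted sum of the inputs plus a bias."""
    return x @ weights + bias


def relu(z):
    """Activation function: negative values become zero."""
    return np.maximum(z, 0.0)


def dropout(a, rate, training):
    """Randomly zero a fraction `rate` of activations during training only."""
    if not training:
        return a
    keep = rng.random(a.shape) >= rate
    # Rescale the survivors so the expected activation stays the same.
    return a * keep / (1.0 - rate)


# Input layer: 4 samples with 3 features each.
x = rng.normal(size=(4, 3))

# Hidden layer with 8 neurons, output layer with 1 neuron (random weights).
w1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
w2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

hidden = dropout(relu(dense(x, w1, b1)), rate=0.3, training=True)
output = dense(hidden, w2, b2)

print(hidden.shape, output.shape)
```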
After building models, it is necessary to understand how they work. Understanding how a model works helps us improve it and remove bias from it. The insights from models help us with:
- Informing feature engineering
- Directing future data collection
- Informing human decision-making
- Building trust
SHAP and LIME are used to extract insights from models and make them understandable.
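SHAP and LIME are separate packages, so the sketch below uses permutation importance from scikit-learn instead. It is a different and simpler technique, but it serves the same goal of showing which features a model relies on. The data is synthetic, built so that only the first two features carry signal.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: with shuffle=False, features 0 and 1 are informative
# and the remaining four are pure noise.
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=2, n_redundant=0,
    shuffle=False, random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle one column at a time and measure how much the validation score drops.
# A large drop means the model depends heavily on that feature.
result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)

for index in result.importances_mean.argsort()[::-1]:
    print(f"feature {index}: {result.importances_mean[index]:.3f}")
```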
This course is a necessary introduction to ethics in the field of AI and ML.
“In a time of drastic change it is the learners who inherit the future. The learned usually find themselves equipped to live in a world that no longer exists.” – Eric Hoffer
There are various specialized use cases of Machine Learning, and they are necessary to learn if we want to solve problems in a specific domain. The specialized courses offered on Kaggle are described below.
In this course the dataset consists of images. It introduces us to Convolutional Neural Networks (CNNs), walks through a basic image classification task, and introduces the concept of data augmentation.
In this course the dataset consists of text. It introduces the basic concept of text classification and the basics of NLP.
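A basic text classifier can be sketched with scikit-learn as below. The toolkit (TF-IDF features plus logistic regression) is chosen for this example and is not necessarily what the course uses; the messages and labels are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny invented dataset: four spam messages and four normal ("ham") messages.
texts = [
    "win a free prize now",
    "click here to claim your free prize",
    "you win cash click now",
    "free cash prize waiting click",
    "meeting moved to monday morning",
    "please send the report before lunch",
    "lunch with the team after the meeting",
    "the monday report is ready",
]
labels = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham"]

# TF-IDF turns each message into a vector of word weights;
# logistic regression then learns which words point to which label.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, labels)

new_messages = ["win a free prize", "report for the monday meeting"]
predictions = classifier.predict(new_messages)
print(list(zip(new_messages, predictions)))
```

With so few examples the model only memorizes vocabulary; a real dataset would need a held-out split to measure accuracy.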
In this course the dataset consists of geographical data. The course is about analysis rather than model building: interactive maps are built from geospatial data, and inferences are drawn from them.