Basic knowledge of this topic is helpful.
Data preprocessing is an important step in the data mining process. It typically involves these steps:
- Import the libraries.
- Load the data.
- Check for missing or null values.
- Encode categorical data as numbers.
- Split the data into training and test sets.
- Scale the features.
For data preprocessing, use this Jupyter notebook.
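The preprocessing steps can be sketched with pandas and scikit-learn. This is a minimal sketch, not the notebook's code: the toy DataFrame and its column names (`country`, `age`, `salary`, `purchased`) are hypothetical, and mean/most-frequent imputation, one-hot encoding and standardization are one reasonable set of choices among several. One deliberate difference from the list order: the split happens before imputation, encoding and scaling, so those transformers learn their statistics from the training set only.

```python
# Minimal preprocessing sketch; the dataset and column names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Get the data: a hypothetical toy dataset containing missing values.
df = pd.DataFrame({
    "country": ["France", "Spain", "Germany", "Spain", np.nan, "France", "Germany", "Spain"],
    "age": [44, 27, 30, np.nan, 48, 35, 38, 50],
    "salary": [72000, 48000, 54000, 61000, np.nan, 58000, 52000, 83000],
    "purchased": ["No", "Yes", "No", "No", "Yes", "Yes", "No", "Yes"],
})

# Check for missing or null values in each column.
print(df.isnull().sum())

X = df[["country", "age", "salary"]]
y = df["purchased"].map({"No": 0, "Yes": 1})  # encode the target as 0/1

# Split first so test rows never influence imputation or scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

preprocess = ColumnTransformer(
    [
        # Numeric columns: fill gaps with the training mean, then standardize.
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="mean")),
            ("scale", StandardScaler()),
        ]), ["age", "salary"]),
        # Categorical column: fill gaps with the most frequent value, then one-hot encode.
        ("cat", Pipeline([
            ("impute", SimpleImputer(strategy="most_frequent")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), ["country"]),
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)

X_train_prepared = preprocess.fit_transform(X_train)  # learn statistics on train only
X_test_prepared = preprocess.transform(X_test)        # reuse them on test
print(X_train_prepared.shape, X_test_prepared.shape)
```

Wrapping the steps in a `ColumnTransformer` keeps the fitted statistics together, so the exact same transformation can be reapplied to new data later.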
In supervised learning, the model is trained on data that contains both input variables and an output variable, and the algorithm learns a mapping from the input to the output.
Supervised learning algorithms fall into two categories:
- Classification: A classification problem is when the output variable is a category, such as "disease" or "no disease".
- Regression: A regression problem is when the output variable is a real value, such as "price".
Classification has a wide variety of applications, from healthcare to marketing.
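As a hedged illustration of a "disease" / "no disease" style problem, the sketch below fits a logistic regression classifier to the breast cancer dataset bundled with scikit-learn. The dataset, the model and the 25% test split are assumptions chosen for illustration; any classifier with `fit`/`predict` could be swapped in.

```python
# Minimal binary classification sketch (dataset and model chosen for illustration).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Input features X and a categorical output y (0 = malignant, 1 = benign).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the features, then learn a mapping from inputs to class labels.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

# Evaluate on data the model has not seen.
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {accuracy:.3f}")
```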
Learn how to implement the following classification models:
Regression techniques range from linear regression to random forests.
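The two ends of that range can be compared in a few lines. This sketch assumes synthetic data from scikit-learn's `make_regression`, so the target is a noisy linear function of the inputs; real datasets will behave differently.

```python
# Minimal regression sketch comparing linear regression and a random forest.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption): real-valued target, linear in the 5 features plus noise.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scores = {}
for name, model in [
    ("Linear Regression", LinearRegression()),
    ("Random Forest", RandomForestRegressor(n_estimators=100, random_state=0)),
]:
    model.fit(X_train, y_train)
    # R^2 on held-out data: 1.0 is a perfect fit, 0.0 is no better than predicting the mean.
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

Because this synthetic target is linear, the linear model should come out ahead here; random forests tend to pay off when the relationship between inputs and output is nonlinear.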
In unsupervised learning, only input data is available; there is no corresponding output variable.
Unsupervised learning has two categories of algorithms:
- Clustering: A clustering problem is where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior.
- Association: An association rule learning problem is where you want to discover rules that describe large portions of your data, such as "people who buy X also tend to buy Y".
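Association rules are commonly scored by support (how often the items appear together) and confidence (how often the rule holds when its left-hand side appears). The sketch below computes both by hand for one rule over a small hypothetical list of shopping baskets; algorithms such as Apriori automate the search over many candidate rules, which this sketch does not attempt.

```python
# Hand-computed support and confidence for a rule X -> Y.
# The baskets below are hypothetical example data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): count(X ∪ Y) / count(X)."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

# Rule: people who buy diapers also tend to buy beer.
x, y = {"diapers"}, {"beer"}
print(support(x | y, transactions))     # 0.6  (3 of 5 baskets)
print(confidence(x, y, transactions))   # 0.75 (3 of the 4 diaper baskets)
```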
Clustering resembles classification, but the basis is different: in clustering there are no predefined labels, so you don't know in advance what you are looking for, and you are trying to identify segments or clusters in your data.
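A minimal clustering sketch using k-means from scikit-learn. It assumes synthetic, unlabeled 2-D points from `make_blobs` and assumes the number of clusters (3) is known; in practice that number usually has to be chosen, for example by comparing silhouette scores for several values.

```python
# Minimal k-means clustering sketch on synthetic, unlabeled data.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate 2-D points; the true group labels are discarded, as in unsupervised learning.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

# Assumption: we ask for 3 clusters; n_init=10 restarts k-means and keeps the best run.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)  # cluster index (0, 1 or 2) assigned to each point

print(kmeans.cluster_centers_)  # one 2-D centroid per discovered cluster
print(labels[:10])
```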
Learn how to implement the following machine learning clustering models:
The main challenge is choosing the right estimator for a given problem.
You can use the scikit-learn estimator map ("Choosing the right estimator") to narrow down the options for your problem.
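Alongside the map, one practical way to decide between candidates is to compare them with cross-validation. This sketch assumes a classification task and uses the iris dataset bundled with scikit-learn; the three candidate estimators are arbitrary examples, not a recommendation.

```python
# Compare a few candidate estimators with 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Example candidates (an assumption); distance-based models get feature scaling.
candidates = {
    "k-NN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}

mean_scores = {}
for name, estimator in candidates.items():
    # Accuracy on each of 5 folds; the mean estimates generalization performance.
    scores = cross_val_score(estimator, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")

best = max(mean_scores, key=mean_scores.get)
print("Best candidate:", best)
```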
To make the world a better place, use data wisely.