Machine learning models come in all shapes and sizes, and it can be difficult to classify them. However, one characteristic inherent to all models separates them into two types: parametric models and non-parametric models.
Parametric models produce their results from a fixed set of parameters. Each parameter is tied to one or more features and affects each of them differently. The most basic example of a parametric model is linear regression, where the output is a weighted sum of the features: each parameter (weight) multiplies one feature, and an intercept is added. Parametric models try to approximate the probability distribution of the training data with that set of parameters, so we usually assume the data is drawn from a distribution of known form. The advantage of this type of model is that it reduces the problem of estimating a probability density, discriminant, or regression function to estimating a small number of parameters. The disadvantage is that the distributional assumption may not hold, in which case the model can make large errors.
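As a minimal sketch of the parametric idea, the Python below fits the two parameters of a one-feature linear regression, y = w·x + b, by ordinary least squares. The fitting method, the function names, and the toy data are our own choices for illustration. Once `w` and `b` are estimated, the training data is no longer needed to make predictions.

```python
from typing import List, Tuple


def fit_linear_regression(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Estimate slope w and intercept b of y = w * x + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Least-squares slope: covariance of x and y divided by the variance of x.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def predict(params: Tuple[float, float], x: float) -> float:
    """Prediction uses only the two learned parameters, not the training data."""
    w, b = params
    return w * x + b


# Toy data generated from y = 2x + 1 (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
params = fit_linear_regression(xs, ys)
print(params)                  # (2.0, 1.0)
print(predict(params, 10.0))   # 21.0
```

The whole model is the tuple `(w, b)`. This is why parametric models are light in memory and fast at prediction time.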
Note: sometimes a better approach is to use semi-parametric methods, which model the data as a mixture of several distributions.
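A common semi-parametric model is a Gaussian mixture, whose density is a weighted sum of Gaussian components. The sketch below only evaluates such a density. The component weights, means, and standard deviations are hypothetical values chosen for illustration. In practice they would be learned from data, for example with expectation-maximization.

```python
import math
from typing import Sequence


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x: float, weights: Sequence[float],
                means: Sequence[float], stds: Sequence[float]) -> float:
    """Gaussian mixture density: sum over k of weight_k * N(x; mean_k, std_k^2).

    The weights are assumed to be non-negative and to sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


# Hypothetical two-component mixture (parameters chosen for illustration only).
weights = [0.3, 0.7]
means = [0.0, 5.0]
stds = [1.0, 2.0]
for x in (0.0, 2.5, 5.0):
    print(x, round(mixture_pdf(x, weights, means, stds), 4))
```

Each component is still parametric, but combining several of them lets the model fit multi-modal data that a single Gaussian cannot.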
We usually use a non-parametric approach for density estimation, classification, outlier detection, and regression. With this type of model, we assume that similar inputs have similar outputs: instances that are close to each other are likely to mean similar things. Given a new input, the algorithm searches the past data for similar instances, interpolates their values, and returns the result. For this to work, non-parametric models require two basic things:
- A stored history of all the seen data (usually O(n) space and time complexity).
- A distance measure to compare different instances and assign a similarity level (e.g. Euclidean, Mahalanobis).
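A minimal k-nearest-neighbor regressor shows both requirements at once. It keeps every training instance, uses Euclidean distance to find the `k` most similar ones, and averages their outputs. The function names, the choice of a plain average as the interpolation, and the toy data (points on y = x²) are assumptions made for illustration.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(train: List[Tuple[Sequence[float], float]],
                query: Sequence[float], k: int = 3) -> float:
    """Predict by averaging the outputs of the k stored instances closest to query."""
    # Every stored instance is compared with the query: O(n) distance computations.
    scored = [(euclidean(x, query), y) for x, y in train]
    nearest = heapq.nsmallest(k, scored, key=lambda pair: pair[0])
    # "Interpolate" by taking the plain mean of the neighbors' outputs.
    return sum(y for _, y in nearest) / len(nearest)


# Toy training history: points on y = x^2 (illustrative only).
train = [((0.0,), 0.0), ((1.0,), 1.0), ((2.0,), 4.0), ((3.0,), 9.0), ((4.0,), 16.0)]
print(knn_predict(train, (2.1,), k=1))  # 4.0
print(knn_predict(train, (2.1,), k=3))  # mean of the outputs 4, 9 and 1
```

There is no training step. All the work happens at query time, and the whole `train` list must be kept in memory.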
This linear complexity is the main bottleneck of the method, since the training set is usually much larger than the number of parameters needed to model the problem.
In summary, parametric models are faster, lighter, and simpler, so they tend to have lower variance error. Non-parametric models, on the other hand, remember all the training instances and are a more powerful approach. However, they are much slower, and we need to be careful to prevent overfitting. Decision trees are an example: a single tree overfits easily, and a good regularization method is to use a random forest, which averages the results of many different trees.
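The sketch below illustrates the averaging idea with bagging. Each "tree" is a one-split regression tree (a stump) fitted on a bootstrap resample of the data, and the predictions of all trees are averaged. This is a simplified stand-in, not a complete random forest: it assumes stumps instead of full trees and omits random feature selection. The function names and toy data are hypothetical.

```python
import random
from statistics import mean
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_mean, right_mean)


def fit_stump(data: List[Tuple[float, float]]) -> Stump:
    """Fit a one-split regression tree on (x, y) pairs by minimizing squared error."""
    xs = sorted({x for x, _ in data})
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2.0
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:
        # Only one distinct x in the sample: no split is possible, predict the mean.
        m = mean([y for _, y in data])
        return (float("inf"), m, m)
    return best[1], best[2], best[3]


def predict_stump(stump: Stump, x: float) -> float:
    t, left_mean, right_mean = stump
    return left_mean if x <= t else right_mean


def fit_bagged_stumps(data: List[Tuple[float, float]],
                      n_trees: int = 25, seed: int = 0) -> List[Stump]:
    """Fit each stump on a bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]


def predict_ensemble(stumps: List[Stump], x: float) -> float:
    # Average the individual trees' predictions.
    return mean([predict_stump(s, x) for s in stumps])


# Toy step-shaped data (illustrative): y jumps from 0 to 10 at x = 5.
data = [(float(x), 0.0 if x < 5 else 10.0) for x in range(10)]
forest = fit_bagged_stumps(data)
print(fit_stump(data))  # (4.5, 0.0, 10.0)
print(predict_ensemble(forest, 0.0), predict_ensemble(forest, 9.0))
```

Each bootstrap stump may place its split slightly differently. Averaging them smooths out these individual choices, which is the variance-reduction effect the paragraph describes.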
| Parametric models | Non-parametric models |
|---|---|
| Linear regression | Histogram estimator |
| Neural networks | Kernel estimator |
| Support vector machines | Decision trees |
| Linear Discriminant Analysis | Random forests |
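The histogram and kernel estimators listed in the table can each be sketched in a few lines. Both assume a bin width or bandwidth chosen by hand, and the values used here are hypothetical. The kernel estimator also assumes a Gaussian kernel. Each query scans the whole dataset, which is the O(n) cost described earlier.

```python
import math
from typing import Sequence


def histogram_density(x: float, data: Sequence[float],
                      origin: float, width: float) -> float:
    """Histogram estimator: (# samples in x's bin) / (N * width)."""
    bin_of_x = math.floor((x - origin) / width)
    count = sum(1 for d in data if math.floor((d - origin) / width) == bin_of_x)
    return count / (len(data) * width)


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used as the smoothing kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(x: float, data: Sequence[float], h: float) -> float:
    """Kernel estimator: (1 / (N * h)) * sum over t of K((x - x_t) / h)."""
    return sum(gaussian_kernel((x - d) / h) for d in data) / (len(data) * h)


# Hypothetical 1-D sample; both estimators keep and scan all of it per query.
data = [1.0, 1.5, 2.2, 2.8, 3.1, 5.0]
print(histogram_density(2.5, data, origin=0.0, width=1.0))  # 2 of 6 samples fall in [2, 3)
print(kernel_density(2.5, data, h=0.5))
```

The histogram estimate jumps at bin edges, while the kernel estimator replaces the hard bins with a smooth bump around every sample, so its density is continuous.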