It is not always the big stuff or the latest packages that help improve the accuracy or performance of our machine learning models.
At times we overlook the basics of machine learning and rush to higher-order solutions when the answer is right there in front of us.
Here are 10 simple things you should remember to try first before throwing in the towel and jumping straight to RNNs and CNNs (of course, some datasets do merit starting straight from LSTMs and BERT). Let us remind ourselves of our checklist before bringing out our calculus skills.
Domain Knowledge :
Try to understand as much about the domain as
you can. This will greatly help you in building your
predictive models and in coming up with great features.
Get More Data :
You can simply ask for more data. The data you
have might not be enough to give you an accurate
model with a good bias-variance trade-off.
Treat Outliers :
When using squared-error losses like RMSE or MSE, leaving
outliers untreated in your dataset can badly skew your
model, because large errors get squared and dominate the fit.
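To see why outliers matter under a squared-error loss, here is a minimal sketch using only the Python standard library. The target values in `y` are made up, and `iqr_bounds` and `clip_outliers` are hypothetical helper names. The 1.5 x IQR fence is one common convention for flagging outliers, not the only one.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def clip_outliers(values, k=1.5):
    """Clip every value into the [lower, upper] fence range."""
    lower, upper = iqr_bounds(values, k)
    return [min(max(v, lower), upper) for v in values]

# Hypothetical target values with one extreme point.
y = [10, 12, 11, 13, 12, 11, 10, 12, 500]

# The constant prediction that minimizes MSE is the mean;
# the constant prediction that minimizes MAE is the median.
print("mean (MSE-optimal constant)  :", round(statistics.mean(y), 2))
print("median (MAE-optimal constant):", statistics.median(y))

clipped = clip_outliers(y)
print("mean after clipping          :", round(statistics.mean(clipped), 2))
```

A single extreme value drags the mean far away from the bulk of the data, while the median barely moves. Clipping is only one option. Whether to clip, drop, or keep an outlier depends on whether it is an error or a real signal.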
Try transforming your Data :
Simple transformations like "square" or "square
root" can give your model "ideas" to better see
patterns in your dataset. And of course, if you
suspect a lognormal distribution, then taking logs of
your features can be very beneficial (especially
when using linear models).
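As a rough illustration of how a log transform can expose a linear pattern, the sketch below compares the correlation of a target with a raw feature and with its logarithm. The data is invented (a right-skewed feature and a target that depends on its log), and `pearson` is a small helper written for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical right-skewed feature and a target that depends on its logarithm.
feature = [2.0 ** k for k in range(10)]
target = [3.0 * math.log(v) + 2.0 for v in feature]

r_raw = pearson(feature, target)
r_log = pearson([math.log(v) for v in feature], target)

print(f"correlation with raw feature: {r_raw:.3f}")
print(f"correlation with log feature: {r_log:.3f}")
```

A linear model can only use the linear part of the relationship, so the transformed feature gives it much more to work with in this toy case.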
Do feature selection :
The curse of dimensionality is real. Selecting
the most relevant features to include in your model
not only helps you reduce overfitting, it also helps
your model run faster. So throw in some LASSO and
see which features survive.
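Here is a minimal sketch of how LASSO "kills" weak features, written as plain coordinate descent so it runs without any libraries. Several things here are assumptions for illustration: the tiny balanced dataset, the penalty `alpha=0.1`, and the function names. It also assumes the data is already centered, so there is no intercept. In practice you would reach for a library implementation.

```python
from itertools import product

def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha; anything within [-alpha, alpha] becomes 0."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iters=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 one weight at a time."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iters):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the residual that ignores j
            z = 0.0    # squared norm of feature j
            for row, target in zip(X, y):
                pred_without_j = sum(w[k] * row[k] for k in range(p) if k != j)
                rho += row[j] * (target - pred_without_j)
                z += row[j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w

# Hypothetical data: every combination of three +/-1 features.
# Feature 0 is strong, feature 1 is irrelevant, feature 2 is very weak.
X = [list(row) for row in product([-1.0, 1.0], repeat=3)]
y = [3.0 * row[0] + 0.05 * row[2] for row in X]

weights = lasso_coordinate_descent(X, y, alpha=0.1)
selected = [j for j, w in enumerate(weights) if w != 0.0]

print("weights :", [round(w, 3) for w in weights])
print("selected:", selected)
```

The L1 penalty sets weak coefficients to exactly zero, which is what makes LASSO usable as a feature selector. It also shrinks the surviving weight a little, which is the price you pay for the sparsity.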
Do cross validation :
Your test dataset should really be like your last
defender before taking your model to production. So
use cross-validation to get a less noisy estimate of
performance and obtain a model which generalizes
well to new data.
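A bare-bones version of k-fold cross-validation might look like the sketch below. The data is invented (a line plus a small deterministic wiggle standing in for noise), and `k_fold_splits`, `fit_line`, and `mse` are helper names made up for this example. The model is a simple least-squares line so that everything fits in a few lines.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, val_indices) pairs for shuffled k-fold CV."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def mse(model, xs, ys):
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: y = 2x + 1 with an alternating +/-0.5 wiggle.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]

fold_scores = []
for train_idx, val_idx in k_fold_splits(len(xs), k=5):
    model = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    fold_scores.append(mse(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))

cv_mse = sum(fold_scores) / len(fold_scores)
print("per-fold MSE:", [round(s, 3) for s in fold_scores])
print("mean CV MSE :", round(cv_mse, 3))
```

Every row is used for validation exactly once, so the averaged score depends less on one lucky or unlucky split. The test set stays untouched until the very end.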
Try many algorithms :
In the beginning you are not very sure of the distribution
of your data. So try a couple of models and see which
one best optimizes your objectives or criteria. With time
you will get better at knowing which model to use.
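One way to picture "try a couple of models" is a small bake-off on held-out data. In the sketch below the three candidates (a mean baseline, a straight line, and a 1-nearest-neighbor lookup) and the curved toy data are all assumptions chosen for illustration. The point is the comparison loop, not these particular models.

```python
def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Ordinary least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_nearest_neighbor(xs, ys):
    """1-NN: predict the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

# Hypothetical curved data; even rows train, odd rows validate.
xs = [float(i) for i in range(20)]
ys = [x ** 2 for x in xs]
train_x, train_y = xs[0::2], ys[0::2]
val_x, val_y = xs[1::2], ys[1::2]

candidates = {
    "mean_baseline": fit_mean,
    "linear": fit_line,
    "nearest_neighbor": fit_nearest_neighbor,
}
scores = {name: mse(fit(train_x, train_y), val_x, val_y)
          for name, fit in candidates.items()}
best_model = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name:>16}: validation MSE = {score:.1f}")
print("best:", best_model)
```

Always keep a trivial baseline in the line-up. If a fancy model cannot beat "predict the mean", something is wrong with the features or the setup.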
Hyperparameter Tuning :
Of course, you have to tune those hyperparameters
like "Learning Rate" so that your gradient descent is
able to avoid being trapped in a local minimum. You
need to prune those decision trees to avoid overfitting.
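To get a feel for what the learning rate does, here is a toy grid search. The loss `(w - 3)^2`, the candidate rates, and the step budget are all made up for illustration. This loss is convex, so the sketch only shows the "too small versus too large" trade-off and says nothing about escaping local minima.

```python
def gradient_descent(learning_rate, n_steps=50, start=0.0):
    """Minimize the toy loss (w - 3)^2 and return the final loss."""
    w = start
    for _ in range(n_steps):
        gradient = 2.0 * (w - 3.0)
        w -= learning_rate * gradient
    return (w - 3.0) ** 2

# Hypothetical grid of learning rates to try.
candidate_rates = [0.001, 0.01, 0.1, 0.4, 1.1]
final_losses = {lr: gradient_descent(lr) for lr in candidate_rates}
best_rate = min(final_losses, key=final_losses.get)

for lr, loss in final_losses.items():
    print(f"learning_rate={lr:<6} final loss={loss:.3e}")
print("best learning rate:", best_rate)
```

Too small a rate barely moves in the allotted steps, and too large a rate overshoots and diverges. On a real model you would score each candidate on validation data rather than on the training loss.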
Use Ensemble :
Bagging and Boosting have helped many win Kaggle competitions, so why not try the same with your dataset?
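The bagging half of that advice can be sketched in a few lines: train the same learner on many bootstrap samples and average the predictions. Everything below is an assumption for illustration: the toy data, the least-squares line used as the base learner, and the names `fit_line` and `bagged_predict`. Bagging usually pays off most with high-variance learners such as decision trees. A line is used here only to keep the example short.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def bagged_predict(xs, ys, x_new, n_models=25, seed=0):
    """Average the predictions of models trained on bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    for _ in range(n_models):
        # Bootstrap: draw n row indices with replacement.
        sample = [rng.randrange(n) for _ in range(n)]
        slope, intercept = fit_line([xs[i] for i in sample],
                                    [ys[i] for i in sample])
        predictions.append(slope * x_new + intercept)
    return sum(predictions) / len(predictions)

# Hypothetical data: y = 2x + 1 with an alternating +/-0.5 wiggle.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]

prediction = bagged_predict(xs, ys, x_new=10.0)
print("bagged prediction at x=10:", round(prediction, 3))
```

Boosting works differently: models are trained one after another, each focusing on the errors of the previous ones, so it is not covered by this sketch.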
Reshuffling your data :
Yes, you read that right. The best ideas are the simplest.
Just try it. Merely reshuffling data often helps improve
performance. Who said machine learning models
do not need our help to avoid bias?
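One common way ordering bias sneaks in is a dataset stored sorted by label and then split without shuffling. The sketch below uses an invented 100-row dataset and a hypothetical `train_test_split` helper (not the library function of the same name) to show the effect.

```python
import random

def train_test_split(data, test_size):
    """Hold out the last test_size rows as the test set."""
    cut = len(data) - test_size
    return data[:cut], data[cut:]

# Hypothetical dataset stored sorted by label:
# 50 rows of class 0 followed by 50 rows of class 1.
rows = [(i, 0) for i in range(50)] + [(i, 1) for i in range(50, 100)]

# Splitting without shuffling: the test set contains a single class.
_, test_sorted = train_test_split(rows, test_size=20)
classes_sorted = {label for _, label in test_sorted}

# Shuffling first (with a fixed seed for reproducibility) mixes the classes.
shuffled = rows[:]
random.Random(42).shuffle(shuffled)
_, test_shuffled = train_test_split(shuffled, test_size=20)
classes_shuffled = {label for _, label in test_shuffled}

print("classes in test set, unshuffled:", classes_sorted)
print("classes in test set, shuffled  :", classes_shuffled)
```

The same issue affects mini-batch training, where sorted data feeds the model long runs of a single class. One caveat: for time series you usually should not shuffle, because the order itself carries information.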