
Jesper Dramsch

Originally published at dramsch.net

4 Tools Kaggle Grandmasters use to win $100,000s in competitions


Expertise is figuring out what works and what doesn't.

Why not let the experts tell you?

Rather than experimenting from the ground up for a decade!

  1. Pseudolabelling
  2. Negative Mining
  3. Augmentation Tricks
  4. Test-time augmentation

🎨 Pseudolabelling

Some competitions don't have a lot of data.

Pseudo labels are created by first building a good model on the training data.

Then you predict on the public test data.

Finally, you use the predictions with high confidence as additional training data!
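
Here is a minimal sketch of that loop in scikit-learn. The synthetic data stands in for a real competition dataset, and the 0.95 confidence threshold is an assumption you would tune per competition.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy data: a small labelled train set and a larger "unlabelled" test set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, _ = train_test_split(X, y, train_size=0.2, random_state=0)

# 1. Build a good model on the labelled training data.
model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(X_train, y_train)

# 2. Predict on the (public) test data.
proba = model.predict_proba(X_test)
pseudo_labels = model.classes_[proba.argmax(axis=1)]
confidence = proba.max(axis=1)

# 3. Keep only high-confidence predictions as extra training data.
mask = confidence > 0.95
X_extra, y_extra = X_test[mask], pseudo_labels[mask]

# Retrain on the original plus pseudo-labelled data.
model.fit(np.vstack([X_train, X_extra]), np.concatenate([y_train, y_extra]))
```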

📉 Hard Negative Mining

This works best on classifiers with a binary outcome.

The core idea:

  1. Take misclassified samples from your training data.
  2. Retrain the model on this data specifically.

Often this is applied specifically to false positives, feeding them back in as hard negative examples.
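
As a sketch, assuming a simple scikit-learn classifier on toy data, hard negative mining can be as plain as finding the false positives and over-representing them before retraining:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy imbalanced binary classification data.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=0)

# Initial model on all of the training data.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# 1. Find misclassified samples, here specifically false positives
#    (true negatives that the model labels as positive).
pred = model.predict(X)
false_positives = (pred == 1) & (y == 0)

# 2. Retrain with the hard negatives over-represented.
#    Duplicating them is one simple way; sample weights are another.
X_hard = np.vstack([X, X[false_positives]])
y_hard = np.concatenate([y, y[false_positives]])
model.fit(X_hard, y_hard)
```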

๐Ÿ Finish training unaugmented

Data augmentation is a way to artificially create more data by slightly altering the existing data.

This trains the ML model to recognize more variance in the data.

Finishing the last training epochs unaugmented usually increases accuracy.
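
A minimal PyTorch sketch of that schedule, assuming an image classifier and a made-up 8/2 epoch split: most epochs see an augmented view of the data, and the final epochs see the original data only.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Two views of the same training set: one augmented, one plain.
augmented = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])
plain = transforms.ToTensor()

train_aug = datasets.FakeData(size=512, image_size=(3, 32, 32), num_classes=10, transform=augmented)
train_plain = datasets.FakeData(size=512, image_size=(3, 32, 32), num_classes=10, transform=plain)

# A tiny stand-in model; any classifier works here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
optimizer = optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def train_one_epoch(dataset):
    for images, labels in DataLoader(dataset, batch_size=64, shuffle=True):
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()

# Most epochs on augmented data, then finish unaugmented
# (the 8/2 split is an assumption, not a fixed rule).
for _ in range(8):
    train_one_epoch(train_aug)
for _ in range(2):
    train_one_epoch(train_plain)
```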

🔃 Test-Time Augmentation (TTA)

Augmentation during training? Classic.

How about augmenting your data during testing though?

You can create an ensemble of samples through augmentation.

Predict on the ensemble and then average the predictions from your model!
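
A rough sketch with torchvision, assuming an already-trained image classifier (a random linear layer stands in here): build flipped and rotated copies of a test image and average the softmax outputs.

```python
import torch
from torchvision.transforms import functional as F

# Placeholder for a trained image classifier.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
model.eval()

image = torch.rand(1, 3, 32, 32)  # placeholder for a real test image

# Build an ensemble of augmented copies of the same test sample.
views = [
    image,
    F.hflip(image),
    F.vflip(image),
    F.rotate(image, 10),
]

# Predict on every view and average the class probabilities.
with torch.no_grad():
    probs = torch.stack([torch.softmax(model(v), dim=1) for v in views])
prediction = probs.mean(dim=0).argmax(dim=1)
```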

Conclusion

Kaggle can teach you some sweet tricks for your machine learning endeavours.

This article was about these four:

  • Create extra training data
  • Train on bad samples
  • Top off training with original data
  • Test on an ensemble of your data
