This article is translated from my Japanese tech blog.
https://tmyoda.hatenablog.com/entry/20210819/1629384283
About the SETI Competition
https://www.kaggle.com/competitions/seti-breakthrough-listen
In this competition, you are given a spectrogram of a signal and must predict whether it contains an anomalous signal.
(The data used in this competition was artificially generated with a simulator.)
Pipeline
Augmentation
I didn't have enough time to investigate augmentations thoroughly. For now, I used the following four, plus mixup. I don't know which ones are actually effective...
- vflip
- shift_scale_rotate
- motion_blur
- spec_augment
I wanted to use SpecAugment with albumentations, so I created a class as follows.
import numpy as np
from albumentations.core.transforms_interface import ImageOnlyTransform

class SpecAugment(ImageOnlyTransform):
    def __init__(self, alpha=0.1, **kwargs):
        super(SpecAugment, self).__init__(**kwargs)
        self.spec_alpha = alpha

    def apply(self, img, **params):
        x = img.copy()  # avoid mutating the input image in place
        # Time masking: zero out a random band of rows
        t0 = np.random.randint(0, x.shape[0])
        delta = np.random.randint(0, int(x.shape[0] * self.spec_alpha))
        x[t0:min(t0 + delta, x.shape[0])] = 0
        # Frequency masking: zero out a random band of columns
        t0 = np.random.randint(0, x.shape[1])
        delta = np.random.randint(0, int(x.shape[1] * self.spec_alpha))
        x[:, t0:min(t0 + delta, x.shape[1])] = 0
        return x
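For reference, here is a minimal sketch of how these four augmentations could be combined into an albumentations pipeline. The specific parameters are illustrative choices, not necessarily the original configuration; p=0.5 matches the TTA discussion below.

import albumentations as A

transform = A.Compose([
    A.VerticalFlip(p=0.5),
    A.ShiftScaleRotate(p=0.5),
    A.MotionBlur(p=0.5),
    SpecAugment(alpha=0.1, p=0.5),
])

# `spectrogram` is a 2D numpy array
augmented = transform(image=spectrogram)["image"]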
Test Time Augmentation (TTA)
Since I applied four augmentations this time, I decided to perform TTA 16 times. I chose 16 because I wanted every augmentation to be applied at least once to each image during TTA.
For example, with 16 TTA rounds, 4 types of augmentation, and each augmentation applied independently with probability p = 0.5, the probability that every augmentation is applied at least once to a given image is:

(1 - 0.5^16)^4 ≈ 0.99994
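As a sketch, 16-round TTA could look like the following, assuming a binary classifier that outputs a single logit and the stochastic `transform` pipeline from above:

import torch

def predict_tta(model, image, transform, n_tta=16):
    # Apply the stochastic augmentation pipeline n_tta times and
    # average the sigmoid outputs.
    model.eval()
    preds = []
    with torch.no_grad():
        for _ in range(n_tta):
            x = transform(image=image)["image"]
            x = torch.from_numpy(x).float().unsqueeze(0).unsqueeze(0)  # (1, 1, H, W)
            preds.append(torch.sigmoid(model(x)))
    return torch.stack(preds).mean(dim=0)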
Resizing Network
This resizing-network model gave my best score so far.
I believe it would be better to feed the images in without resizing, but my GPU does not have enough memory.
To input the images without resizing, I would need to reduce the batch size.
However, with imbalanced data like this competition's (9:1), a small batch can easily end up containing only one class.
So, I decided to train with the largest possible image size using this model.
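As a rough sketch of the general idea, a small learnable resizer can be placed in front of the backbone, in the spirit of "Learning to Resize Images for Computer Vision Tasks" (Talebi & Milanfar). The layer sizes below are illustrative assumptions, not the exact architecture used here:

import torch.nn as nn
import torch.nn.functional as F

class Resizer(nn.Module):
    # Learns a down-scaling of the input instead of a plain bilinear resize.
    def __init__(self, out_size=(256, 256), channels=16):
        super().__init__()
        self.out_size = out_size
        self.stem = nn.Sequential(
            nn.Conv2d(1, channels, 7, padding=3),
            nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 1),
            nn.LeakyReLU(0.2),
            nn.BatchNorm2d(channels),
        )
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.head = nn.Conv2d(channels, 1, 7, padding=3)

    def forward(self, x):
        # A bilinear skip connection keeps a plain resize as the baseline.
        skip = F.interpolate(x, size=self.out_size, mode="bilinear", align_corners=False)
        y = self.stem(x)
        y = F.interpolate(y, size=self.out_size, mode="bilinear", align_corners=False)
        y = y + self.residual(y)
        return self.head(y) + skip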
Training
In this competition, the dataset was reset once and completely replaced. So, I decided to use the previous (pre-reset) data for pre-training. Doing this slightly increased the score on both the LB and CV.
Also, pre-training uses a hold-out split, while fine-tuning uses 4-fold CV.
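Here is a minimal sketch of this two-stage scheme, assuming a dataframe `df` with a binary `target` column; `build_model` and `train_one_fold` are hypothetical helpers, not part of the original pipeline:

import torch
from sklearn.model_selection import StratifiedKFold

# Stage 1 (done beforehand): pre-train on the old, pre-reset data with a
# hold-out split and save the weights.
# Stage 2: fine-tune on the new data with 4-fold CV, restarting every fold
# from the pre-trained checkpoint.
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
for fold, (train_idx, valid_idx) in enumerate(skf.split(df, df["target"])):
    model = build_model()  # hypothetical helper
    model.load_state_dict(torch.load("pretrained_on_old_data.pth"))
    train_one_fold(model, df.iloc[train_idx], df.iloc[valid_idx], fold)  # hypothetical helper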
Model
I tried various models (NFNet, VOLO, Swin, ...), but I ran into a problem where models would not learn when scaled up (probably due to a bad learning rate and scheduler).
So, I decided to use efficientnetv2_s and efficientnetv2_m, which scored well.
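For example, such a backbone can be created with timm; the exact model variant name and the single-channel, single-logit head below are my assumptions:

import timm

# in_chans=1 because the spectrograms are single-channel;
# num_classes=1 for the binary target.
model = timm.create_model(
    "tf_efficientnetv2_s", pretrained=True, in_chans=1, num_classes=1
)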
What I tried
- AST: Audio Spectrogram Transformer (https://arxiv.org/pdf/2104.01778v3.pdf): No change
- Weighted CE loss: No change
- Temperature scaling (https://github.com/gpleiss/temperature_scaling): Slight increase on private
- Dark magic trick (https://www.kaggle.com/c/seti-breakthrough-listen/discussion/238722): Score decreased
- Dropping augmentation for the last few epochs: Score decreased
- Applying mixup probabilistically instead of every time: No change
- Adversarial validation: The train and test distributions were too different, and no train instances were classified as test-like with high confidence
- Pseudo labeling: No change
1st Place Solution
I was surprised by the first place solution.
I think the idea of removing the background can be used in other competitions dealing with spectrograms.
https://www.kaggle.com/c/seti-breakthrough-listen/discussion/266385