Training a Vision Transformer on Amazon SageMaker

#deeplearning #opensource #computervision #aws

In this series of three videos, I focus on training a Vision Transformer model on Amazon SageMaker.

In the first video, I start from the "Dogs vs. Cats" dataset on Kaggle and extract a subset of images that I upload to S3. Then, using SageMaker Processing, I run a script that loads the images directly from S3 into memory, extracts their features using the Vision Transformer feature extractor, and stores them in S3 as Hugging Face datasets for image classification.
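
As a rough illustration of that processing step (a sketch, not the script shown in the video), the code below reads images from S3 into memory, runs them through a ViT image processor, and builds a Hugging Face dataset. The checkpoint name, the `label_from_key` and `build_dataset` helpers, and the assumption that objects keep the Kaggle-style names (`cat.0.jpg`, `dog.1.jpg`) are all illustrative. Older Transformers releases expose the same functionality as `ViTFeatureExtractor`.

```python
import io

# Assumption: objects keep the Kaggle-style names "cat.<n>.jpg" / "dog.<n>.jpg".
LABELS = {"cat": 0, "dog": 1}


def label_from_key(key):
    """Map an S3 key such as 'train/dog.12.jpg' to an integer label."""
    filename = key.rsplit("/", 1)[-1]
    return LABELS[filename.split(".")[0]]


def build_dataset(bucket, keys, model_name="google/vit-base-patch16-224-in21k"):
    """Load images from S3 into memory and extract ViT input features."""
    # Heavy dependencies are imported lazily so the helper above works without them.
    import boto3
    from datasets import Dataset
    from PIL import Image
    from transformers import ViTImageProcessor

    s3 = boto3.client("s3")
    processor = ViTImageProcessor.from_pretrained(model_name)

    pixel_values, labels = [], []
    for key in keys:
        # Read the object straight into memory, without writing it to local disk.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        image = Image.open(io.BytesIO(body)).convert("RGB")
        features = processor(images=image, return_tensors="np")
        # Nested lists are slower than arrays but accepted by every datasets version.
        pixel_values.append(features["pixel_values"][0].tolist())
        labels.append(label_from_key(key))

    return Dataset.from_dict({"pixel_values": pixel_values, "label": labels})
```

Calling `save_to_disk()` on the returned dataset with a path under `/opt/ml/processing/`, declared as a processing output, would let SageMaker Processing copy the result back to S3; the exact path depends on how the job is configured.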

In the second video, I start from the image classification dataset that I prepared in the first video. Then, I download a pre-trained Vision Transformer from the Hugging Face hub, and I fine-tune it on my dataset, using a training script based on the Trainer API in the Transformers library.
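
A minimal sketch of such a fine-tuning script, assuming the datasets were saved with `pixel_values` and `label` columns, might look like the following. The checkpoint name, hyperparameters, and the `fine_tune` and `compute_metrics` helpers are illustrative assumptions, not the values used in the video.

```python
def compute_metrics(eval_pred):
    """Accuracy from (logits, labels); plain Python, so it accepts lists or NumPy arrays."""
    logits, labels = eval_pred
    predictions = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    correct = sum(int(p == l) for p, l in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


def fine_tune(train_dir, eval_dir, output_dir,
              model_name="google/vit-base-patch16-224-in21k"):
    """Fine-tune a pre-trained ViT for binary image classification."""
    # Imported lazily so that compute_metrics can be used without these packages.
    from datasets import load_from_disk
    from transformers import (Trainer, TrainingArguments,
                              ViTForImageClassification, default_data_collator)

    train_ds = load_from_disk(train_dir)
    eval_ds = load_from_disk(eval_dir)

    # num_labels=2 attaches a freshly initialized two-class head (cat vs. dog).
    model = ViTForImageClassification.from_pretrained(model_name, num_labels=2)

    # Hypothetical hyperparameters, chosen only to keep the sketch short.
    args = TrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,
        per_device_train_batch_size=32,
    )
    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        data_collator=default_data_collator,  # renames "label" to "labels"
        compute_metrics=compute_metrics,
    )
    trainer.train()
    trainer.save_model(output_dir)
    return trainer.evaluate()
```

Inside a SageMaker training job, the directory arguments would typically come from environment variables such as `SM_CHANNEL_TRAIN` and `SM_MODEL_DIR`; the channel names depend on how the estimator is launched.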

In the third video, I start from the image classification dataset that I prepared in the first video. Then, I download a pre-trained base Vision Transformer from the Hugging Face hub, and I use PyTorch Lightning to append a classification layer to it. Finally, I train the model using the Trainer API in PyTorch Lightning.
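
To make the idea of appending a classification layer concrete, here is one possible sketch (an assumption about the general shape, not the code from the video). It wraps the base `ViTModel` in a `LightningModule` and adds a linear layer on top of the `[CLS]` token representation. The `HPARAMS` values and the `build_vit_classifier` factory are hypothetical.

```python
# Hypothetical defaults for illustration; the video may use different values.
HPARAMS = {
    "model_name": "google/vit-base-patch16-224-in21k",
    "num_labels": 2,
    "learning_rate": 1e-4,
}


def build_vit_classifier(hparams=HPARAMS):
    """Return a LightningModule: base ViT plus a linear classification layer."""
    # Lazy imports keep this sketch importable without the heavy dependencies.
    import pytorch_lightning as pl
    import torch
    from transformers import ViTModel

    class ViTClassifier(pl.LightningModule):
        def __init__(self):
            super().__init__()
            # Base model without any classification head.
            self.vit = ViTModel.from_pretrained(hparams["model_name"])
            # Appended layer: hidden size of the encoder -> number of classes.
            self.head = torch.nn.Linear(
                self.vit.config.hidden_size, hparams["num_labels"]
            )

        def forward(self, pixel_values):
            # Classify from the hidden state of the [CLS] token (position 0).
            hidden = self.vit(pixel_values=pixel_values).last_hidden_state
            return self.head(hidden[:, 0])

        def training_step(self, batch, batch_idx):
            logits = self(batch["pixel_values"])
            loss = torch.nn.functional.cross_entropy(logits, batch["label"])
            self.log("train_loss", loss)
            return loss

        def configure_optimizers(self):
            return torch.optim.AdamW(
                self.parameters(), lr=hparams["learning_rate"]
            )

    return ViTClassifier()
```

Training would then amount to loading the dataset with `load_from_disk(...).with_format("torch")`, wrapping it in a `torch.utils.data.DataLoader`, and passing the module and the loader to `pl.Trainer(max_epochs=...).fit(...)`.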

Resources: