Unsplash’s dataset is now open source

#machinelearning #python #datascience #tensorflow

Are you looking for image data for machine learning and deep learning research? Are you tired of open source image datasets that are limited in size, contain low-quality images, lack variety, or rely on mass labeling by third-party services? If you answered 'yes' to these questions, I have a surprise for you in the next paragraph.

Unsplash just launched the biggest open source image dataset. This dataset has been used by researchers at industry-leading institutions such as Stanford University, Cornell University, National Cheng Kung University, and Apple.

Train and test models using the largest collaborative image dataset ever openly shared. The Unsplash Dataset is built from the work of over 200,000 contributing photographers and from billions of searches across thousands of applications, uses, and contexts.

In total, the dataset contains over 2M high-quality images, with 16GB of accompanying data covering:

Keyword-image conversions in search results
Community and AI-generated keywords
EXIF, location, and landmarks
Image categories and subcategories
User-generated collections and groupings of images
Image view and download stats

Read the documentation at https://github.com/unsplash/datasets

To download the dataset, visit:

Lite version: https://github.com/unsplash/datasets (550MB)

Full version: https://unsplash.typeform.com/to/HPVbjo (16GB)
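Once you have downloaded and extracted an archive, a first look at the data could go something like the sketch below. It assumes the tables ship as tab-separated files with a header row, possibly split into numbered parts such as `photos.tsv000`; the directory name `./unsplash-lite` and the table names are placeholders for illustration, so check the documentation for the actual file layout before relying on them.

```python
import csv
import glob
import os


def load_tsv_table(directory, table):
    """Read every '<table>.tsv*' file in `directory` into one list of dict rows."""
    rows = []
    # Assumption: a table may be split across several parts (photos.tsv000, ...).
    pattern = os.path.join(directory, table + ".tsv*")
    for filename in sorted(glob.glob(pattern)):
        with open(filename, newline="", encoding="utf-8") as handle:
            # Each part is assumed to start with its own header row.
            reader = csv.DictReader(handle, delimiter="\t")
            rows.extend(reader)
    return rows


if __name__ == "__main__":
    # Hypothetical location of the extracted archive.
    data_dir = "./unsplash-lite"
    # Assumed table names; adjust to the files you actually find.
    for table in ["photos", "keywords", "collections", "conversions"]:
        rows = load_tsv_table(data_dir, table)
        print(table, len(rows))
```

This uses only the standard library and keeps every value as a string. For the full 16GB version you would probably want to stream rows or use a dataframe library instead of holding everything in memory.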

Have fun with it and don't make a terminator.
