DEV Community

saadhassan

Top 10 Best Machine Learning Datasets

For any AI, machine learning, or data science project, it is important to gather relevant data. Below are 10 of the best machine learning datasets, each with a download link, so you can pick one and start building your own project.

1. ImageNet

ImageNet is one of the best-known datasets for machine learning and is mainly used in computer vision. It is a large image dataset developed by Fei-Fei Li and other computer vision researchers. See the TED talk here: https://www.youtube.com/watch?v=40riCqvRoMs
Link to the dataset:
http://www.image-net.org/download-faq

2. Pima Indians Diabetes Dataset

If you want to apply machine learning to health care, you can use the Pima Indians Diabetes dataset, for example in a diabetes detection system. Diabetes is one of the most common serious diseases. This dataset comes from the National Institute of Diabetes and Digestive and Kidney Diseases. Its objective is to predict whether or not a patient has diabetes based on specific diagnostic measurements.

https://www.kaggle.com/uciml/pima-indians-diabetes-database
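
To make the prediction task concrete, here is a minimal sketch of how the CSV could be split into diagnostic measurements and the diabetes label using only the Python standard library. It assumes the label column is called `Outcome`, as in the Kaggle copy; the three feature columns and the two sample rows are made up for illustration and are not real patient records.

```python
import csv
import io

# Assumed name of the label column in the Kaggle CSV; adjust if your copy differs.
LABEL_COLUMN = "Outcome"


def split_features_and_labels(csv_file):
    """Read a CSV with a header row and return (features, labels).

    features is a list of dicts mapping column name -> float,
    labels is a list of ints (1 = diabetes, 0 = no diabetes).
    """
    features, labels = [], []
    for row in csv.DictReader(csv_file):
        labels.append(int(row.pop(LABEL_COLUMN)))
        features.append({name: float(value) for name, value in row.items()})
    return features, labels


# Two made-up rows for illustration only; these are not real patient records.
SAMPLE_CSV = """Glucose,BMI,Age,Outcome
120,30.5,45,1
85,22.1,29,0
"""

sample_features, sample_labels = split_features_and_labels(io.StringIO(SAMPLE_CSV))
print(sample_features[0]["Glucose"], sample_labels)
```

With the downloaded file you would pass an opened file instead of the sample string, for example `open("diabetes.csv", newline="")`, where the file name is whatever you saved the download as.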

3. Boston House Price Dataset

Do you want to practice regression algorithms? Then this dataset is a good choice for your machine learning problem. It contains housing data collected from the Boston, Massachusetts area.
https://www.kaggle.com/vikrishnan/boston-house-prices
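
If regression is new to you, the sketch below shows the simplest version of it: fitting a straight line with ordinary least squares in plain Python. The numbers are made up (rooms per house against price in thousands of dollars) and only stand in for two columns you might pick from the real data; a real project would more likely use a library and several features.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict y for a new x using the fitted line."""
    return slope * x + intercept


# Made-up numbers for illustration: rooms per house vs. price in $1000s.
rooms = [4.0, 5.0, 6.0, 7.0]
prices = [15.0, 20.0, 25.0, 30.0]

slope, intercept = fit_line(rooms, prices)
print(slope, intercept, predict(8.0, slope, intercept))
```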

4. HotpotQA

Do you want to work with natural language processing (NLP)? NLP covers a wide range of problems in machine learning. If you want to build a question answering system, this dataset is for you, my friend. It features questions that need information from more than one document to answer, and it was collected by a team of NLP researchers at Carnegie Mellon University, Stanford University, and Université de Montréal.
https://hotpotqa.github.io/
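
Here is a rough sketch of how one record could be read with the standard `json` module. It assumes each file is a JSON list of records with `question`, `answer`, `context` (a list of `[title, sentences]` pairs) and `supporting_facts` (a list of `[title, sentence_index]` pairs); please check the dataset's own documentation before relying on that layout. The sample record is made up and is not taken from the dataset.

```python
import json


def supporting_sentences(example):
    """Return the sentences that the record marks as supporting facts."""
    paragraphs = {title: sentences for title, sentences in example["context"]}
    found = []
    for title, sentence_index in example["supporting_facts"]:
        sentences = paragraphs.get(title, [])
        if 0 <= sentence_index < len(sentences):
            found.append(sentences[sentence_index])
    return found


# A made-up record in the assumed layout; it is not taken from the dataset.
SAMPLE_JSON = """
[
  {
    "_id": "example-1",
    "question": "In which country is the university that employs Dr. A located?",
    "answer": "Freedonia",
    "supporting_facts": [["Dr. A", 0], ["Example University", 1]],
    "context": [
      ["Dr. A", ["Dr. A works at Example University.", "Dr. A studies NLP."]],
      ["Example University", ["It was founded long ago.", "It is located in Freedonia."]]
    ]
  }
]
"""

examples = json.loads(SAMPLE_JSON)
for example in examples:
    print(example["question"])
    print(supporting_sentences(example))
```

Notice that the answer needs one sentence from each paragraph, which is what makes this kind of question harder than a single lookup.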

5. Labelme

Image processing is one of the most interesting areas of machine learning. If you want to develop an image processing system, you can use the Labelme dataset in your project. It is a large dataset of annotated images.
http://labelme2.csail.mit.edu/Release3.0/browserTools/php/dataset.php

6. Facial Image Dataset

You can use this interesting dataset for your computer vision project. It is a standard dataset and free to use. It covers variation in background, scale, and facial expression, which helps you evaluate a system precisely.
https://cswww.essex.ac.uk/mv/allfaces/faces94.html

7. Chars74K Dataset

Optical character recognition is one of the classic classification problems in pattern recognition. This dataset covers 62 classes (0-9, A-Z, a-z) and contains 7705 characters taken from natural images, 3410 hand-drawn characters, and 62992 characters synthesized from computer fonts.
http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/#download
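
As a quick sanity check on those figures, the short sketch below counts the character classes (digits plus upper-case and lower-case letters) and adds up the three image counts quoted above. The total of roughly 74 thousand images is presumably where the "74K" in the name comes from.

```python
import string

# One class per digit, upper-case letter, and lower-case letter.
classes = string.digits + string.ascii_uppercase + string.ascii_lowercase
num_classes = len(classes)

# Image counts as quoted in the description above.
image_counts = {
    "natural images": 7705,
    "hand-drawn": 3410,
    "synthesized from fonts": 62992,
}
total_images = sum(image_counts.values())

print(num_classes, total_images)
```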

8. YouTube Dataset

Do you work in machine learning research, or do you want to do something with video classification? Then this dataset might help you. Google has shared a labeled dataset of 8M classified YouTube videos and their IDs.
https://research.google.com/youtube8m/

9. Amazon Reviews Dataset

Natural language processing is about text data, and to solve a real-world problem you need a real-world dataset. This Amazon reviews dataset is one of them. It contains about 35 million reviews from Amazon spanning 18 years (up to March 2013).
https://snap.stanford.edu/data/web-Amazon.html
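
The sketch below shows one way to read reviews, assuming the files store each review as a block of `key: value` lines (such as `review/score` and `review/text`) with a blank line between reviews. That layout is an assumption to verify against the download page, and the two sample reviews are made up.

```python
import io


def parse_reviews(lines):
    """Yield one dict per review from 'key: value' lines separated by blank lines."""
    review = {}
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line closes the current review, if there is one.
            if review:
                yield review
                review = {}
            continue
        key, separator, value = line.partition(": ")
        if separator:
            review[key] = value
    # Emit the last review when the input does not end with a blank line.
    if review:
        yield review


# Two made-up reviews in the assumed layout; not taken from the dataset.
SAMPLE = """product/productId: EXAMPLE0001
review/score: 5.0
review/summary: Great
review/text: Works as described.

product/productId: EXAMPLE0002
review/score: 2.0
review/summary: Not for me
review/text: Stopped working after a week.
"""

reviews = list(parse_reviews(io.StringIO(SAMPLE)))
average_score = sum(float(r["review/score"]) for r in reviews) / len(reviews)
print(len(reviews), average_score)
```

Because the function reads line by line, it should also work on a compressed download opened with `gzip.open(path, "rt")`, without loading millions of reviews into memory at once.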

10. xView

If you are experienced in machine learning and can handle a tricky project, I suggest trying this dataset. It is one of the standard datasets for object detection in overhead imagery, and one of the largest publicly available datasets of its kind.
http://xviewdataset.org/#dataset

CLOSING WORDS:
A dataset is an integral part of any machine learning application. It can come in different formats such as .txt, .csv, and many more. Supervised machine learning uses a labeled training dataset, while unsupervised learning needs no labels. If you are a beginner, we recommend reading this article thoroughly.

Thank you for your time.
Happy reading!
