DEV Community

David

Top 5 websites to obtain datasets for your Data Science projects

Data, Data, Everywhere!!!

Every data science project begins with understanding the requirements and objectives of the business problem. In this phase it is essential to know the problem you are solving, ask the right questions, and define where you want to go.

With that initial phase completed, we can start with the fun part: data acquisition!!

But… at a conceptual level, what is data?

Data is information extracted from reality and recorded on some physical or symbolic medium. It also implies conceptual elaboration and has to be expressed in some form of language.

I get that if you’re a data scientist, you’ll know what data is, but I wanted to put that definition somewhere. ^___^

In the acquisition phase we have to be especially careful, since the quality of our work depends largely on the quality of our data. A famous phrase comes to mind here: “Garbage in, garbage out”, which perfectly describes what happens when we feed garbage data into our Machine Learning models. In this post, I’m going to show you five excellent places where you can obtain data for your projects.

Kaggle

Let's start with the best!!


Kaggle is a web platform that brings together the largest Data Science community in the world, with more than 536 thousand active members in 194 countries. The platform provides the most important tools and resources to help you make progress in data science, and it also lets you explore, analyze, and share high-quality data.

Kaggle supports a variety of publishing formats for datasets, such as CSV, JSON, SQLite, archives like zip and rar, BigQuery, and other data file formats.

Datasets in these formats are not only public and accessible, but also easier for more people to use, regardless of their tools. Thanks to its active community, challenges, and available code, Kaggle is the site I recommend most for everyone, regardless of level of expertise.
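To give an idea of what working with three of those formats looks like, here is a minimal sketch that reads CSV, JSON, and SQLite files using only the Python standard library. The function names are my own, not part of any Kaggle tooling, and the sketch assumes you have already downloaded and unpacked the dataset to local files.

```python
import csv
import json
import sqlite3


def load_csv(path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_json(path):
    """Parse a JSON file into plain Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def load_sqlite_table(path, table):
    """Read every row of one table from a SQLite file as a list of dicts."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    try:
        # Identifiers cannot be bound as parameters, so escape and quote the name.
        quoted = '"' + table.replace('"', '""') + '"'
        rows = conn.execute(f"SELECT * FROM {quoted}").fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()
```

Keep in mind that `csv.DictReader` returns every value as a string, so numeric columns need converting before analysis, while SQLite rows come back with the types stored in the table.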

World Bank Data

Free and open access to global development data!


This site is designed to make World Bank data easy to find, download, and use.

All of the data found here can be used free of charge with minimal restrictions. It is a great place to find public data about different places around the world: time series, debt statistics, and world development indicators.

Recommended for people who do social, financial, and demographic studies.
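Besides downloading files by hand, the World Bank also serves its indicators over a public web API. The sketch below is one way to turn a response into a simple year-to-value series. It rests on a few assumptions of mine that you should check against the official API documentation: the base URL, the `format=json` and `per_page` query parameters, a response made of a two-item list (paging metadata first, records second), and annual data where each record has `date` and `value` keys. The function names are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

API_ROOT = "https://api.worldbank.org/v2"  # assumed base URL of the public API


def build_indicator_url(country, indicator, per_page=100):
    """Build the request URL for one country and one indicator code."""
    return (
        f"{API_ROOT}/country/{quote(country)}/indicator/{quote(indicator)}"
        f"?format=json&per_page={per_page}"
    )


def parse_indicator_response(payload):
    """Turn a decoded response into {year: value}, sorted by year.

    Assumes `payload` is [metadata, records] and that each record carries
    a 'date' (a year as text) and a 'value' (None when data is missing).
    """
    if not isinstance(payload, list) or len(payload) < 2 or not payload[1]:
        return {}
    series = {}
    for record in payload[1]:
        if record.get("value") is not None:
            series[int(record["date"])] = record["value"]
    return dict(sorted(series.items()))


def fetch_indicator(country, indicator):
    """Download one indicator series (needs network access)."""
    with urlopen(build_indicator_url(country, indicator), timeout=30) as resp:
        return parse_indicator_response(json.load(resp))
```

For example, `fetch_indicator("AR", "SP.POP.TOTL")` would request Argentina's total population series, assuming the service is reachable and the response has the shape described above.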

Public Database of your Country

Home Sweet Home.

Governments usually generate, store, and publish public data about their countries and cities. One example is the public database of Buenos Aires, also called the Federal Capital, which is the capital and most populous city of the Argentine Republic.

It is always interesting to check data about your place of origin, your homeland, your identity, just google it ;)

Google Dataset Search

Google Dataset Search is a search engine for datasets. Users can discover datasets hosted in thousands of repositories across the Web through a simple keyword search.

This is a great tool that allows you to quickly and easily search and download datasets of your specific interest, providing you with relevant information.

*Don’t waste much time, just Google it ;)*

Google Trends

Google Trends, also called Google Search Trends, is a tool from Google Labs that shows the most popular search terms of the recent past.

Google Trends graphs represent how often a search for a particular term is performed in various regions of the world and in various languages. The X-axis of the graph represents time, and the Y-axis represents how often the term has been searched globally, relative to the total number of searches. It also allows the user to compare the search volume between two or more terms.
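When you read one of these graphs, it helps to remember that the values are an index of relative interest rather than raw search counts. The sketch below is a simplified illustration of that kind of scaling, not Google's actual implementation: each count is divided by the total searches in its period, and the highest share across all compared terms becomes 100. The function name and the input layout are assumptions of mine.

```python
def relative_interest(term_counts, total_counts):
    """Scale raw search counts to a shared 0-100 index.

    term_counts:  dict mapping each term to a list of counts, one per period.
    total_counts: list with the total number of searches in each period.
    """
    # Share of all searches that each term received in each period.
    shares = {
        term: [c / t if t else 0.0 for c, t in zip(counts, total_counts)]
        for term, counts in term_counts.items()
    }
    # The single highest share across every term and period maps to 100.
    peak = max((s for series in shares.values() for s in series), default=0.0)
    if peak == 0:
        return {term: [0] * len(series) for term, series in shares.items()}
    return {
        term: [round(100 * s / peak) for s in series]
        for term, series in shares.items()
    }
```

Because every term is scaled against the same peak, the resulting series can be compared directly with each other, which is what makes comparing two or more terms on one graph meaningful.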

An additional feature of Google Trends is the ability to display news related to the search term above the graph, showing how events affect popularity.

Final Thoughts and Closing Comments

Today the amount of data generated every day, hour, minute, and second is enormous. Because of that, any role that seeks to take advantage of data will be very well positioned and coveted by companies aware of the urgent need to squeeze value out of all that data. As they say somewhere on the internet, data is the new oil.

If you are interested in data science, machine learning, artificial intelligence, and education, let’s get in touch, and follow me for more!! ( ^-^)(^0^ )

Thank you very much for reading this far. I hope these resources help you, and any comments and feedback are welcome. ╰(°▽°)╯
