DEV Community

What's the best source of raw data?

Fulton Browne on March 12, 2020

I am looking for some data for ML models I am working on. What's the best source?

Ravish kumar

Google just released a dataset search engine that aggregates data from sources like Kaggle, data.world, etc. That seems to be the best option at this point in time. Here is the link: datasetsearch.research.google.com/

Ben Halpern

As a New Yorker, I always find it interesting that we have this data source.

opendata.cityofnewyork.us/

Fulton Browne

Wow, thanks. I wonder how many other cities have that?

Charles Landau • Edited

Lots! data.gov

Melissa Rose

Detroit has an open data portal: data.detroitmi.gov/

Liam A.

Also, a lot of individual states have good data banks (e.g. CA, Vermont) with data on a variety of topics, while other states have almost nothing available online. If you care, some states like Vermont also have a separate geospatial data site, while California has CalEnviroScreen, which is specifically for environmental data.

In my experience, private organizations sometimes have spotty/incomplete data, but that definitely varies. Certain countries outside the U.S. have substantial data hosted by the government at the national, regional, etc. level.

Nick Steele

So, you can get access to basically every dataset known to man using Wolfram Alpha and Mathematica, and you don't even have to know how to spell correctly or name your data sources; Wolfram spent hundreds of millions of dollars collecting all this data and connecting all the sources for you already.

Just give Wolfram Alpha a try and see what data it doesn't have... it literally can collect anything on Earth for you with a few simple queries.

Every single data source mentioned here is available through Mathematica. If you don't code with Mathematica, at least to visualize problems or pull data, and you're doing ML, you are living in the stone age like the rest of us, except you don't have fire.

Ben Lovy

I've used Kaggle for this.

Alexandro Disla

Kaggle

Bugsy Sailor

I don't know what an ML model is, but I've recently used public.opendatasoft.com for a data source and they have all kinds of interesting public data.

Aleksandr Hovhannisyan