Google just released a dataset search engine that aggregates data from sources like Kaggle, data.world, etc. That seems to be the best option at this point in time. Here is the link: datasetsearch.research.google.com/
As a New Yorker, I always find it interesting that we have this data source.
opendata.cityofnewyork.us/
Wow, thanks. I wonder how many other cities have that?
Lots! data.gov
Detroit has an open data portal: data.detroitmi.gov/
Also, a lot of individual states have good data banks (e.g. CA, Vermont) with data on a variety of topics, while other states have almost nothing available online. If you care, some states like Vermont also have a separate geospatial data site, while California has CalEnviroScreen, which is specifically for environmental data.
In my experience, private organizations sometimes have spotty/incomplete data, but that definitely varies. Certain countries outside the U.S. have substantial data hosted by the government at the national, regional, etc. level.
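If you want a quick read on how spotty a dataset is before you build on it, here is a minimal sketch, assuming the data comes as a CSV export. The function name `missing_share` and the sample columns are made up for illustration; they are not from any of the portals mentioned here.

```python
import csv
import io


def missing_share(csv_text):
    """Return {column: fraction of data rows where the cell is empty or whitespace}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    empty = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            value = row.get(name)
            # A short row yields None for its missing trailing cells.
            if value is None or not value.strip():
                empty[name] += 1
    if rows == 0:
        return {name: 0.0 for name in columns}
    return {name: empty[name] / rows for name in columns}


# Hypothetical sample data, only to show the shape of the result.
sample = "city,population,median_income\nA,1000,\nB,,52000\nC,3000,48000\n"
shares = missing_share(sample)
for name, share in shares.items():
    print(f"{name}: {share:.0%} missing")
```

This only counts blank cells. Placeholders like "N/A" or -999 would need to be added to the check for whatever source you are using.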
So, you can get access to basically every dataset known to man using Wolfram Alpha and Mathematica, and you don't even have to know how to spell correctly or name your data sources. Wolfram spent hundreds of millions of dollars collecting all this data and connecting all the sources for you already.
Just give Wolfram Alpha a try and see what data it doesn't have... it literally can collect anything on Earth for you with a few simple queries.
Every single data source mentioned here is available through Mathematica. If you don't code with Mathematica, at least to visualize problems or pull data, and you're doing ML, you are living in the stone age like the rest of us, except you don't have fire.
I've used Kaggle for this.
Kaggle
I don't know what an ML model is, but I've recently used public.opendatasoft.com for a data source and they have all kinds of interesting public data.
github.com/awesomedata/awesome-pub...