I'm learning ML and looking for more open datasets that I can use, especially in the area of recommender/ranker systems.<p>I'm already familiar with Kaggle, but I'm wondering what else is out there.
In a few of the ML applications I've worked on, there was no need for an outside dataset because the program generates its own data. For example, the data could come from a simulation of some process.
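Here is a hypothetical illustration of the simulation approach applied to the recommender question. It is a minimal Python sketch that assumes a made-up "topic match" click model: each user and item gets a random topic, and matching topics raise the click probability. The function names `simulate_interactions` and `popularity_ranking`, and the click probabilities 0.6 and 0.05, are illustrative assumptions. They are not taken from any real dataset or library.

```python
import random
from collections import defaultdict


def simulate_interactions(n_users, n_items, n_events, n_topics=3, seed=0):
    """Generate synthetic (user, item, clicked) events.

    Hypothetical click model: each user and item is assigned a random topic,
    and a user clicks an item far more often when the topics match.
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    user_topic = [rng.randrange(n_topics) for _ in range(n_users)]
    item_topic = [rng.randrange(n_topics) for _ in range(n_items)]
    events = []
    for _ in range(n_events):
        u = rng.randrange(n_users)
        i = rng.randrange(n_items)
        # Assumed probabilities: 0.6 on a topic match, 0.05 otherwise.
        p_click = 0.6 if user_topic[u] == item_topic[i] else 0.05
        events.append((u, i, int(rng.random() < p_click)))
    return events, user_topic, item_topic


def popularity_ranking(events):
    """Rank items by total clicks, a simple non-personalized baseline."""
    clicks = defaultdict(int)
    for _, item, clicked in events:
        clicks[item] += clicked  # items shown but never clicked get a count of 0
    return sorted(clicks, key=lambda item: clicks[item], reverse=True)
```

For example, `events, _, _ = simulate_interactions(100, 20, 5000)` followed by `popularity_ranking(events)[:5]` gives a top-5 baseline to compare a learned ranker against. Because the "true" preferences (the topics) are known, you can check how well a model recovers them, which most real datasets don't let you do.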
<a href="https://datasets.reddit.com" rel="nofollow">https://datasets.reddit.com</a><p><a href="https://opendata.reddit.com" rel="nofollow">https://opendata.reddit.com</a><p><a href="https://archive.ics.uci.edu/ml/datasets.php" rel="nofollow">https://archive.ics.uci.edu/ml/datasets.php</a><p><a href="https://lod-cloud.net/" rel="nofollow">https://lod-cloud.net/</a><p><a href="https://www.data.gov" rel="nofollow">https://www.data.gov</a><p><a href="https://data.un.org/" rel="nofollow">https://data.un.org/</a><p><a href="https://data.worldbank.org/" rel="nofollow">https://data.worldbank.org/</a><p><a href="https://fred.stlouisfed.org/" rel="nofollow">https://fred.stlouisfed.org/</a><p><a href="https://data.oecd.org/" rel="nofollow">https://data.oecd.org/</a><p><a href="https://www.nber.org/research/data?page=1&perPage=50" rel="nofollow">https://www.nber.org/research/data?page=1&perPage=50</a><p><a href="https://github.com/awesomedata/awesome-public-datasets" rel="nofollow">https://github.com/awesomedata/awesome-public-datasets</a><p><a href="https://github.com/datasets" rel="nofollow">https://github.com/datasets</a><p><a href="https://opendata.cern.ch/" rel="nofollow">https://opendata.cern.ch/</a><p><a href="https://data.nasa.gov/" rel="nofollow">https://data.nasa.gov/</a><p><a href="https://data.world/datasets/machine-learning" rel="nofollow">https://data.world/datasets/machine-learning</a><p><a href="https://data.noaa.gov/datasetsearch/" rel="nofollow">https://data.noaa.gov/datasetsearch/</a><p><a href="https://www.usgs.gov/products/data" rel="nofollow">https://www.usgs.gov/products/data</a><p><a href="https://www.fema.gov/about/openfema/data-sets" rel="nofollow">https://www.fema.gov/about/openfema/data-sets</a><p>etc...<p>And of course don't ignore the data you can collect yourself one way or another. A few cheap Arduino Nano or Rpi Pico boards, some sensors, and you can build quite a variety of distributed data collection systems. 
Use solar panels for power in remote areas and 4G/cellular networks for connectivity, and you can collect data from almost anywhere. You can also use a cheap SDR "dongle" to pull down data from various weather satellites and other sources. And don't forget the APIs and data export features of apps you might already use, like Fitbit, Strava, MapMyRun, etc.
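The receiving side of a DIY collection setup can start very simply. The sketch below is a minimal Python example of a collector that polls several nodes and appends timestamped readings to a CSV file. `read_sensor` is a hypothetical stand-in that returns simulated temperatures; on real hardware you would replace it with your actual sensor driver or with whatever the nodes send over the network. The node names and column names are also illustrative assumptions.

```python
import csv
import random
from datetime import datetime, timezone


def read_sensor(node_id, rng):
    """Hypothetical stand-in for a real sensor driver.

    Returns a simulated temperature in degrees C; node_id is unused here but
    would select the device in a real setup.
    """
    return round(rng.gauss(20.0, 2.0), 2)


def log_readings(node_ids, n_rounds, out, seed=0):
    """Poll every node n_rounds times, writing timestamped rows to a CSV stream."""
    rng = random.Random(seed)
    writer = csv.writer(out)
    writer.writerow(["timestamp_utc", "node_id", "temperature_c"])
    for _ in range(n_rounds):
        for node in node_ids:
            # Timestamps in UTC so readings from different sites line up.
            ts = datetime.now(timezone.utc).isoformat()
            writer.writerow([ts, node, read_sensor(node, rng)])
```

Usage: `with open("readings.csv", "w", newline="") as f: log_readings(["pico-1", "pico-2"], 3, f)`. A flat CSV with one row per reading is easy to load later with pandas or any other tool, which is usually the point of collecting the data in the first place.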