I am thinking of creating a free REST API service that will let users list tournaments, schedules, match results, etc. for basketball, cricket, and badminton. How do I get the data into my database? Do I need to purchase it from someone, or can I simply scrape different websites to get the data I want?
If you want to give this data away for free, that's noble of you, but it might trigger lawsuits.<p>In general, larger players have connections to private APIs that come right from the field. Sports betting sites, ESPN, etc. all pay for expensive connections to get live data.<p>The historical data will be time consuming to collect but is most likely legally safe.<p>It's the live match results data that you should not scrape, in my opinion. Technically it will take some time to set up, but it's possible. It's the legal side of all this that makes me advise against it.<p>If I were you, I'd make two services: free historical data and a paid live data API. I don't know much about sports data APIs in particular, but my question is: why isn't this already out there? Probably because it's very expensive to get enough feeds to have a live API anyone wants to use.
Don’t purchase anything unless it’s a live play-by-play feed. That can run you roughly 500-1k a year per sport (so it will add up).<p>A personal solution I’ve been considering is a scraper that knows how to scrape across a few different sites (so you have fallbacks). This can be used to scrape real-time data and spread the fetching across sites, to avoid rate limits or reliance on one site (let’s say CBS Sports).<p>For historical data, you’ll have to get familiar with wikis or specific sports sites that have archives (they are out there). It’s a perfect job to outsource to anyone, really, if you don’t feel like doing the tedious work.<p>I’d say scraping is the way to go, so long as your scraper knows how to get the same data from multiple sources. If you’ve got the cash, then just google a sports data API.
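To make the multi-source scraper idea above a bit more concrete, here is a minimal sketch of fallback plus rotation. Everything named here is hypothetical: `FallbackScraper`, the `site_a`/`site_b` labels, and the two stub fetchers stand in for real per-site scraping code, which would do the actual HTTP requests and parsing. It only shows the control flow, assuming each source is wrapped in a function that takes a match id and returns a dict (or `None` when it has no data).

```python
from typing import Callable, Dict, List, Optional, Tuple

# A fetcher takes a match id and returns parsed data, or None if it has none.
Fetcher = Callable[[str], Optional[Dict]]


class FallbackScraper:
    """Try several sources in turn; rotate the starting source on each call."""

    def __init__(self, sources: List[Tuple[str, Fetcher]]) -> None:
        if not sources:
            raise ValueError("at least one source is required")
        self._sources = list(sources)
        self._next = 0  # index of the source the next call starts with

    def fetch(self, match_id: str) -> Dict:
        n = len(self._sources)
        start = self._next
        # Round-robin the starting point so no single site takes every request.
        self._next = (self._next + 1) % n
        errors = []
        for offset in range(n):
            name, fetcher = self._sources[(start + offset) % n]
            try:
                result = fetcher(match_id)
            except Exception as exc:  # any failure means: try the next source
                errors.append(f"{name}: {exc}")
                continue
            if result is not None:
                return {"source": name, **result}
            errors.append(f"{name}: no data")
        raise LookupError(f"all sources failed for {match_id}: {errors}")


# Hypothetical stand-ins for real site-specific scrapers.
def fetch_site_a_stub(match_id: str) -> Optional[Dict]:
    raise ConnectionError("rate limited")


def fetch_site_b_stub(match_id: str) -> Optional[Dict]:
    return {"match_id": match_id, "score": "0-0"}


if __name__ == "__main__":
    scraper = FallbackScraper(
        [("site_a", fetch_site_a_stub), ("site_b", fetch_site_b_stub)]
    )
    print(scraper.fetch("m1"))
```

A real version would probably also want per-site delays and some normalization step, since different sites will not report the same match in the same shape.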