Long story short: I'm building a project I call "yougle".<p>Just like Google, you search for a word or sentence, and you get back where that string appears in YouTube videos.<p>I've already coded the business logic of yougle, and I can already find where strings appear in videos from some popular YouTube channels (PewDiePie, etc.).<p>The thing is, I'm not sure whether the subtitles database I built by scraping YouTube is actually legit.
I read YouTube's TOS (https://www.youtube.com/t/terms), which says:<p>"You agree not to use or launch any automated system, including without limitation, "robots," "spiders," or "offline readers," that accesses the Service in a manner that sends more request messages to the YouTube servers in a given period of time than a human can reasonably produce in the same period by using a conventional on-line web browser."<p>Violation #1?
So basically, this clause says you can't scrape faster than a human can browse the web, and of course it would take me years to collect the data that way.<p>Violation #2? YouTube's robots.txt (for those of you who don't know, it's a file that tells bots which endpoints they should not crawl) lists an endpoint that I'm using to get subtitles out of videos:<p>"Disallow: /timedtext_video"<p>Should I forget about my project?
What can I do to keep going with the project while keeping it legit?
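For anyone who wants to check a URL against that robots.txt rule themselves, here is a minimal sketch using Python's standard `urllib.robotparser`. It parses only the single `Disallow` line quoted above under an assumed `User-agent: *` group, not the real file, and the `yougle-bot` user agent and `VIDEO_ID` placeholder are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumption: only the rule quoted in the post, placed under a wildcard group.
# The real robots.txt contains more entries than this.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /timedtext_video",
]


def is_allowed(url: str, user_agent: str = "yougle-bot") -> bool:
    """Return True if the quoted robots.txt rules permit crawling `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_LINES)  # parse in-memory lines, no network access
    return parser.can_fetch(user_agent, url)


# The subtitles endpoint matches the Disallow prefix; a watch page does not.
blocked = is_allowed("https://www.youtube.com/timedtext_video?v=VIDEO_ID")
allowed = is_allowed("https://www.youtube.com/watch?v=VIDEO_ID")
```

Note that this only answers the robots.txt question (#2). It says nothing about the request-rate clause in the TOS (#1).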
#1: obviously violates TOS<p>#2: rude, but probably okay<p>More important: "yougle" might be a trademark violation. I'd be surprised if it wasn't.<p>About forgetting about it: that's your decision. My personal take: I find the project utterly uninteresting, and there is no obvious business opportunity that might make the legal uncertainties worthwhile.
Turns out there is already an API for captions, so problem solved :)<p><a href="https://developers.google.com/youtube/v3/docs/captions" rel="nofollow">https://developers.google.com/youtube/v3/docs/captions</a>
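As a rough sketch of what a call to that API might look like, the snippet below only builds (and does not send) a `captions.list` request with the standard library. The `part` and `videoId` parameters are taken from the linked documentation; the helper name, the `VIDEO_ID` value and the `ACCESS_TOKEN` value are placeholders. It assumes an OAuth 2.0 bearer token is needed, so check the docs for the exact authorization and quota rules, especially for videos you don't own.

```python
from urllib.parse import urlencode
from urllib.request import Request

CAPTIONS_ENDPOINT = "https://www.googleapis.com/youtube/v3/captions"


def build_captions_list_request(video_id: str, access_token: str) -> Request:
    """Build a GET request listing the caption tracks of one video.

    Hypothetical helper: it constructs the request object but sends nothing.
    """
    query = urlencode({"part": "snippet", "videoId": video_id})
    return Request(
        f"{CAPTIONS_ENDPOINT}?{query}",
        headers={"Authorization": f"Bearer {access_token}"},  # assumed OAuth token
        method="GET",
    )


# Placeholder values; substitute a real video ID and token before sending.
request = build_captions_list_request("VIDEO_ID", "ACCESS_TOKEN")
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` and decoding the JSON response.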