Closed captioning has always seemed to me to be a notoriously bad data set due to misspellings and misphrasing. Has anyone tried to do (a better) speech to text of a cable news channel, for instance?<p>Doing frequency and sentiment analysis on this dataset would be pretty interesting.
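The frequency half of that is easy to prototype. A minimal sketch, assuming the captions have already been captured as plain text; the function name, the tiny stop-word list, and the sample sentence are all made up for illustration, and sentiment analysis would need a separate model that this doesn't attempt:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

def word_frequencies(caption_text, top_n=5):
    """Return the top_n most common non-stop-words in a caption transcript."""
    # Captions are often ALL CAPS, so normalize case before counting.
    words = re.findall(r"[a-z']+", caption_text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

# Made-up caption text, not real broadcast data.
sample = "THE SENATE VOTED TODAY. THE SENATE BILL NOW GOES TO THE HOUSE."
print(word_frequencies(sample, top_n=3))
```

Misspelled caption words would simply show up as separate low-count entries, so some fuzzy matching or spell correction would probably be needed before the counts mean much.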
One of the original incarnations of Google Video was somewhat similar to this (an index of closed-captioning data from a lot of different TV streams). What they chose to do with it was different though: they allowed you to search closed-captioned content and it would show you a few thumbnails and the time of day when those words were said on air.<p>This memory is kind of hazy; ISTR it's from 2005 or so.
I personally think there's a great deal that can be done with this data.<p>A few years ago, someone documented how to use an Arduino + Video Experimenter Shield to easily log closed captioning data (<a href="http://blog.makezine.com/2011/08/16/enough-already-the-arduino-solution-to-overexposed-celebs/" rel="nofollow">http://blog.makezine.com/2011/08/16/enough-already-the-ardui...</a>). Never got around to messing with it, but I can imagine 100 interesting things to do with that data.<p>Very cool company. I'm glad someone's doing this.
Seems like scraping all closed captioning would yield very valuable data indeed. Is there anyone else doing something like this that provides an API or data feed?
I think the potential here is immense.<p>Boxfish, Twitter, YouTube, Siri, and now with Ray Kurzweil @ Google... thinkers are converging on doing to every other form of content what Google did for structured documents.<p>The NLP trend is going to be amusing to watch at least (Siri, Summly), and whether its time will come in the next 5 years or not, I'm not certain. But I know Ray Kurzweil knows this technology is inevitable.<p>--<p>As for Boxfish, I think this is a good example of a neatly executed, well-funded startup with experienced founders and a solid space. No drama, no demo day, no immediate fires to put out, a cool $3m in the bank, a Deutsche Telekom AG subsidiary negotiating their deals for them, and "Yahoo just bought a kid's startup for 17m" - the topic is hotter than others.
This is the type of startup I for one daydream of having stock in or working at. It has high potential to be worth $mmms or $bn in the future - you know, that all depends and whatnot. But the makings are clearly there. Excellent work, guys! Congratulations.
Here's one endpoint that seems to work and not require an API key: <a href="http://api.boxfish.com/v2/v3/trending/topics/?fields=count" rel="nofollow">http://api.boxfish.com/v2/v3/trending/topics/?fields=count</a>
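For anyone who wants to poke at it from a script, here is a rough sketch using only the Python standard library. The URL and the `fields=count` parameter are copied from the comment above; the function names are mine, the response is assumed (not confirmed) to be JSON, and there is no guarantee the endpoint stays open or unauthenticated:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL copied from the comment above; it may stop responding at any time.
BASE_URL = "http://api.boxfish.com/v2/v3/trending/topics/"

def build_trending_url(fields="count"):
    """Build the trending-topics URL with the given `fields` query parameter."""
    return BASE_URL + "?" + urlencode({"fields": fields})

def fetch_trending(fields="count", timeout=10):
    """Fetch the endpoint and parse the body, assuming it returns JSON."""
    with urlopen(build_trending_url(fields), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Only the URL is built here; call fetch_trending() yourself to hit the network.
print(build_trending_url())
```

The shape of the response isn't documented in this thread, so the sketch just hands back whatever the parsed JSON turns out to be.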
Reminds me a little of Bluefin Labs (acquired by Twitter). Just hook this data up to a sentiment engine for Twitter and you can come up with some interesting correlations with how people react to television.<p><a href="https://bluefinlabs.com/" rel="nofollow">https://bluefinlabs.com/</a>
For those interested in a real-time API of caption streams, be sure to check out Opened Captions: <a href="http://openedcaptions.com:3000/" rel="nofollow">http://openedcaptions.com:3000/</a><p>Currently only for C-SPAN but that may change!
With the new federal regulations stipulating that anything that originates on TV must be captioned when streamed over the internet, Boxfish will be able to get a fairly comprehensive picture of what's going on.
Is this only for US television? Or is it global?<p>What is the reach? I know several people who would be interested in this for smaller countries.<p>I couldn't find this information on the homepage.
`HN DDOS' again? Still spinning after five minutes on:<p><a href="http://boxfish.com/#!search/Klinger" rel="nofollow">http://boxfish.com/#!search/Klinger</a>