It's incredible that people still try to do things like this.<p>This is such an overambitious problem...<p>1. Webpages are laden with errors; how do you deal with this?
2. Knowledge does not fit in a graph. It's asymptotically a graph, as in: I can define relationships like: This recipe contains carrots. Carrots contain sugar => this recipe contains sugar. Cool. But what about this "sugar-free carrot cake recipe"? Well, it still contains carrots, so it still contains sugar... Contradiction? => requires human curation...
3. It doesn't even solve a real problem... Look at IBM Watson: it probably knows a lot more crap than Diffbot, and yet it is a pretty useless piece of software...
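To make point 2 concrete, here is a minimal sketch of the carrot cake case. The facts, the `CONTAINS` and `CLAIMS` tables, and the "-free" label convention are all made up for illustration; nothing here is how Diffbot or any real knowledge graph works.

```python
# Hypothetical toy facts, for illustration only.
CONTAINS = {
    "carrot cake": {"carrots"},
    "carrots": {"sugar"},
}
# Labels asserted by the source page (assumed "<ingredient>-free" format).
CLAIMS = {"carrot cake": {"sugar-free"}}


def transitive_contains(item, edges):
    """Return everything reachable from item by following 'contains' edges."""
    seen = set()
    stack = [item]
    while stack:
        node = stack.pop()
        for child in edges.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def contradictions(item, edges, claims):
    """List '<x>-free' claims that clash with the inferred ingredients."""
    inferred = transitive_contains(item, edges)
    return sorted(
        claim
        for claim in claims.get(item, ())
        if claim.endswith("-free") and claim[: -len("-free")] in inferred
    )
```

The graph can flag the clash, but it cannot decide which side is wrong (the label, the "carrots contain sugar" edge, or the meaning of "sugar-free"), which is where the human curation comes in.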
Mike,<p>fantastic work here! As someone who's really excited about a machine-readable web and has been working on it, this is fantastic! Unfortunately, while the Semantic Web was meant to tackle this, the real-life proliferation of the Semweb has been, at least to me personally, extremely disappointing.<p>So this is a fantastic initiative for me personally to know about.<p>Is there a plan to expose this data via a dev API of some sort for enthusiasts like us?<p>Say a SPARQL or even (Open)Graph API perhaps?<p>My experience consulting with and working with companies interested in the domain has been that monetizing this data is extremely hard, both legally and quality-wise.<p>"Nike Tanjun near me" is a query fraught with danger. People typing this query want to find a retailer in their vicinity that sells this Nike product, but where do we source that inventory list from and how do we get our cut?<p>Before people start talking about DSPs and SSPs, this is a very different problem at hand.<p>To know that Nike Tanjun is a shoe sold by Nike, an ontology needs to exist that captures this knowledge so that the user's query can be decoded.<p>How will that ontology be sourced? Further, for it to be usable commercially, Nike has to agree to that encoding. Therein begin the challenges.
If we encoded Nike Downshifter as Tanjun by mistake, and the user bought them based on our results and disliked them, expecting the Downshifters to be like Tanjuns, we have an issue, and Nike could pursue the matter because we misled the customer and affected their branding.<p>My primary clients are search companies or companies that want to provide rich search functionality: Google, Bing or even DDG do a phenomenal job in this space and the barrier to entry is pretty high.<p>So knowledge quality, maintenance, versioning and temporal resolution ("The President of the U.S." and "The iPhone" are different entities over time) aside, is Diffbot going to monetize this knowledge only as a B2B offering/add-on to their clients, or are there other "bigger" plans to monetize this tremendous undertaking and keep it rolling in the future?
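For what I mean by temporal resolution, a rough sketch, assuming each fact carries a validity interval. The `FACTS` layout and the `resolve` helper are hypothetical, not any real API.

```python
from datetime import date

# Hypothetical time-scoped facts: (role, holder, valid_from, valid_to).
# valid_to is exclusive, so a handover day belongs to the incoming holder.
FACTS = [
    ("President of the U.S.", "Barack Obama", date(2009, 1, 20), date(2017, 1, 20)),
    ("President of the U.S.", "Donald Trump", date(2017, 1, 20), date(2021, 1, 20)),
]


def resolve(role, on):
    """Return the holder of `role` on the given date, or None if unknown."""
    for name, holder, start, end in FACTS:
        if name == role and start <= on < end:
            return holder
    return None
```

The same phrase resolves to a different entity depending on the date attached to the query or the document, so every edge in the graph needs that interval maintained.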