Former Freebase, current Google engineer here:<p>First of all, let me say that I'm glad more people are thinking/working in the space of triples. Even unstructured ones like this.<p>But when there's no semi-strict schema, it gets really, really tricky. Free text is hard, and actual meaning is hard to extract from it. (I say semi-strict, as Freebase is schema-last -- feel free to create your own! -- but has some level of enforcement)<p>For specific domains you may be okay with tags. And for some limited applications it probably works great. Triples are cool!<p>But when you start talking about larger, broader datasets, ones that no one person or small group can curate, you're going to start running into collisions.<p>There's certainly an argument to be made for metaschema -- <a href="https://developers.google.com/freebase/v1/search-metaschema" rel="nofollow">https://developers.google.com/freebase/v1/search-metaschema</a> -- and crowdsourcing these sorts of things could be interesting.<p>I think there's a lot of interesting work to be done. But I doubt that this is "better" per se; at the very least, right now it's little more than a toy.<p>And hey, I built such a toy graph engine once upon a time (be gentle -- it was really a demo hack) <a href="https://github.com/barakmich/jgd" rel="nofollow">https://github.com/barakmich/jgd</a> -- you can even query it with Freebase's old MQL. (Which I have mixed feelings about, but which is cool in its own way)<p>I guess my argument is, don't throw the baby out with the bathwater. And feel free to ping me for more!
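<p>For anyone who hasn't seen MQL: you hand the engine a JSON template and it fills in the blanks. Roughly like this (from memory, so details may be off), written here as a Python literal:<p><pre><code># A query template: fixed values constrain the match, empty values get filled in.
mql_query = [{
    "type": "/music/artist",
    "name": "The Police",
    "album": [],   # "give me every album by this artist"
}]
</code></pre>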
Comparison with Freebase:<p>> <i>Simpler structure: There are no datatypes, namespaces, lists, domains. Just ordered nodes. Having a dead simple structure like that allows developers to quickly and intuitively know how to access the info they want.</i><p>I don't see how this makes it simpler or more intuitive at ALL. If there's no convention as to whether I should search for "born on" or "born_date" or "year_born", or whether the date will be "1900-08-01" or "08-01-1900" or "1900/08"... then how is this supposed to be useful?<p>The central problem is that there are lots of textual ways of describing the same thing. Without standardized datatypes and standardized tags, it quickly becomes a messy, useless free-for-all.<p>I don't see how TheBigDB gets around this. The FAQ explains how it's <i>different</i> from Freebase/Wikidata, but I don't at all understand how it's supposed to be <i>better</i>, or even as good.
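<p>To make the collision concrete, here's what three well-meaning contributors might enter for the same fact (made-up entries, not actual TheBigDB data), written as Python lists:<p><pre><code># One fact, three incompatible spellings; a consumer has to guess every variant.
entries = [
    ["Marie Curie", "born on",   "1867-11-07"],
    ["Marie Curie", "born_date", "11/07/1867"],
    ["Marie Curie", "year_born", "1867"],
]
</code></pre>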
Sounds like a much simplified version of Douglas Lenat's Cyc project [1], which has been going since the mid-eighties and is attempting to build a structured knowledge base/ontology of everyday knowledge. They have a freely downloadable subset called OpenCyc [2]. It seemed pretty impressive last time I looked at it.<p>[1] <a href="http://en.wikipedia.org/wiki/Cyc" rel="nofollow">http://en.wikipedia.org/wiki/Cyc</a><p>[2] <a href="http://www.cyc.com/platform/opencyc" rel="nofollow">http://www.cyc.com/platform/opencyc</a>
I wonder if you could do machine learning on schemata. Basically, start learning about dates (as an example) and, as the system learns, update the information with what it has learned. Something where one person puts in { name "foo", born "10/1/92" } and someone else puts in { name "bar", born "september 30th, 1966" }, and the system goes back and replaces the dates with an ISO standard date type -- but with a change history, so you could look backwards in time at the data and see how the database had "improved" it (or not). Then, by voting on the improvements, you teach the system to clean up its data representations. Crazy? Insightful? Stupid? I don't know, but it was the question that popped into my head.
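<p>Something like this rough Python sketch, say -- the format list and the shape of the history record are just guesses at how it could work, not anyone's actual implementation:<p><pre><code>import re
from datetime import datetime

# Formats contributors might plausibly use; "%m/%d/%y" vs "%d/%m/%y" is exactly
# the kind of ambiguity a visible change history (and voting) would surface.
CANDIDATE_FORMATS = ["%m/%d/%y", "%m/%d/%Y", "%B %d, %Y", "%Y-%m-%d"]

def normalize_date(raw):
    """Return (iso_date, history_entry); fall back to the raw string if nothing matches."""
    cleaned = re.sub(r"(\d)(st|nd|rd|th)", r"\1", raw.strip())  # "30th" -> "30"
    for fmt in CANDIDATE_FORMATS:
        try:
            iso = datetime.strptime(cleaned, fmt).date().isoformat()
            return iso, {"was": raw, "now": iso, "rule": fmt, "votes": 0}
        except ValueError:
            continue
    return raw, None

print(normalize_date("10/1/92"))               # ('1992-10-01', {...}) -- assumes US ordering
print(normalize_date("september 30th, 1966"))  # ('1966-09-30', {...})
</code></pre>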
One nice property of the Wikidata database is that it is a "secondary database. Wikidata will record not just statements, but their sources, thus reflecting the diversity of knowledge available and supporting the notion of verifiability." [1]<p>I think that's far better than voting. Voting for facts amounts to relying on a logical fallacy: appeal to the majority. [2] (Voting is fine for popularity contests, or things that can only be matters of opinion, but facts?)<p>[1] <a href="http://www.wikidata.org/wiki/Wikidata:Introduction" rel="nofollow">http://www.wikidata.org/wiki/Wikidata:Introduction</a><p>[2] <a href="https://en.wikipedia.org/wiki/Argumentum_ad_populum" rel="nofollow">https://en.wikipedia.org/wiki/Argumentum_ad_populum</a>
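<p>Concretely, a Wikidata-style statement bundles the claim together with its references -- something roughly like this (a loose sketch, not Wikidata's actual JSON schema):<p><pre><code># A statement you can verify, rather than a bare fact you can only vote on.
statement = {
    "subject": "Douglas Adams",
    "property": "date of birth",
    "value": "1952-03-11",
    "references": [
        {"stated_in": "some citable source", "url": "http://example.org/source"},
    ],
}
</code></pre>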
Is it possible to download all the data and use it under some open license (like CC-BY)? I can't find the data license terms.<p>If not, then sorry, Freebase is vastly superior IMHO - from a user's point of view I don't see the point of a crowdsourced proprietary database (even if the API is currently free).
Have you seeded, or do you plan to seed, your database with the already-structured data from Freebase? It should be relatively straightforward, right? Well, I mean, minus the time to properly map the Freebase schema into your format. But that's probably less time than it takes to wait for people to fill in enough facts.
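<p>A rough sketch of what that mapping might look like in Python -- the property IDs are real Freebase ones as far as I remember, but the target wording and the whole mapping table are hypothetical:<p><pre><code># Hypothetical mapping from Freebase property paths to flat "ordered node" lists.
FREEBASE_TO_NODES = {
    "/people/person/date_of_birth": "date of birth",
    "/people/person/place_of_birth": "place of birth",
}

def to_nodes(topic_name, prop, value):
    label = FREEBASE_TO_NODES.get(prop)
    if label is None:
        return None  # no mapping yet; a human would have to add one
    return [topic_name, label, value]

print(to_nodes("Barack Obama", "/people/person/date_of_birth", "1961-08-04"))
# ['Barack Obama', 'date of birth', '1961-08-04']
</code></pre>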
Excellent! I've been working on something similar. Trying to come up with a schema that is data-centric is hard enough, let alone also focusing on ease of use for developers. Good luck!<p><i>Can I send how many requests I want?</i> -- I think you might mean <i>Can I send as many requests as I want?</i>
Any chance of releasing this as open source? For example, people would like to have it installed on their own servers and use it for their own things. I think it would be useful for fandom -- for example, a Star Wars DB or a Lord of the Rings DB :-)
Don't be deterred by the negative comments about the unstructured data. It's a tough problem but not an impossible one. I know because I'm battling the same question building a free-form, NLP-based self-tracking app to help track daily data ( <a href="http://thyself.io" rel="nofollow">http://thyself.io</a> ). The problem for me is that it's hard to perform analytics when one datapoint is in "miles walked" and the other is in "laps ran".<p>As you said, conventions help mitigate the problem a little bit, but the end user can hardly be expected to stick to best practices.<p>I have hope though. This is a problem worth solving.
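<p>To give a flavour of it, a first pass can just map free-form metric names onto a base unit -- something like this sketch (the metric names and the 400 m lap length are assumptions for illustration, not what thyself.io actually does):<p><pre><code># Rough normalization pass: pin every known metric to meters, keep unknowns as-is.
TO_METERS = {
    "miles walked": 1609.34,
    "laps ran": 400.0,   # assumed standard track; a real app would ask the user
    "km run": 1000.0,
}

def normalize(metric, amount):
    factor = TO_METERS.get(metric)
    if factor is None:
        return None  # unknown metric; leave the free-form datapoint untouched
    return {"metric": "distance_m", "amount": amount * factor, "source_metric": metric}

print(normalize("miles walked", 2))  # {'metric': 'distance_m', 'amount': 3218.68, ...}
print(normalize("laps ran", 5))      # {'metric': 'distance_m', 'amount': 2000.0, ...}
</code></pre>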
Reminds me of Freebase. They built a huge dataset themselves, as well as tools and an API to access it. Have you talked to anyone on the team? (They are now at Google.) How would you say that you are different from them?
I like this idea in the sense of an experiment. I'm not sure where it will end up, but it could be interesting.<p>As others have pointed out, some kind of convention must be established around the semantics, and something must be done to avoid redundancy (which leads to inconsistency) and ambiguity.<p>I agree with those criticisms, but if the community also helps develop the schema, it will be interesting to see. What collisions will happen? What will be the result of queries that reach far across disciplines?
I appreciate any new service that attempts to organize data / information. With that in mind, I hope this succeeds.<p>A suggestion: it needs a demo query box on the site. It shouldn't be too hard to let a rate-limited IP address throw a few keywords at it and spit back results. I'd like to see what the db contains before I invest too much time (how many topics, how many facts, etc.).
Based on observations and prior experience (esp. Bitzi), I believe the wiki approach of "correct-in-place" leads to better convergence and community than "downvote the errors, add a corrected entry, upvote the better entry".<p>(Voting democracy may help prevent people from being oppressed in certain ways, but it isn't much of a truth-discovery mechanism.)
Interesting concept. It's like RDF for human beings. It's easier for humans to read unstructured data, but at the same time it makes it extremely hard to do interesting stuff programmatically. You just can't do reliable inferencing.
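<p>A toy illustration of the inferencing point (made-up facts, plain Python rather than a real RDF stack): transitive queries work as long as the relation is one fixed identifier, and silently fail the moment someone spells it differently.<p><pre><code>facts = [
    ("Paris", "located_in", "France"),
    ("France", "located_in", "Europe"),
    ("Lyon", "is in", "France"),   # same meaning, different spelling
]

def located_in(place):
    """Follow 'located_in' edges transitively."""
    results, stack = set(), [place]
    while stack:
        current = stack.pop()
        for s, p, o in facts:
            if s == current and p == "located_in" and o not in results:
                results.add(o)
                stack.append(o)
    return results

print(located_in("Paris"))  # {'France', 'Europe'}
print(located_in("Lyon"))   # set() -- "is in" isn't recognized, so no inference happens
</code></pre>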