Like others here, it's something I've been thinking about for a number of years.

This is an important project, with the potential to eclipse Wikipedia, maybe even to grow into the saviour of free software. My reasoning follows.

Currently we program computers by giving them a set of instructions on how to achieve a goal. As computers grow more powerful, we will stop giving detailed instructions. Instead, we will write a general-purpose deduction/inference engine, feed in a volume of raw data, and let the computer derive the instructions it must follow to achieve the given goal.

There are two parts to such a system: the engine and the data. The engine is something that free software is capable of producing. The missing component is the data. The Wikidata project is this missing component.

I'm convinced that Wolfram Alpha is a glimpse of this future: an engine coupled to a growing body of structured data. Wolfram's end game isn't taking over search, but taking over computer programming and ultimately reasoning. It's just that search is currently a tractable problem for Alpha, one that can pay the bills until it becomes more capable. There will come a day when Alpha is powerful enough to automatically translate natural language into structured data, at which point it will spider the Internet and its database and capabilities will grow explosively.

Free software needs Wikidata to arrive at this endpoint first and avoid being made largely irrelevant by Alpha (or Google?).
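To make the engine-plus-data idea concrete, here's a toy forward-chaining sketch in Python. The facts and the single rule are made-up illustrations, not anything the Wikidata proposal specifies:

```python
# Toy "engine + data" sketch: derive new facts from raw facts plus one rule.
# The facts and the rule below are purely illustrative.

facts = {
    ("Socrates", "is_a", "human"),
    ("human", "subclass_of", "mortal"),
}

def infer(facts):
    """Apply one transitivity-style rule repeatedly until no new facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in facts:
            if p == "is_a":
                # Rule: if X is_a C and C subclass_of D, then X is_a D.
                for (s2, p2, o2) in facts:
                    if p2 == "subclass_of" and s2 == o:
                        new.add((s, "is_a", o2))
        if not new <= facts:
            facts |= new
            changed = True
    return facts

print(infer(facts))
# Expected to include ("Socrates", "is_a", "mortal")
```

The point is just that the engine is small and generic; all the value is in the data it's fed.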
For people interested in this subject, you might want to check out the DBpedia project: http://dbpedia.org/About. They have been extracting structured data from Wikipedia for quite some time already and allow you to query their database with SPARQL (a rough query sketch follows the quote below).

From their site:
The DBpedia knowledge base currently describes more than 3.64 million things, out of which 1.83 million are classified in a consistent Ontology, including 416,000 persons, 526,000 places, 106,000 music albums, 60,000 films, 17,500 video games, 169,000 organisations, 183,000 species and 5,400 diseases.
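As a rough illustration of what querying it looks like, here's a minimal Python sketch against DBpedia's public SPARQL endpoint. The endpoint URL, the dbo: property names, and the response handling are assumptions based on DBpedia's published docs and may have drifted:

```python
import requests

# Ask DBpedia's public SPARQL endpoint for a handful of films and their directors.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?film ?director WHERE {
  ?film a dbo:Film ;
        dbo:director ?director .
} LIMIT 10
"""

resp = requests.get(
    "http://dbpedia.org/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
)
resp.raise_for_status()

# Standard SPARQL JSON results format: results -> bindings -> variable -> value.
for row in resp.json()["results"]["bindings"]:
    print(row["film"]["value"], "directed by", row["director"]["value"])
```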
tl;dr: spin off Wikipedia infoboxes into a separate project with an API, and then use that data to bootstrap an open data project with broader goals.

In theory, it's a good idea. It takes an existing useful data source and puts it in a form that encourages reuse, and since it solves the bootstrapping problem, it's not obviously doomed to failure like the Semantic Web.

I see two potential downsides.

My first concern is that, in practice, it will make editing Wikipedia more complex. There's no inherent reason why this should be the case, but there's no inherent reason why Wikimedia Commons should make editing Wikipedia more complex either, yet it undeniably does.

Secondly, it will prevent a similar source of data from appearing with broader terms of use. For example, OpenLibrary is public domain.
This is actually a startup idea I've had for a while now. It's a great idea in theory, but it's very tricky in practice. Facts have a mysterious way of vanishing if you look closely enough at them, and the raw numbers themselves don't actually tell you anything.

The part that's actually interesting is:

- The methodology behind the numbers

- What we think is most likely the case based on the evidence available

- How each fact connects with other facts

- What we think we should do based on the evidence available

Being able to embed facts is definitely a cool use case, but unless all that other material is there backing it up when you click the link back to the database, it's pretty much worthless. And curating these sorts of epistemological discussions and third-party analyses isn't something that really fits within the Wikimedia mission, so I doubt they will even try.

Because of this I doubt their implementation of the project will be successful, although I do think it's a space that ultimately has potential.
Nice to see they're going to support SPARQL:

"O3.1. Develop and prepare a SPARQL endpoint to the data. Even though a full-fledged SPARQL endpoint to the data will likely be impossible, we can provide a SPARQL endpoints that allows certain patterns of queries depending on the expressivity supported by the back end."

I see the semantic web slowly realizing its actual purpose (which is not related to semantic natural language processing but rather linking data).
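To make the "certain patterns of queries" point concrete, here's a rough contrast between a query shape a restricted endpoint could plausibly answer and one it might refuse. The URIs are purely hypothetical examples, not Wikidata's actual schema:

```python
# Hypothetical illustration of what a restricted SPARQL endpoint might accept.

# A simple basic-graph-pattern lookup like this is cheap to answer:
simple_query = """
SELECT ?capital WHERE {
  ?country <http://example.org/prop/name> "Germany" ;
           <http://example.org/prop/capital> ?capital .
}
"""

# ...whereas open-ended queries with property paths or unbounded joins are the
# kind of thing a back end with limited expressivity might refuse to execute:
expensive_query = """
SELECT ?a ?b WHERE {
  ?a (<http://example.org/prop/linksTo>)+ ?b .
}
"""
```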
Hats off to Wikimedia, a beacon of the true ideals of the free Internet: they've never tried to monetize their substantial achievements, they've really made a difference, and they've actually delivered what for other companies has been mere lip service (i.e. freeing up information).
Now this is interesting (from the page):

"Wikidata is a secondary database. Wikidata will not simply record statements, but <i>it will also record their sources, thus also allowing to reflect the diversity of knowledge available in reality</i>."

That sounds pretty cool to me, because you could potentially upload probabilistic data from statistical analysis. If they make this so that you can tell how reliable the source is, you could upload information that's accurate to a given degree of probability.

It would be very interesting if you could version data by reliability, so that less-reliable data could eventually be replaced by definitive data. This is an Achilles' heel of current data modeling systems.
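Here's a minimal sketch of what a source-qualified, reliability-aware statement might look like as a data structure. The field names and the reliability score are my own invention, not anything in the Wikidata design:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Source:
    url: str
    reliability: float  # subjective estimate in [0, 1]; not a Wikidata concept

@dataclass
class Statement:
    subject: str
    prop: str
    value: str
    sources: List[Source] = field(default_factory=list)

    def best_reliability(self) -> float:
        """Take the strongest supporting source as a crude confidence score."""
        return max((s.reliability for s in self.sources), default=0.0)

# A less-reliable estimate that could later be superseded by a definitive source.
population = Statement(
    subject="Berlin",
    prop="population",
    value="3,500,000 (estimate)",
    sources=[
        Source("http://example.org/census-2011", reliability=0.95),
        Source("http://example.org/blog-post", reliability=0.40),
    ],
)
print(population.best_reliability())  # 0.95
```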
My concern about the potential for abuse in this project is much greater than for Wikipedia. How is Wikimedia going to ensure that there are no malicious edits to this data? Any changes will almost certainly need stringent peer review.

Edit: As an afterthought, it would make a lot of sense to manage it like a git repository, where someone could submit a pull request for data changes, and then some subgroup or a trusted percentage of the population approves the request and it gets merged into the master dataset.
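A toy sketch of that pull-request-style flow, with the approval threshold and roles invented purely for illustration:

```python
# Toy model of the suggested review flow: a proposed data change is only
# merged into the master dataset once enough trusted reviewers approve it.

APPROVAL_THRESHOLD = 3  # invented number, purely for illustration

class ChangeRequest:
    def __init__(self, key, old_value, new_value, author):
        self.key = key
        self.old_value = old_value
        self.new_value = new_value
        self.author = author
        self.approvals = set()

    def approve(self, reviewer):
        if reviewer != self.author:  # authors can't approve their own change
            self.approvals.add(reviewer)

    def ready_to_merge(self):
        return len(self.approvals) >= APPROVAL_THRESHOLD

dataset = {"Berlin:population": "3,400,000"}

cr = ChangeRequest("Berlin:population", "3,400,000", "3,500,000", author="alice")
for reviewer in ["bob", "carol", "dave"]:
    cr.approve(reviewer)

if cr.ready_to_merge():
    dataset[cr.key] = cr.new_value  # "merge" the change into the master dataset
print(dataset)
```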
One area I really want to see this take off in is Medicine.

As someone who suffered from an unknown illness (no doctor could figure it out), I can see how such a system would have been helpful. You see a bit of this with WebMD's Symptom Checker, but I feel tools like that aren't comprehensive enough, and we end up with a lot of cyberchondria. You can't rely on correlation to find absolute answers, but helping map out symptoms and lifestyle choices may be a tool for finding solutions faster.

It took about a year to resolve my illness. Going to the doctor 2-4 times a week for 10-20 minutes isn't enough to work with when you have no clear-cut diagnosis.

Now, to be clear, I am not talking about replacing doctors or devaluing doctors by allowing everyone to <i>be an expert</i>.
It's hardly new... it's been a non-starter for about 5 years: http://lists.wikimedia.org/pipermail/wikidata-l/
This might be very interesting if it's implemented in a sane way. Unfortunately, there doesn't yet seem to be a widely adopted standard in the world of open data.
This kind of reminds me of http://dabanese.blogspot.com/2009/09/introduction.html