
Wikipedia's in Trouble (2019)

103 points by sanqui over 4 years ago

20 comments

The_Colonel over 4 years ago

Most of the issues mentioned are quite minor - some of them (like talk pages containing copy-pasted content) actually seem like non-issues.

Or the category thing - yes, it has circular references, which breaks the nice theory behind it, but what practical issues does this present to readers or editors?

The syntax is messy, yes, and it's a problem. But converting it to markdown/HTML is downright absurd. I think the current approach of improving it step by step is the most reasonable. Wikipedia is a project that can afford to sustain its own custom syntax (because of history, compatibility concerns, and Wikipedia's special needs).

I think the biggest issue isn't technical in nature - it's the actual task of building the encyclopedia: deciding together on its scope, deciding what is notable and what isn't, fighting against brigading from interest groups, fighting against attempts to game the rules. All of these are intrinsically difficult tasks, and of course Wikipedia has made many mistakes. But even then it still fares pretty well in my book.
4cao over 4 years ago

This is a good summary of some of the technical issues Wikipedia is facing. Those can be overcome, and eventually will be.

The really serious problems are organizational in nature: the overgrowth of bureaucracy, the jungle of policies and "sort-of" policies that can be weaponized, the emergence of an inner clique skilled in navigating the quagmire, and the resulting impossibly high barrier to entry for new contributors.

The experience of contributing to Wikipedia these days is inherently adversarial. People come and write something genuinely useful, or at least do so in good faith, and then realize they have unleashed on themselves a whole lengthy process in which they have to defend what they did on numerous talk pages over an extended period of time, or succumb to deletionists. Even if they prevail, the experience is hardly pleasant, so they eventually leave and never look back.

An artefact of the above is also that the editorial base suffers from an overrepresentation of people with vested interests, as they will always be the most highly motivated to stay. There are of course many great editors too, but the way things are going, the project is not sustainable, and its quality is being compromised due to an organizational failure.

What Wikipedia needs is not Markdown (not that I have anything against it) but courage, fresh air, and less of the siege mentality.
kayodelycaon over 4 years ago

I think this article shoots itself in the foot by suggesting markdown.

Wikipedia predates markdown by several years. The only truly "universal" markup language I know of from that time was BBCode.

Even if markdown had existed, it lacks support for the templates and tables that wikis make heavy use of.

Every wiki software I've used has had a custom language for one reason or another. Usually good reasons.
bawolff over 4 years ago

I disagree with most of this (for context, I'm a MediaWiki developer. I used to work for WMF but don't anymore. My opinions are my own):

- Wikitext unparsable: wikitext is a bit insane, but there exists a parser called Parsoid. If we couldn't parse it, it would be impossible to make a visual editor.

- Most pages are redirects: not sure what the problem is.

- Boilerplate text: there is a template system to keep repeated text in only one place. Not sure what the issue is.

- No search: he goes on a rant here without giving much context. There is a search feature based on Elasticsearch (older versions were based on Lucene directly). In my opinion it's a pretty decent search engine (especially compared to most sites that build their own search). I'm not sure what the actual complaint is.

- Complaints about Wikidata: this is more political than technical. However, "If wikidata was a company, it would not exist anymore, and you wouldn't have heard of it" seems patently false. Wikidata is pretty popular even outside of academia, and is used quite extensively.

- The category tree being a graph, not a tree: that's kind of unfortunate, but what exactly is the problem here? It's a problem on Commons, but I've never really seen how it's an issue in practice on Wikipedia. Complex categorization is being taken over by Wikidata anyway.

- Template ecosystem is complex: it could certainly be better, but the complexity here is a trade-off allowing more flexibility and allowing the system to evolve.

- Inclusionist vs. deletionist: no comment.

- UI design: perhaps a fair point here, although I do kind of like the stability of the current design. Most of the modern web sucks, imho.

- Moral failure: I don't want Wikipedia to fix the world. That's not its role. Its job is to document, not to partake.

- Visual editor not being enabled by default: I agree, although I think it was pushed too hard in the early days when there were still kinks. It's long past time now.

To be clear, I definitely don't think Wikipedia is perfect; I just disagree with some of these specific criticisms.
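For anyone who wants to poke at Parsoid's output directly: Wikimedia serves Parsoid-generated HTML through its public REST API, so you can fetch a fully parsed article without running the parser yourself. A minimal sketch in Python - the endpoint path matches the public Wikimedia REST API, but the User-Agent string is just a placeholder:

```python
import requests

def fetch_parsoid_html(title: str, lang: str = "en") -> str:
    """Fetch the Parsoid-rendered HTML of an article from the Wikimedia REST API."""
    url = f"https://{lang}.wikipedia.org/api/rest_v1/page/html/{title}"
    # Wikimedia asks API clients to identify themselves; this UA is a placeholder.
    resp = requests.get(url, headers={"User-Agent": "parsoid-demo/0.1"})
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    print(fetch_parsoid_html("Albert_Einstein")[:300])
```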
PaulHoule over 4 years ago

If you are extracting information from Wikipedia and trying to parse the markup, you're doing it wrong. It's the difference between "screwed around with handwritten parsers for years and it still doesn't work 100% right" and "look, this CSS selector extracts what you want."

Freebase, DBpedia, and many others (me included) have tried, but the reality is that the markup language is poorly defined, and the only path that is really tested is the one that ends up rendering HTML.

If you feed HTML from Wikipedia into a parser that supports the DOM (say, Beautiful Soup), you can generally parse out what you want pretty effectively. Once I switched from the "markup rabbithole" to "parsing standard HTML", I was able to turn my MediaWiki extractor into a Flickr extractor in about 15 minutes.
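A minimal sketch of the HTML-first approach the comment describes, using requests and Beautiful Soup. The `table.infobox` selector reflects a long-standing class on English Wikipedia, but treat it as an assumption rather than a stable contract:

```python
import requests
from bs4 import BeautifulSoup

def extract_infobox(title: str) -> dict:
    """Pull infobox rows out of an article's rendered HTML.

    Parsing the rendered page sidesteps wikitext entirely: one CSS
    selector does what a handwritten wikitext parser struggles with.
    """
    url = f"https://en.wikipedia.org/wiki/{title}"
    html = requests.get(url, headers={"User-Agent": "infobox-demo/0.1"}).text
    soup = BeautifulSoup(html, "html.parser")
    rows = {}
    for tr in soup.select("table.infobox tr"):
        th, td = tr.find("th"), tr.find("td")
        if th and td:  # keep only label/value rows
            rows[th.get_text(" ", strip=True)] = td.get_text(" ", strip=True)
    return rows

if __name__ == "__main__":
    print(extract_infobox("Albert_Einstein").get("Born"))
```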
8bitsrule over 4 years ago

Wikipedia is a complicated mess? So's the world, and for many of the same reasons. Look at the average town ... yours, for example. Where -wouldn't- you take visitors?

What matters to Wikipedia visitors is content quality (and managed depth) and reliability. All the rest is froth. Layout? Are you kidding?

Could it be better? Hell yes. It'd be great to see many articles, written by committee, visited by a professional editor with years of proven experience -in that category-. The Foundation's got the money to pay one per category. Each has to visit 5 articles per day. Each one they finish is locked and marked 'pro-edited in (the year)'. Make that fact a search filter. ('Administrators' choosing? Brrrr.)

Then there's Wikibestia.org. Every year, copy a limit of 1 million articles there. Chosen how? Poll the visitors, it's theirs! 'Wikibestia ... the people's choice!' 50 net downvotes? Guillotine! Big contest, media advertising, whatever.

Articles that can only be read by subject-matter experts completely miss the point. They're just there for vanity or whatever. Flag them, give them one year to move to Wikiexpertia, while versions for non-experts are prepared (or not), then delete.
fortran77 over 4 years ago

It bothers me that he doesn't know "its" from "it's." It probably bothers others when he edits articles and spells it wrong.

I think the biggest problem he mentioned is the "exclusionists." There's no harm in including a page on every local elected official, every Nobel Prize winner, even every band that has a reference or two somewhere. Most likely it will remain a little article, but so what? The history of small local bands, for example, would be very interesting 100, 200 years from now. (Wouldn't it be fun to read about some town's local musical groups in 18th-century America?)
hateful over 4 years ago

The solution to all this seems simple to me: have the site accept both formats, the old wiki format and the new [not yet decided] format. Each page would then just need a format option. Then, slowly, pages can be converted, even if manually - perhaps by auto-converting first and then fixing things up by hand. And I'm sure there are a number of articles that are just simple links, and those can be converted right away.

You could put bounties or some other incentive on the conversion, but I think if the new format is better, a number of people will feel strongly enough about it to want to convert each article when it's updated. And even if 12 years from now some articles aren't converted, so what?
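As a toy illustration of the dual-format idea, here is what a per-page format flag could look like; the names (`Page`, `RENDERERS`) and both renderers are hypothetical sketches, not anything MediaWiki actually ships:

```python
from dataclasses import dataclass

def render_wikitext(source: str) -> str:
    return f"<!-- legacy wikitext renderer -->{source}"  # stand-in renderer

def render_markdown(source: str) -> str:
    return f"<!-- new-format renderer -->{source}"       # stand-in renderer

RENDERERS = {"wikitext": render_wikitext, "markdown": render_markdown}

@dataclass
class Page:
    title: str
    source: str
    fmt: str = "wikitext"  # every existing page starts on the legacy format

def render(page: Page) -> str:
    # Dispatch on the page's flag; a converted page flips fmt once and is
    # served by the new renderer from then on.
    return RENDERERS[page.fmt](page.source)

page = Page("Albert Einstein", "'''Albert Einstein''' was a physicist.")
print(render(page))      # legacy path
page.fmt = "markdown"    # flipped after a (manual or automatic) conversion
print(render(page))      # new path
```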
SmokeyHamster over 4 years ago

Yeah, I generally agree. Technologically, it's a dumpster fire. And even ideologically, it's just become a lefty political circle-jerk where any facts not endorsed, in some cases literally, by MSNBC are immediately purged by self-appointed gatekeepers and admins.

But that's fine. It'll create more innovation as people ditch it and create better tools. The one part of the article I disagree with is where they say it's "unparseable", which obviously it's not. It's parsed millions of times a day, even by third-party tools that harvest it for semantic data.

That means it can be transformed and used to seed other databases using non-terrible description languages like Markdown.
artagnon over 4 years ago

Yes, this is the unfortunate reality. It happens to the best of software as it ages. The article doesn't address the elephant in the room: MediaWiki is really the _only_ solution we have today to a difficult problem. Which other markup language is so comprehensive? Yes, it's unnecessarily complicated, but let's understand that the constraints are very tight: the data can't be separated out of the markup, and it's impossible to make backward-incompatible changes. Many of the really great pages on Wikipedia render beautifully.

It's untrue that they haven't added features over the years: they have, but they do it at a glacial pace. Consider one of the recent ones: when you hover over a link, a box pops up and shows a preview. They used to render math on the client side, but now they just convert it to SVG: as a result, it loads instantly and renders consistently. The WYSIWYG editor rollout was slow because people felt it would attract low-quality, low-commitment edits. They first released it as an optional feature, and then turned it on by default once they were confident that it worked as intended. Oh, and my favorite? Allowing an article to start with a lower-cased word (say, iOS); I remember that there were a bunch of redirects just to correct for this deficiency.

Yes, it is a giant pain to edit some pages in that arcane syntax, but nothing else even comes close in terms of features.

Yes, there are an enormous number of templates, but in practice an infrequent contributor just finds a page that uses a similar template and copies it out.

Yes, there are lots of bots, and they try very hard to guard against spam without making you sign up or even solve a CAPTCHA to edit. Plenty of bot edits are "good" edits: they revert rage-rewrites, rage-deletions, and all kinds of malicious user behavior.

What you don't understand about redirects is that the good ones can't be automated. It's not a string-matching problem. Yes, they could automate /some/ of the redirects, and they try. I've personally never run into a typo-redirect in recent years.

Yes, it can get political at times, and it's _very_ difficult to have objective guidelines about which pages are worthy of existing. Politicians' pages often get locked when there's an upcoming election, which means you need an account to edit. Again, MW has lots of great features.

Wikipedia is aging, and nobody can deny that, but who would want to do the thankless work of parsing the markup, porting it to another system, AND correcting the breakages? What commercial value does it have, and who's going to fund it?
bzb6 over 4 years ago
This blog is unreadable in mobile Safari.
awinter-py over 4 years ago

> Work is being ravenously deleted all the time

This reminds me of the library in the David Brin Uplift books, where the myth is that it contains perfect knowledge, but in fact there are sinister memory holes.

(Not saying it's sinister in this case.)
WorldPeas over 4 years ago

Could one not scrape the HTML-rendered pages, then convert them to markdown? I see the bulk of the complication with the Einstein article is in the citations. Could one not simply write a script to track all citations and create numeric links to them at the end of the document?
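Roughly the citation-tracking script this comment imagines, as a hedged sketch: scrape the rendered article, walk its footnote list, and emit the references as numbered links for the end of a markdown document. The `ol.references` selector matches how English Wikipedia currently renders footnotes, but it is an assumption:

```python
import requests
from bs4 import BeautifulSoup

def citations_as_numbered_links(title: str) -> str:
    """Collect an article's footnotes and format them as a numbered list."""
    url = f"https://en.wikipedia.org/wiki/{title}"
    html = requests.get(url, headers={"User-Agent": "citations-demo/0.1"}).text
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    # English Wikipedia renders footnotes as an <ol class="references"> list.
    for i, li in enumerate(soup.select("ol.references li"), start=1):
        text = li.get_text(" ", strip=True)
        lines.append(f"[{i}]: {text}")
    return "\n".join(lines)

if __name__ == "__main__":
    print(citations_as_numbered_links("Albert_Einstein")[:500])
```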
hk__2 over 4 years ago

> After a year of development, when it was completed, wikipedia editors voted en-mass against integrating the visual editor.

To be more accurate, those are the editors of the English Wikipedia (the largest one). The French Wikipedia, for instance, has the visual editor enabled.
ultimateocelot over 4 years ago
If you want to extract info from Wikipedia, you could use MediaWiki to parse the content and then scrape the (relatively) consistently formatted pages. Might be an easier angle of attack.
biryani_chicken over 4 years ago

Is there some kind of universal intermediate format that could be targeted by parsers and rendered into HTML? Maybe something from pandoc?
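pandoc is a plausible answer here: it ships a MediaWiki reader, and its internal AST serves as exactly that kind of universal intermediate between input and output formats. A small sketch using the pypandoc bindings (assumes the pandoc binary is installed):

```python
import pypandoc  # thin wrapper around the pandoc binary

wikitext = "'''Albert Einstein''' was a [[Physicist|theoretical physicist]]."

# pandoc parses MediaWiki markup into its internal AST, then writes any
# supported output format from that intermediate representation.
as_html = pypandoc.convert_text(wikitext, "html", format="mediawiki")
as_markdown = pypandoc.convert_text(wikitext, "markdown", format="mediawiki")

print(as_html)
print(as_markdown)
```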
liminal over 4 years ago

I didn't know that Freebase offered their db contents for free and Wikimedia declined. What a stupid, short-sighted loss to the community.
thrower123 over 4 years ago
None of the technical issues listed amount to even a molehill next to the mountain of cultural and political issues that plague Wikipedia.
pierewoehl over 4 years ago

I would be highly interested in moving Wikipedia to Markdown in Sphinx, with its own repo in Git. But this poses an issue for people who want to use a WYSIWYG editor and not git it away.
Simulacra over 4 years ago

I cannot recall the last time I used Wikipedia. The hyperaccurate wikis make even the most benign research task a daunting experience. It's like there are perfectionists out there who won't stop tweaking the wikis. I think Wikipedia has served its usefulness and should be allowed to be absorbed or recreated into something better. For now it's just a trust fund for Jimmy Wales.