Tech giants let the Web's metadata schemas and infrastructure languish

301 points by timhigins almost 5 years ago

19 comments

the_duke almost 5 years ago

Actual title: Google and other tech giants are happy to have control over the Web's metadata schemas, but they let its infrastructure languish

I know that hating on Google is fashionable, but that's a bit too much editorializing, especially considering the content of the post, where Google is just a small side note.

---

On-topic: I recently looked into using schema.org types as the basis for an information capturing system, but many of the types are somewhat outdated, of questionable quality, or just missing. Development indeed seems slow, while changes that are needed by one of the larger involved companies get pushed through quickly.

I think a big part of that stagnation is a lack of interest though. The whole semantic web domain has been pretty much inactive.

It's a real shame: having canonical types for most things in existence, and having those actually be supported as import/export formats or for cross-app integrations, would be immensely valuable! But there is absolutely no business incentive there - rather the opposite. Easy portability of data is not something most companies would want.

frou_dh almost 5 years ago

This reminds me of the tragic situation where if you process XHTML locally using XML tools that incidentally fetch the DTD, then things block and become absolutely dirt slow, because the W3C sysadmins are permanently pissed off by that: https://stackoverflow.com/a/13865692/82
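
A minimal sketch of one way to sidestep the DTD fetch described above, using only Python's standard library. It assumes you only need the element structure: entities that only the DTD defines, such as `&nbsp;`, will not be expanded this way. The names `XHTML`, `TitleHandler`, and `extract_title` are made up for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# A tiny XHTML document whose DOCTYPE points at the W3C-hosted DTD.
XHTML = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example page</title></head>
  <body><p>Hello</p></body>
</html>
"""


class TitleHandler(ContentHandler):
    """Collects the text content of the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True

    def endElement(self, name):
        if name == "title":
            self._in_title = False

    def characters(self, content):
        if self._in_title:
            self.title += content


def extract_title(xhtml_bytes):
    parser = xml.sax.make_parser()
    # Do not resolve external entities, so the DTD URL is never requested.
    # Recent Python versions default to this; it is set explicitly here.
    parser.setFeature(feature_external_ges, False)
    handler = TitleHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xhtml_bytes))
    return handler.title


print(extract_title(XHTML))
```

The other common approach, which the linked answer's context suggests, is to keep a local copy of the DTD and point the parser at it; that keeps entity expansion working but needs a catalog or resolver set up.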

jacques_chester almost 5 years ago

It was once put to me that Google's promotion system creates this dynamic.

Starting a new project that garners widespread attention looks good in a promotion package, but replacing lightbulbs and scrubbing floors doesn't. Folks create a splash, get promoted, then move on and are not replaced.

I've never worked at Google, so I do not know if this dynamic is real. I would be interested to hear from Googlers about incentives to work or not work on something.

Santosh83 almost 5 years ago

Isn't schema.org supposed to be an "industry wide" collaborative effort? In which case we must also remark on the lack of interest shown by players like Microsoft, Apple or Google, or even Facebook, Twitter and so on, all of whom benefit from this semantic markup.

wrnr almost 5 years ago

Google is dropping the ball here, as they stand to benefit the most from a single central ontology for the web. It does illustrate that this approach doesn't work if you are looking to innovate quickly and not be dependent on the goodwill of a single institution that doesn't even know who you are.

Maybe we can finally stop using ontologies for the semantic web and start solving the hard problem of language pragmatics.

sawaruna almost 5 years ago

I'd love to see schema.org updated and used more. As someone still doing linked data work, albeit in academia, I mainly use it simply to provide more context to self-created, domain-specific properties within ontologies using things like rdfs:seeAlso, skos:related, etc.

Ideally it'd be nice (imo) if schema.org had more domain-specific extensions, similar to the bib[0] one, which allows for things like comic book properties to be described.

[0] https://schema.org/docs/bib.home.html
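
A rough sketch of the pattern described above: a self-created property that points at a schema.org term through rdfs:seeAlso, written out as JSON-LD. The vocabulary IRI `https://example.org/vocab#inkedBy` and the helper `describe_property` are hypothetical; only the rdfs and schema.org namespaces are real, and this is not necessarily how the commenter models things.

```python
import json


def describe_property(local_iri, label, see_also):
    """Build a JSON-LD description of a custom property.

    `see_also` is a list of compact IRIs for related schema.org terms.
    """
    return {
        "@context": {
            "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
            "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
            "schema": "https://schema.org/",
        },
        "@id": local_iri,
        "@type": "rdf:Property",
        "rdfs:label": label,
        # Link the local term to the broader schema.org vocabulary.
        "rdfs:seeAlso": [{"@id": iri} for iri in see_also],
    }


doc = describe_property(
    "https://example.org/vocab#inkedBy",  # hypothetical domain-specific property
    "inked by",
    ["schema:inker"],  # a property from the schema.org bib extension
)
print(json.dumps(doc, indent=2))
```

A skos:related link could be attached the same way, by adding the SKOS namespace to the context.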

stefan_ almost 5 years ago

I don't understand. This is a tool for Google to extract information from websites and keep potential visitors on Google instead. Every use case for and future progress on it will be measured on that metric.

They don't care to address any of the issues or "fix the infrastructure" because this isn't an "organize all the information in the world!" project at all. The guys that take Google visitor retention stats into their next performance meeting are probably poking fun at all the ontology nerds that have descended on their metric-driven scheme.

hn-cmt almost 5 years ago

The Schema.org vision certainly is not dead within Google. See the Google-backed DataCommons project at http://datacommons.org/ which relies heavily on the schemas defined by schema.org. It is headed by the creator of schema.org.

tomcam almost 5 years ago

My solution was to reverse engineer highly ranked web pages. I used a subset of the schema that seemed to be universal to those pages. Schema.org just gave me the proper file formats.
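
One way to picture "a subset of the schema that seemed to be universal" is a set intersection over the properties each page declares. This is only a sketch: the sample `pages` data and the function name `common_properties` are invented, and the commenter did not say how the comparison was actually done.

```python
def common_properties(items):
    """Return the property names present in every structured-data item.

    JSON-LD keywords such as @context and @type are ignored.
    """
    key_sets = [{k for k in item if not k.startswith("@")} for item in items]
    return set.intersection(*key_sets) if key_sets else set()


# Invented stand-ins for structured data pulled from three well-ranked pages.
pages = [
    {"@type": "Article", "headline": "A", "author": "X", "datePublished": "2020-01-01"},
    {"@type": "Article", "headline": "B", "author": "Y", "image": "b.png"},
    {"@type": "Article", "headline": "C", "author": "Z", "datePublished": "2020-03-01"},
]

print(sorted(common_properties(pages)))
```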

techntoke almost 5 years ago

I think their choice of JSON-LD as the recommended format, and not being transparent about how it affects results, is the biggest issue. JSON-LD requires duplication of content, whereas microdata is inline with existing content.
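
To make the duplication point concrete, here is a small sketch that renders the same article both ways. The function names and the sample headline are made up; the markup uses the real schema.org Article and Person types, but it is an illustration, not a recommended template.

```python
import json
from html import escape


def page_with_json_ld(headline, author):
    """Visible markup plus a separate JSON-LD block repeating the same facts."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
    }
    # Escape "</" so the JSON cannot close the script element early.
    payload = json.dumps(data).replace("</", "<\\/")
    visible = f"<article><h1>{escape(headline)}</h1><p>By {escape(author)}</p></article>"
    return visible + f'<script type="application/ld+json">{payload}</script>'


def page_with_microdata(headline, author):
    """The same facts annotated inline on the visible markup."""
    return (
        '<article itemscope itemtype="https://schema.org/Article">'
        f'<h1 itemprop="headline">{escape(headline)}</h1>'
        '<p itemprop="author" itemscope itemtype="https://schema.org/Person">'
        f'By <span itemprop="name">{escape(author)}</span></p>'
        "</article>"
    )


json_ld_page = page_with_json_ld("Metadata schemas", "Jane Doe")
microdata_page = page_with_microdata("Metadata schemas", "Jane Doe")
print(json_ld_page.count("Metadata schemas"), microdata_page.count("Metadata schemas"))
```

The headline appears twice in the JSON-LD variant and once in the microdata variant, which is the trade-off the comment is pointing at.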

acdha almost 5 years ago

Actual thread, for anyone who wants to look at the images which Thread Reader stripped: https://twitter.com/alkreidler/status/1291509746000855040

bawolff almost 5 years ago

So fork? It's not the big G's responsibility to solve all the internet's problems, and honestly most other web metadata standards have failed. The only difference is that this one has a big name attached that we can all blame.

Nasrudith almost 5 years ago

There is one question I always have about the semantic web schemes: what if it finally catches on and the end sites just immediately start lying their ass off for selfish purposes? Like many of the earlier search engine optimizations that tried to land common hits on a massive page that doesn't actually provide what you are looking for.

The only way around that is for somebody to do the processing of the real data to validate that it isn't just bullshit for a nefarious purpose. From what I've heard, the Semantic Web seems a bit skeuomorphic as a concept.

zelly almost 5 years ago

All the information is already out there. Ontology is a crutch.

pokoleo almost 5 years ago

@dang there's a typo in the name: should say "infrastructure", not "infrastrucure"

It's missing the T, as in: infrastrucTure

valuearb almost 5 years ago

How do you avoid the bike-shedding problem?

Would forcing the proposer to quantify costs and benefits help?

westurner almost 5 years ago

It's "languishing" and they should do it for us? It's flourishing and they're doing it for us and they have lots of open issues and I want more for free without any work.

Wow! Nobody else does anything to collaboratively, inclusively develop schema and the problem is that search engines aren't just doing it for us?

1) Search engines do not owe us anything. They are not obligated to dominate us or the schema that we may voluntarily decide to include on our pages.

We've paid them nothing. They have no contract for service or agreement with us which compels them to please us or contribute greater resources to an open standard that hundreds of people are contributing to.

2) You people don't know anything about linked data and structured data.

Here's a list of schema: https://lov.linkeddata.es/dataset/lov/ .

Here's the Linked Open Data Cloud: https://lod-cloud.net/

Does your or this publisher's domain include any linked data?

Does this article include any linked data?

Do data quality issues pervade promising, comparatively-expensive, redundant approaches to natural-language comprehension, reasoning, and summarization?

Here, in contributing this example PR adding RDFa to the codeforantarctica web page, I probably made a mistake. https://github.com/CodeForAntarctica/codeforantarctica.github.io/pull/3 . Can you spot the mistake?

There should have been review.

https://schema.org/ClaimReview, W3C Verifiable Claims / Credentials, ld-signatures, and lds-merkleproof2017.

Which brings us to reification, truth values, property graphs, and the new RDF* and SPARQL* and JSON-LD* (which don't yet have repos with ongoing issues to tend to).

3) Get to work. This article does nothing to teach people how to contribute to slow, collaborative schema standards work.

Here's the link to the GitHub Issues so that you can contribute to schema.org: https://github.com/schemaorg/schemaorg

...

"Standards should be better and they should pay for it"

Who are the major contributors to the (W3C) open standard in question?

Is telling them to put up more money or step down going to result in getting what we want? Why or why not?

Who would merge PRs and close issues?

Have you misunderstood the scope of the project? What do the editors of the schema feel in regards to more specific domain vocabularies? Is it feasible or even advisable to attempt to out-schema domain experts who know how to develop and revise an ontology or even just a vocabulary with Protégé?

To give you a sense of how much work goes into creating a few classes and properties defined with RDFS in RDFa in HTML: here's the https://schema.org/Course , https://schema.org/CourseInstance , and https://schema.org/EducationEvent issue: https://github.com/schemaorg/schemaorg/issues/195

Can you find the link to the Use Cases wiki (which was the real work)? What strategy did you use to find it?

...

"Well, Google just does what's good for Google."

Are you arguing that Google.org should make charitable contributions to this project? Is that an advisable or effective way to influence a W3C open standard (where conflicts of interest by people just donating time are disclosed)?

Anyone can use something like extruct or OSDS to extract RDFa, Microdata, and/or JSON-LD from a page.

Everyone can include structured data and linked data in their pages.

There are surveys quantifying how many people have included which types in their pages. Some of that data is included on schema.org types pages.

...

Some written interview questions:

> Which issues have you contributed to? Which issues have you seen all the way to closed? Have you contributed a pull request to the project? Have you published linked data? What is the URL to the docs which explain how to contribute resources? How would you improve them?

https://twitter.com/westurner/status/1291903926007209984

...

After all that's happened here, I think Dan (who built FOAF, which all profitable companies could use instead of https://schema.org/Person ) deserves a week off to add more linked data to the internet now please.
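
As a rough illustration of the "anyone can extract structured data from a page" point in the comment above, here is a standard-library sketch that pulls JSON-LD blocks out of HTML. It is not extruct or OSDS and does not use their APIs; it handles only JSON-LD (not RDFa or Microdata), and the class name, function name, and sample page are invented for the example.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than failing the whole page


def extract_json_ld(html_text):
    parser = JsonLdExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.items


SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Person", "name": "Example Person"}
</script>
<script>var unrelated = 1;</script>
</head><body><p>Hello</p></body></html>"""

print(extract_json_ld(SAMPLE_PAGE))
```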

rondennis almost 5 years ago

What's in it for the tech giants? Google is merely interested in peddling its ads to its Chrome users. Use Brave instead.

ProAm almost 5 years ago

Google neglect a project? No... I don't believe it.

Isn't this SOP for Google?