Actual title: Google and other tech giants are happy to have control over the Web's metadata schemas, but they let its infrastructure languish
I know that hating on Google is fashionable, but that's a bit too much editorializing. Especially considering the content of the post, and Google just being a small side note.
---
On-topic: I recently looked into using schema.org types as the basis for an information-capturing system, but many of the types are somewhat outdated, of questionable quality or just missing. Development indeed seems slow, while changes that are needed by one of the larger involved companies get pushed through quickly.
I think a big part of that stagnation is a lack of interest though. The whole semantic web domain has been pretty much inactive.
It's a real shame: having canonical types for most things in existence, and having those actually supported as import/export formats or for cross-app integrations, would be immensely valuable! But there is absolutely no business incentive there - rather the opposite. Easy portability of data is not something most companies would want.
> But there is absolutely no business incentive there - rather the opposite. Easy portability of data is not something most companies would want.
That depends on what kind of data it is. For example, your home address is not part of your bank's primary business model, but keeping it up-to-date is important for it. If data portability in and out of the bank makes it more likely that you'll keep it up-to-date, that's useful for your bank as well.
Legislation and customer demand are also making it more and more palatable. If some data is not critical to your business model, but being its sole guardian is a legal/reputational liability, then handing control over that data to someone else and re-using it from there is very useful.
That's interest on the side of the data consumer, not the data provider, for lack of a better term.
If the bank was the one owning the information, they would not want it to be shared with others, as that would allow their client to easily migrate to another bank, which they definitely do not want.
But as the one receiving the data, sure, it would be nice to have others share it with me, they'd say.
I'm afraid without legislation data sharing is never going to be a thing.
At this moment that bank is the entity that keeps this data. Their challenge is, however, that the data gets outdated. But if they give other parties the ability to access that data, then the consumer will have more motivation to keep it up-to-date, and the bank will now have access to more accurate address data.
(Note that the bank is an example - it could be another party.)
The solution for this would be for banks to use the government as single source of truth - in Germany we have the Melderegister anyway, it's mandatory to register your primary address.
Unfortunately it's not allowed by law that a consumer gives "push access" to e.g. banks, health insurance or employers.
It's a textbook collective action problem. Everyone would gain from having a high-quality shared ontology, but nobody gains enough individually (it's a public good).
The typical solutions to collective action problems are (1) benefactors who subsidise production (either privately or through taxation), or (2) direct command and control. Google was apparently filling the role of benefactor.
I'm not even sure if everyone would gain from having a high-quality shared ontology, because as soon as you go beyond trivial examples, the details of the data model inevitably have competing incompatible needs which require some compromise.
I could certainly imagine that for many companies the disadvantages of using a model that's not simply a copy of their specific view of that problem domain are larger than the hypothetical benefits of interoperability, so even if such a shared ontology would exist, many would intentionally choose to use their own ontology instead of adapting to that standard.
At one point I worked at a company founded in 2005-ish, so one of their core things was an ontology. We found that while some very generic things were reusable (person and address, say), almost everything that drove business logic was different, use case to use case.
Yeah, trying to standardize this seems like it would quickly turn into one of those "things people believe about time" rabbit-holes of edge cases and differences.
Even inside a single company I get scared now when someone says "if only we had just one standard way to handle this sort of thing"... if it's rarely simple for just one company, how would it work globally?
Without substantial benefits of a large universal ontology and without the ability to painlessly diverge from an ontology to not compromise on accurately modeling a particular domain, it’s hard to see the net benefits. Everyone will want to customize things for their domain, or point of view. An ontology should be easy to fork, like a repo.
I would certainly want to extend, modify and replace the data models for my core business as I see fit. But beyond those, there are still going to be a whole lot of models I need in order to run the company but want to keep low-maintenance. E.g. for hiring I might not have strong opinions on what a job post, a candidate or an application should look like, so I'd be happy sticking to the standard in those cases and benefiting from it being easier to mix and match tooling and pass the data around.
Also, I reckon a partially customized ontology, which is inevitable, is still easier to map between orgs than one built completely from scratch.
Or maybe see it less as a standardized ontology and more as a standardized way to create ontologies.
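To make the "stick to the standard, extend where needed" idea concrete, here's a minimal sketch. It assumes a hypothetical internal hiring record (the field names `role_name`, `posted_on`, etc. and the `ex:` namespace at `https://example.com/vocab#` are made up for illustration) and maps the common fields onto schema.org `JobPosting` properties, while anything org-specific lands in a custom JSON-LD namespace instead of being dropped:

```python
# Hypothetical internal field names -> schema.org JobPosting properties.
FIELD_MAP = {
    "role_name": "title",
    "posted_on": "datePosted",
    "company": "hiringOrganization",
    "contract": "employmentType",
    "summary": "description",
}

# Hypothetical namespace prefix for org-specific extensions.
CUSTOM_PREFIX = "ex:"


def to_job_posting(record):
    """Map a hypothetical internal record to JSON-LD using schema.org plus an extension namespace."""
    doc = {
        "@context": {
            "@vocab": "https://schema.org/",
            "ex": "https://example.com/vocab#",  # hypothetical extension vocabulary
        },
        "@type": "JobPosting",
    }
    for key, value in record.items():
        if key in FIELD_MAP:
            target = FIELD_MAP[key]
            if target == "hiringOrganization":
                # schema.org expects an Organization here, not a bare string
                value = {"@type": "Organization", "name": value}
            doc[target] = value
        else:
            # Anything without a standard equivalent stays, but under our own prefix
            doc[CUSTOM_PREFIX + key] = value
    return doc


if __name__ == "__main__":
    import json
    print(json.dumps(to_job_posting({
        "role_name": "Backend Engineer",
        "posted_on": "2023-05-01",
        "company": "Acme",
        "contract": "FULL_TIME",
        "interview_rounds": 3,
    }), indent=2))
```

Another org only has to understand the `ex:` bits (or ignore them); the standard part maps one-to-one.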
> I recently looked into using schema.org types as the basis for a information capturing system, but many of the types are somewhat outdated, of questionable quality or just missing.
It grew out of the semantic web community so this was roughly what I expected. That space just seems cursed to have these lofty ideals which are never realized because it’s hard to justify spending time on something which has no known consumer. Schema.org seemed poised to change that but they only use a couple of types and then only for a few types of searches.
Spent time with schema.org years ago. It's just not needed/useful for most scenarios, and the amount of work and convincing it takes to get most groups to use it isn't worthwhile, so continuing to extend it isn't worth the effort.
Not needed nor useful for whom? It's obviously useful for a more robust and open web, for our collective society, so I'm not sure who the subject of your statement is.
Not to be curt, but it clearly isn't obviously useful; otherwise the project wouldn't be languishing as it is. The notion of creating a single overarching conceptual map to regulate the representation of the varied manifold of human experience on the web is almost certainly a deeply misguided idea, and even if it's philosophically sound (a big if), it's not clear that schema.org is anything like the correct approach. I'm open to being convinced of its value if you'd like to elaborate, but I'll just say it's far from obvious.
Ah, I think we might be talking about different things. I think the larger promise of the semantic web is a categorically different thing than adding a bit of metadata to pages to know basic things like author, content type, description, etc.
It’s the latter I think is clearly valuable, in order for us to have competition for the likes of google and Facebook. It lowers the barrier for creating competing search engines, modern rss readers, and even things like distributed social networks.
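For a sense of how low that barrier can be when sites do publish the metadata, here's a minimal sketch of a small extractor using only the Python standard library. It assumes the page embeds schema.org data as JSON-LD in a `<script type="application/ld+json">` tag (microdata and RDFa aren't handled), and the `basic_metadata` helper and the fields it returns are illustrative choices, not any particular crawler's behaviour:

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects and parses the contents of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buf)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is common in the wild; skip it


def basic_metadata(html):
    """Return type/headline/description/author from the first schema.org object, or None."""
    parser = JsonLdExtractor()
    parser.feed(html)
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            author = item.get("author")
            if isinstance(author, dict):
                author = author.get("name")  # author is often a nested Person
            return {
                "type": item.get("@type"),
                "headline": item.get("headline"),
                "description": item.get("description"),
                "author": author,
            }
    return None


if __name__ == "__main__":
    page = """<html><head><script type="application/ld+json">
    {"@context": "https://schema.org", "@type": "BlogPosting",
     "headline": "Hello", "description": "A post",
     "author": {"@type": "Person", "name": "Ada"}}
    </script></head><body></body></html>"""
    print(basic_metadata(page))
```

Running it prints the four fields from the embedded `BlogPosting`. Things like `@graph` containers are left out to keep it short.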
Schema.org is designed to be useful to Bing and Google but not other entities. It is enough to help them compile better training sets to extract that kind of metadata without schema.org, but not enough to build a simple extractor that would be useful to a smaller software company.
Yes, despite its agnostic branding and name indicating basically totally maximal scope, schema.org has basically the features Google's interested in supporting for pulling out things from pages and emails.
To the extent that other uses can basically piggyback on data that sites added to target Google, it does provide some value, but I don't see it as really even attempting to be a generally useful "semantic web" or linked data vocabulary in the sense of interoperating with other things.
The Dutch startup I work at [1] is active in the semantic web technology space. The field isn't inactive; it's simply not in the foreground.
> Development indeed seems slow, while changes that are needed by one of the larger involved companies get pushed through quickly.
Whichever company did that would be accused of trying to "take over" the web.
Ideally large companies should be sponsoring open efforts to define things that affect how the web works rather than doing the work themselves. Smaller open teams that move fast to define structures that work for as many people as possible, even if they're not perfect for Google, Microsoft, etc, would be more useful to the internet industry as a whole.
i recently went through schema.org a bit while putting together a blog, and it was a long list (for a human to digest), but relative to all the objects in the world, tiny. google's vested interest and stamp on it was pretty evident.
i also went through microformats, which seems to be much smaller, and more tightly-focused around blogs and structuring data shared among federated sites.