News Linked Data Summit and the call for native to the web vocabularies
Posted on January 25, 2010
Filed Under Semantic Web, information architecture, journalism, semantic information architecture
I recently spoke at the News Linked Data Summit, a pan-news industry event looking at the potential of Linked Data. Martin Belam and the Media Standards Trust have already blogged about aspects of the day, but I wanted to add my slides and a perspective on the discussion.
A topic that interests me is the relationship between Linked Data and controlled vocabularies and, to steal a phrase from Tom Coates ("native to the web"), Linked Data’s call for vocabularies that are native to the web.
Let’s look at it this way – if you were asked to create a web presence for an individual or organisation today, you might propose the following:
- Make interesting documents public.
- Publish using web standards such as HTML.
- Provide useful information about the individual or organisation.
- Link to similar documents where you can.
- Then, if the documents are useful and you are gracious in linking to others, they will link back to you.
It is apparent that Linked Data asks the same of controlled vocabularies.
- Make your vocabularies public.
- Publish using the web standards of Linked Data.
- For each concept provide useful information for humans and machines.
- Link to other vocabularies (map concepts) where you can.
- If you have provided a useful set of concepts and relationships, others will link back to you, increasing the value of your controlled vocabulary.
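To make the list above a little more concrete, here is a minimal sketch in Python of what one published concept might look like as JSON-LD using SKOS. The property names (prefLabel, definition, exactMatch) come from the W3C SKOS vocabulary; everything else is an assumption for illustration: the example.org URI, the label and definition, the `make_concept` helper, and the choice of a DBpedia resource as the mapping target.

```python
import json

# Namespace of the W3C SKOS vocabulary.
SKOS = "http://www.w3.org/2004/02/skos/core#"


def make_concept(uri, label, definition, exact_matches=()):
    """Describe one vocabulary concept as a JSON-LD node.

    Hypothetical helper: the label and definition serve human readers,
    the URI and the exactMatch links serve machines.
    """
    return {
        "@context": {"skos": SKOS},
        "@id": uri,
        "@type": "skos:Concept",
        "skos:prefLabel": {"@value": label, "@language": "en"},
        "skos:definition": {"@value": definition, "@language": "en"},
        # Links out to equivalent concepts in other vocabularies.
        "skos:exactMatch": [{"@id": match} for match in exact_matches],
    }


# Illustrative values only; none of these come from a real vocabulary.
concept = make_concept(
    "http://example.org/vocab/habitat/rainforest",
    "Rainforest",
    "Forest characterised by high rainfall.",
    exact_matches=["http://dbpedia.org/resource/Rainforest"],
)

print(json.dumps(concept, indent=2))
```

The point of the sketch is only that each concept gets its own URI, carries a human-readable description, and links out to another vocabulary, which is what gives others something to link back to.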
It may seem crazy at the moment to give away your taxonomy for free, but it would have been similarly difficult to convince an organisation to have a web presence ten or fifteen years ago.
Linked Data is already showing the benefits of this approach. When we open-source vocabularies we can be much more ambitious in the richness of relationships and complexity of structures. In my talk I mentioned that the wonderful Wildlife Finder would not have been feasible had the ontologies not been publicly available to use and build upon. A Wildlife Finder built on a far simpler BBC bespoke taxonomy of animals, habitats and behaviours would have been a far poorer and more costly proposition. Martin expands on this in his Guardian post.
Recently we have seen the likes of the LCSH and New York Times vocabularies joining the Linked Data cloud and becoming web-native vocabularies. I suspect the success and survival of many vocabularies will depend on how quickly their owners can grasp the importance of becoming open and native to the web.
This comment from Peter Krantz articulates the data publishing process and emphasises the role of vocabularies.
1. Publish whatever you have in whatever format it currently is in.
This provides data for people to start tinkering with and ask
questions about.
2. While data is out there, start thinking about the context it lives
in. We are looking at harmonizing the way agencies publish their
vocabularies as a first step (e.g. OWL).
3. Gradually adapt your data to make it use common identifiers for
common things.
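
Step 3 of that comment can be sketched roughly as follows. This is my own illustration in Python, not something from Peter's comment: the `COMMON_IDS` lookup table, the `adopt_common_ids` function, and the `place` and `place_uri` field names are all hypothetical, and the DBpedia URI simply stands in for whatever shared identifier a publisher settles on.

```python
# Hypothetical lookup from in-house tag strings to a shared identifier.
COMMON_IDS = {
    "nyc": "http://dbpedia.org/resource/New_York_City",
    "new york city": "http://dbpedia.org/resource/New_York_City",
}


def adopt_common_ids(records, tag_field="place"):
    """Return copies of the records with a shared URI added where one is known.

    Records whose tag has no mapping yet get None, so the data can be
    adapted gradually rather than all at once.
    """
    enriched = []
    for record in records:
        key = record[tag_field].strip().lower()
        updated = dict(record)  # leave the original record untouched
        updated["place_uri"] = COMMON_IDS.get(key)
        enriched.append(updated)
    return enriched


# Illustrative data: two spellings of one place, and one unmapped place.
stories = [
    {"headline": "Story A", "place": "NYC"},
    {"headline": "Story B", "place": "New York City"},
    {"headline": "Story C", "place": "Gotham"},
]

for story in adopt_common_ids(stories):
    print(story["headline"], story["place_uri"])
```

Two differently tagged stories end up pointing at the same identifier, while the unmapped one is left for later.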