The Cluetrain Manifesto
Just been reading the 10th Anniversary edition of The Cluetrain Manifesto.  In his chapter ‘but how does it taste?’ Rick Levine focuses on the changes in participation – through blogging, social networks and participation in ecommerce sites (customer reviews etc).  However he references the walls between his LinkedIn, Facebook and phone universes.  I like his demand: ‘We need to be more fanatical in our elimination of conversational friction’.

This very much speaks to the Cluetrain Manifesto – that the Internet is all about conversations.  And effectively Levine is making the point that semantics has a role to play in facilitating this.

Ireland and linked open data

What is the timeline for the Irish government in terms of linked open data? When you read newspapers full of stories about TD expenses, FAS waste, the objectives of An Bord Snip – surely publishing data in meaningful, useful formats is part of the way forward. And it must be just one element of being a smart economy. And promoting a level of transparency (and accountability) which we crave as a society.

When I read pieces like Government Should Do its Own Data Homework by Jeni Tennison it just reminds me of the progress we need to make here in Ireland. And we have the expertise – in the IT community and, in particular, in DERI.

Perhaps there is an initiative – but I do not remember reading anything about a timeline.

Challenges in linked data

I recently referenced Tim Berners-Lee’s encouragement to everyone looking to publish linked open data to use the Resource Description Framework (RDF).  I also referenced in this blog recent work completed by the New York Times in this field.  The New York Times initiative has attracted a good deal of comment in the technical community identifying the teething issues/ errors in the data as published.

Stefano Mazzocchi’s recent post, Data Smoke and Mirrors, speaks to some of the issues associated with publishing lots of linked data using RDF.  Stefano has reviewed a triplification of all the data from – and has been left somewhat bemused.  The posting itself provides some examples.

The point here is that we want to see the data published, we want to see the standards used – but it’s far from simple and publishing for the sake of publishing or triplifying for the sake of triplifying may be self defeating.  As a community we need to focus on quality and the end user of the data.

Semantic web and the subprime crisis

Nice piece by Michael Cataldo outlining potential benefits of semantic web – in terms of making it easier to access data on the web and cross reference/ correlate the data.  Michael makes the point that fuller adoption of semantic web principles at an earlier date may have assisted in preventing some of the elements of the subprime crisis.

I am very much a fan of the semantic web and indeed of the movement towards linked open data.  However it is interesting to read reports of Tim Berners-Lee’s own frustrations wrt advances in linked open data e.g. the fact that data is being published in non-RDF formats (thereby limiting the ability of people to browse from this data to other RDF marked up data).

I think Michael Cataldo, in looking to demonstrate potential benefits of semantic web, may be stretching things a little far wrt the subprime crisis – were people motivated to make the data easily understood or was obfuscation not part of the intent?

Thinking about the scope of semantic web

Read an excellent summary paper by Mills Davis of Project 10X.  Interesting description of the ‘notion’ of semantic web: The key notion of semantic technology is to represent meanings and knowledge (e.g., knowledge of something, knowledge about something, and knowledge how to do something, etc.) separately from content or behavior artifacts, in a digital form that both people and machines can access and interpret.

Would recommend the summary paper to anyone looking to gain an insight into the semantic web 3.0 and its potential.

Semantic Web 1: Semantics – what is an ontology?

To a computer, the Web is a flat, boring world, devoid of meaning. This is a pity, as in fact documents on the Web describe real objects and imaginary concepts, and give particular relationships between them. For example, a document might describe a person. The title document to a house describes a house and also the ownership relation with a person. Adding semantics to the Web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values. Only when we have this extra level of semantics will we be able to use computer power to help us exploit the information to a greater extent than our own reading.

– Tim Berners-Lee “W3 future directions” keynote, 1st World Wide Web Conference Geneva, May 1994
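As a minimal sketch of what ‘links with relationship values’ could look like, the house example in the quote can be written as a set of subject, relationship, object statements. The identifiers below (`ex:house42`, `ex:ownedBy`, the name ‘Mary Murphy’, etc.) are made up for illustration and do not come from any real vocabulary, and plain Python tuples stand in for a real RDF store.

```python
# Each statement is a (subject, relationship, object) triple.
# All identifiers are hypothetical, invented for this example.
TRIPLES = [
    ("ex:deed1", "ex:describes", "ex:house42"),
    ("ex:house42", "rdf:type", "ex:House"),
    ("ex:house42", "ex:ownedBy", "ex:mary"),
    ("ex:mary", "rdf:type", "ex:Person"),
    ("ex:mary", "ex:name", "Mary Murphy"),
]


def match(triples, subject=None, relation=None, obj=None):
    """Return the triples that fit the pattern; None matches anything."""
    return [
        (s, r, o)
        for (s, r, o) in triples
        if (subject is None or s == subject)
        and (relation is None or r == relation)
        and (obj is None or o == obj)
    ]


def owner_name(triples, house):
    """Follow house -> ownedBy -> person -> name; return None if no path exists."""
    for _, _, person in match(triples, subject=house, relation="ex:ownedBy"):
        for _, _, name in match(triples, subject=person, relation="ex:name"):
            return name
    return None


print(owner_name(TRIPLES, "ex:house42"))
```

The point of the sketch is that the program answers ‘who owns this house?’ by following typed links rather than by reading prose.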

When we speak of web 3.0 and the semantic web we focus on computer processing/ understanding of web content.  Currently web sites are ‘marked up’ to make them easier for us as readers of the site to follow them.  Using HTML certain text is marked as a ‘header’, certain text is marked as ‘bold’, as indented, etc.  All of this facilitates us, as humans, in reading and following/ understanding the content.  But, more importantly, we understand much of the content based on our own knowledge, the context of each phrase/ sentence, etc.

So, how much of this data on the web could be processed (‘understood’) by computers, analysed and presented back to us as humans in a useful format e.g. categorised, annotated, summarised, ranked, etc?  Broadly there are two possible ways forward: software which can figure out what the content is about (Natural Language Processing etc.) or some additional ‘marking-up’ of the content – to flag what specific terms/ words/ phrases mean.

Natural Language Processing is a major area – a huge amount of research completed and ongoing, major advances made over the years.

On the mark up front there have also been significant advances and product offerings.
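To make the mark-up route a little more concrete, here is a rough sketch of how software can pick out meaning once terms are explicitly flagged in the page. It is loosely modelled on the `property` attribute used by RDFa, but the page fragment, the property names (`name`, `price`) and the `PropertyExtractor` class are all assumptions made up for this illustration, and it ignores nested elements.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the property names are invented for illustration.
PAGE = """
<div>
  <h1 property="name">Compact Camera X</h1>
  <p>Now only <span property="price">199.00</span> euro</p>
  <p>A great camera for holidays.</p>
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect the text inside any element that carries a 'property' attribute."""

    def __init__(self):
        super().__init__()
        self.current = None  # property name of the element being read, if any
        self.found = {}

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop is not None:
            self.current = prop
            self.found[prop] = ""

    def handle_data(self, data):
        if self.current is not None:
            self.found[self.current] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self.current = None


def extract_properties(html):
    """Return a dict of property name -> text for the flagged elements."""
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return {key: value.strip() for key, value in parser.found.items()}


print(extract_properties(PAGE))
```

Unflagged text such as ‘A great camera for holidays’ is simply skipped; working out what that sentence means would be a job for the Natural Language Processing route.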

One core element in all of this computer processing/’understanding’ is agreement on the meaning of terms/ concepts – hence the use of the word ‘semantics’.  We are all familiar with the phrase often used in trying to resolve/ advance arguments: ‘it’s a question of semantics’.  Generally the intent of the phrase is to say that the antagonist and protagonist agree conceptually but that much of the disagreement is accounted for by each party having a different understanding of the terms being used.

Dealing with concepts, their relationships and meanings is addressed using ONTOLOGIES.  The semantic web has given rise to a whole field in the development, publication and maintenance of ontologies.  Rather than trying to explain ‘ontologies’ in detail here I think this short video – focused on introducing ‘biomedical ontologies’ – does a great job of explaining the concept of and use for ontologies.
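For readers who prefer code to video, a very small sketch of the idea: an ontology records concepts and the relationships between them, so that software can draw simple conclusions. The concepts and the single ‘is a’ relationship below are a toy, hypothetical example, not taken from any real biomedical ontology, and real ontologies carry far richer relationships.

```python
# Hypothetical mini-ontology: each concept maps to its direct broader concepts.
IS_A = {
    "Aspirin": ["Painkiller"],
    "Painkiller": ["Drug"],
    "Antibiotic": ["Drug"],
    "Drug": ["Substance"],
    "Substance": [],
}


def ancestors(concept, is_a=IS_A):
    """Return every broader concept reachable by following 'is a' links."""
    seen = set()
    stack = list(is_a.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def is_a_kind_of(concept, broader, is_a=IS_A):
    """True if 'concept' is the same as, or narrower than, 'broader'."""
    return concept == broader or broader in ancestors(concept, is_a)


print(sorted(ancestors("Aspirin")))
print(is_a_kind_of("Aspirin", "Drug"))
print(is_a_kind_of("Antibiotic", "Painkiller"))
```

Nobody stated directly that aspirin is a drug; the program works it out from the agreed relationships, which is the kind of shared meaning an ontology is meant to provide.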

Practical use for semantics

We spend a great deal of time ourselves online trying to find information, comparing and contrasting data from different web sites.  A number of us are well used to using sites such as to assist in checking out travel options.

Read an interesting piece on  For now offering comparison shopping re electronic goods and apartment rental (in US).

Authors claim to be using the power of their semantic search engine to extract the relevant data from multiple sites to present detailed product purchasing options and comparisons.  In presenting the apartment data they include very good mashups to present the locations.  In the case of electronic goods still seems to me that there is a lot of scope for variation in the additional items e.g. additional memory for a camera.  However, even allowing for this, certainly shows the power of applications which can process data presented on web sites – and that is a basic objective for web 3.0/ semantic web.

Driving success of semantic web

Read an interesting survey re traction around the semantic web.  Listed a number of barriers to adoption of semantic web:

  1. organisational culture
  2. the complexity of the technology
  3. a general lack of experts
  4. a lack of success stories
  5. a lack in quality of available software and
  6. the problem to quantify the benefits

I thought it would be interesting to consider each of these in some more detail in a series of postings – designed to assist in promoting a greater understanding of semantic web and its potential use.  Would welcome any feedback/ ideas on this subject.

The referenced survey targeted a fairly technical, web-savvy group across Europe.  Am keen to engage more directly with business people – amongst many of whom I am not sure there is a clear understanding of, or interest in, the semantic web.

