Explaining linked data, RDF and SPARQL

The guys at Talis have done a good job of evangelising linked data as a concept, the basic tools and their platform. If you are new to the space then the presentation by Rob Styles (44 mins) (from the Linked Data and Libraries event on 21st July 2010) is worth watching. In particular he does a good job of explaining what RDF is (a graph data model) – as against the different ways in which you can write it down, e.g. Turtle, RDFa and RDF/XML. His whiz through SPARQL gives a useful introduction to how RDF data can be queried.
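
To make the distinction between the model and the ways of writing it down a little more concrete, here is a rough sketch of my own (it is not taken from the presentation). It holds a tiny graph as a plain Python set of subject-predicate-object triples, writes it out in an N-Triples-like syntax, and answers a question by matching triple patterns, which is loosely what a SPARQL WHERE clause does. The example.org URIs, the property names and the helper functions are all invented for illustration; a real project would use a proper RDF library and a real SPARQL engine.

```python
# Toy illustration only: an RDF graph as a set of (subject, predicate, object) triples.
# Every URI and name below is made up for the example.
EX = "http://example.org/"

triples = {
    (EX + "rob", EX + "worksFor", EX + "talis"),
    (EX + "rob", EX + "name", "Rob Styles"),
    (EX + "talis", EX + "name", "Talis"),
}


def to_ntriples(graph):
    """Write the graph down in an N-Triples-like syntax, one triple per line."""
    def term(value):
        # Crude assumption: anything starting with "http" is a URI, the rest are literals.
        return f"<{value}>" if value.startswith("http") else f'"{value}"'
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in sorted(graph))


def match(graph, pattern):
    """Yield variable bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in graph:
        binding = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                # The same variable used twice in a pattern must bind to the same value.
                if binding.get(want, got) != got:
                    break
                binding[want] = got
            elif want != got:
                break
        else:
            yield binding


def query(graph, patterns):
    """Join several triple patterns, roughly like a SPARQL basic graph pattern."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for solution in solutions:
            # Substitute variables that earlier patterns have already bound.
            bound = tuple(solution.get(term, term) for term in pattern)
            for binding in match(graph, bound):
                extended.append({**solution, **binding})
        solutions = extended
    return solutions


# "Who works for which organisation, and what is that organisation called?"
results = query(triples, [
    ("?person", EX + "worksFor", "?org"),
    ("?org", EX + "name", "?orgName"),
])
print(to_ntriples(triples))
print(results)
```

The point of the sketch is that the set of triples is the data. The text that to_ntriples produces is just one way of writing that data down, and the query works against the graph itself, whatever syntax it was loaded from.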

2010: Big year for semantics

Interesting to read the comments of Sam Palmisano (head of IBM):

“We are amassing an unimaginable amount of data in the world. In just three years, [internet] traffic is expected to total more than half a zettabyte. That’s a trillion gigabytes – or a one followed by 21 zeroes,” he tells industry, academic and political leaders.

“Where we once inferred, we now know. Where we once interpolated and extrapolated, we can now determine. The historical is giving way to the real-time and it’s not just about volume and velocity. The nature of the data we are collecting and analysing is changing, too.

“All this data is far more real-time than ever before. Most of us today, as leaders and as individuals, make decisions based on information that is backward-looking and limited in scope. That’s the best we had, but that is quickly changing.”

This just reinforces my previous blog post of June 2009: here.

And this week saw the official launch of the UK government's linked open data site.

We’ve seen the debate – back and forth – about linked open data.  We’ve seen the debate about top down v. bottom up approaches to semantics.  We’ve seen the arguments about the merits of RDF as against other frameworks.  But the volumes of data continue to increase – as does participation in social networks.

On a daily basis we see announcements about new products. Nova Spivack tells us that the days of ‘Search’ are running out – we need ‘Help’ not ‘Search’. We eagerly await his Twine 2.0. We have seen significant advancements announced this month in products such as Open Calais and Open Amplify. One other product which caught my eye last week is Kngine.

Products such as Amplify aim to deal with the ‘tricky’ content – e.g. the ‘opinions’ implicit in content of social networks.   And this is a key element of what we are looking for: context for the content.  I am more interested in information on a particular subject when I understand the context, the perspective of the provider of the information.  I also want the richness of analysis possible through the combination of wider sources of data – including data compiled by government agencies which should be available to me.  Linked open data initiatives are required in all countries.  For Ireland – the sooner the better, if we consider ourselves a smart economy or a knowledge society.

Challenges in linked data

I recently referenced Tim Berners-Lee’s encouragement to everyone looking to publish linked open data to use the Resource Description Framework (RDF). I also referenced in this blog recent work completed by the New York Times in this field. The New York Times initiative has attracted a fair amount of comment in the technical community, identifying the teething issues and errors in the data as published.

Stefano Mazzocchi’s recent post, Data Smoke and Mirrors, speaks to some of the issues associated with publishing lots of linked data using RDF. Stefano has reviewed a triplification of all the data from data.gov – and has been left somewhat bemused. The posting itself provides some examples.

The point here is that we want to see the data published, we want to see the standards used – but it’s far from simple, and publishing for the sake of publishing or triplifying for the sake of triplifying may be self-defeating. As a community we need to focus on quality and the end user of the data.