Monthly Archives: April 2012

How to produce Linked Data from SPARQL endpoints.

Imagine you have a triplestore that can be queried with SPARQL. How can someone link to the resources in your triplestore using their identifiers, which must be HTTP URIs? Let’s work through an example with the three triples below, shown in Turtle notation.

@prefix rdfs:    <> .
@prefix rdf: <> .
@prefix contact: <> .

<http://localhost:8080/mydataset/People/Rakebul_Hasan> rdf:type contact:Person .
<http://localhost:8080/mydataset/People/Rakebul_Hasan> contact:fullName "Rakebul Hasan" .
<http://localhost:8080/mydataset/People/Rakebul_Hasan> contact:mailbox <> .

Now, imagine these triples reside in a triplestore and can be queried through the SPARQL endpoint http://localhost:8080/openrdf-sesame/repositories/pubbytest. The idea is that anyone who performs an HTTP GET request on http://localhost:8080/mydataset/People/Rakebul_Hasan gets back a description of that resource. Providing useful information when a URI is looked up is one of the core principles of Linked Data outlined by Tim Berners-Lee.
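To make "the description of this resource" concrete, the same triples can be requested from the endpoint directly with a SPARQL DESCRIBE query. The sketch below only builds the request URL. It assumes the endpoint accepts the query text in a `query` URL parameter, as the SPARQL protocol specifies for GET requests. The helper name `describe_request_url` is made up for this illustration, and this is not necessarily the exact query Pubby sends.

```python
from urllib.parse import urlencode

# SPARQL endpoint from the example above.
ENDPOINT = "http://localhost:8080/openrdf-sesame/repositories/pubbytest"


def describe_request_url(resource_uri, endpoint=ENDPOINT):
    """Build a GET URL that asks the endpoint to DESCRIBE one resource.

    Hypothetical helper: it only assembles the URL and sends nothing.
    """
    query = "DESCRIBE <%s>" % resource_uri
    # urlencode percent-escapes the angle brackets, colons and slashes.
    return endpoint + "?" + urlencode({"query": query})
```

Passing the resulting URL to curl with an `Accept: text/turtle` header should return the triples about the resource, provided the endpoint supports DESCRIBE.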

We will use a tool called Pubby to do this. Pubby produces Linked Data from SPARQL endpoints. We will host Pubby in Tomcat, so let’s install it as a webapp. Follow the steps below:

  1. Unzip Pubby and copy its webapp directory into the webapps directory of your Tomcat installation. Rename the copied directory to mydataset (or whatever suits your needs).
  2. Modify WEB-INF/config.ttl as shown below (or according to your needs).
# Prefix declarations to be used in RDF output
@prefix conf: <> .
@prefix meta: <> .
@prefix rdf: <> .
@prefix rdfs: <> .
@prefix xsd: <> .
@prefix owl: <> .
@prefix dc: <> .
@prefix dcterms: <> .
@prefix foaf: <> .
@prefix skos: <> .
@prefix geo: <> .
@prefix dbpedia: <http://localhost:8080/resource/> .
@prefix p: <http://localhost:8080/property/> .
@prefix yago: <http://localhost:8080/class/yago/> .
@prefix units: <> .
@prefix geonames: <> .
@prefix prv:      <> .
@prefix prvTypes: <> .
@prefix doap:     <> .
@prefix void:     <> .
@prefix ir:       <> .

# Server configuration section
<> a conf:Configuration;

    # Project name for display in page titles
    conf:projectName "Pubby Test";

    # Homepage with description of the project for the link in the page header
    conf:projectHomepage <>;

    # The Pubby root, where the webapp is running inside the servlet container.
    conf:webBase <http://localhost:8080/mydataset/>;

    # Dataset configuration section
    conf:dataset [
        # SPARQL endpoint URL of the dataset
        conf:sparqlEndpoint <http://localhost:8080/openrdf-sesame/repositories/pubbytest>;

        # Common URI prefix of all resource URIs in the SPARQL dataset
        conf:datasetBase <http://localhost:8080/mydataset/>;
        # The part of the request URL that follows conf:webBase is appended
        # to conf:datasetBase to form the resource URI looked up in the dataset
    ]
    .
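
The mapping described in the last comment of the configuration can be sketched in a few lines of Python. This is an illustration of the rule, not Pubby's actual code: the function name `to_dataset_uri` is hypothetical, and the two constants simply repeat the conf:webBase and conf:datasetBase values from the configuration above.

```python
# Values copied from the configuration above.
WEB_BASE = "http://localhost:8080/mydataset/"
DATASET_BASE = "http://localhost:8080/mydataset/"


def to_dataset_uri(request_url, web_base=WEB_BASE, dataset_base=DATASET_BASE):
    """Map a requested URL to the resource URI used inside the dataset.

    Hypothetical helper: strips web_base from the front of the request URL
    and appends the remainder to dataset_base.
    """
    if not request_url.startswith(web_base):
        raise ValueError("request URL is not under the web base: " + request_url)
    return dataset_base + request_url[len(web_base):]
```

In our setup both bases are identical, so the URI stays unchanged. If the triples used a different prefix, say a hypothetical http://example.org/resource/, the same request would be looked up as http://example.org/resource/People/Rakebul_Hasan.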

Now, if you access the resource URI http://localhost:8080/mydataset/People/Rakebul_Hasan in a browser, you get an HTML page aimed at human readers. A redirection happens behind the scenes, which is why the URL shown in the address bar differs from the one you requested.

We will use the cURL tool to perform the HTTP GET request from the command line. We will set the header ‘Accept: text/turtle’ in order to receive the response in Turtle format. The curl command is:

curl -L -H 'Accept: text/turtle' http://localhost:8080/mydataset/People/Rakebul_Hasan

The response should be:

@prefix rdfs:    <> .
@prefix foaf:    <> .

<> ;
"Rakebul Hasan" ;
<> .

rdfs:label "RDF description of Rakebul_Hasan" ;
foaf:primaryTopic <http://localhost:8080/mydataset/People/Rakebul_Hasan> .

Pubby added two triples to the set of triples that describes our resource. Both are about the RDF document itself: one gives its label using the rdfs:label property, and the other names the resource it describes using the foaf:primaryTopic property.

Behind the scenes, a 303 redirection happens. The -L option in the curl command makes sure that the redirection link is followed. If you remove -L from the curl command, you will see the 303 response with the link, as shown below.

curl -H 'Accept: text/turtle' http://localhost:8080/mydataset/People/Rakebul_Hasan
303 See Other: For a description of this item, see http://localhost:8080/mydataset/data/People/Rakebul_Hasan

The link returned with the 303 response is the location of the machine-readable description of the requested resource. If you run curl again with this new link, you will get the same response as with our first curl command.
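
The two-step exchange can also be reproduced from a script. The sketch below uses Python's standard http.client module, which does not follow redirects on its own, so the 303 and its Location header stay visible. The function names `get_once` and `dereference` are made up for this illustration, and the code assumes plain HTTP as in our localhost setup.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def get_once(url, accept="text/turtle"):
    """Send a single GET without following redirects.

    Returns (status code, Location header or None, response body as text).
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=10)
    try:
        conn.request("GET", path, headers={"Accept": accept})
        resp = conn.getresponse()
        body = resp.read().decode("utf-8", errors="replace")
        return resp.status, resp.getheader("Location"), body
    finally:
        conn.close()


def dereference(url, accept="text/turtle"):
    """Follow one 303 redirect by hand, similar to what curl -L does."""
    status, location, body = get_once(url, accept)
    if status == 303 and location:
        # The Location header may be relative, so resolve it against the request URL.
        status, _, body = get_once(urljoin(url, location), accept)
    return status, body
```

With Pubby running as configured above, calling get_once on http://localhost:8080/mydataset/People/Rakebul_Hasan should report status 303 together with the data URL, and dereference on the same URI should return the Turtle document.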

To conclude, we have seen how to publish Linked Data when the original RDF data sits behind a SPARQL endpoint. We have also seen an example of a dereferenceable HTTP URI with content negotiation.