Monthly Archives: October 2012

Loading Geonames in Virtuoso

  1. Convert the Geonames RDF/XML dump to N-Triples.
    The RDF dump is a text file that holds every GeoNames feature as two lines: the feature URI on one line, followed by its RDF/XML description on the next. For example:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <rdf:RDF xmlns:cc="" xmlns:dcterms="" xmlns:foaf="" xmlns:gn="" xmlns:owl="" xmlns:rdf="" xmlns:rdfs="" xmlns:wgs84_pos="">
    <gn:Feature rdf:about="">
    <rdfs:isDefinedBy></rdfs:isDefinedBy><gn:name>Āb-e Yasī</gn:name><gn:featureClass rdf:resource=""/>
    <gn:featureCode rdf:resource=""/><gn:countryCode>IR</gn:countryCode><wgs84_pos:lat>32.8</wgs84_pos:lat><wgs84_pos:long>48.8</wgs84_pos:long><gn:parentFeature rdf:resource=""/><gn:parentCountry rdf:resource=""/><gn:parentADM1 rdf:resource=""/><gn:nearbyFeatures rdf:resource=""/><gn:locationMap rdf:resource=""/></gn:Feature></rdf:RDF>

    It’s not really possible to parse this file directly with an RDF parser, because it is not a single RDF document. It is a long sequence of separate RDF/XML documents interleaved with plain URI lines. I wrote a Python script that converts the dump into a single file containing all of its triples, serialized as N-Triples. This N-Triples file can easily be loaded into Virtuoso and other triple stores.
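
    The original script is not included here, so the block below is only a minimal standard-library sketch of such a converter. It assumes every XML line is a standalone, flat RDF/XML document like the example above: a typed node (gn:Feature) with rdf:about, whose properties are either rdf:resource links or plain/language-tagged literals. It is not a general RDF/XML parser (for that, rdflib's Graph.parse(data=..., format="xml") followed by Graph.serialize(format="nt") could be used per line). The function names are hypothetical, and the output is UTF-8 N-Triples as allowed by RDF 1.1.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def _iri(tag):
    """Turn an ElementTree tag '{namespace}local' into the full IRI."""
    ns, local = tag[1:].split("}", 1)
    return ns + local


def _literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def rdfxml_to_ntriples(xml_text):
    """Convert one flat RDF/XML document (one GeoNames feature) to N-Triples lines."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    triples = []
    for node in root:  # children of <rdf:RDF>
        about = node.get(f"{{{RDF}}}about")
        if about is None:
            continue  # this sketch only handles nodes that carry rdf:about
        s = f"<{about}>"
        if node.tag != f"{{{RDF}}}Description":
            # A typed node such as <gn:Feature> implies an rdf:type triple.
            triples.append(f"{s} <{RDF}type> <{_iri(node.tag)}> .")
        for prop in node:
            p = f"<{_iri(prop.tag)}>"
            resource = prop.get(f"{{{RDF}}}resource")
            if resource is not None:
                triples.append(f"{s} {p} <{resource}> .")
                continue
            obj = f'"{_literal(prop.text or "")}"'
            lang = prop.get(XML_LANG)
            if lang:
                obj += f"@{lang}"
            triples.append(f"{s} {p} {obj} .")
    return triples


def convert_dump(lines):
    """Yield N-Triples lines from the dump: URI lines are skipped, XML lines parsed."""
    for line in lines:
        line = line.strip()
        if line.startswith("<"):  # assumption: only the XML lines start with '<'
            yield from rdfxml_to_ntriples(line)

# Example (file names are placeholders):
# with open("all-geonames-rdf.txt", encoding="utf-8") as src, \
#         open("geonames.nt", "w", encoding="utf-8") as dst:
#     for triple in convert_dump(src):
#         dst.write(triple + "\n")
```

    Streaming line by line keeps memory use flat, which matters because the full dump holds millions of features.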

  2. Install and configure Virtuoso
    I used yum on Fedora to install Virtuoso Open Source and its related packages; other package managers should be able to do the same job. The virtuoso-opensource package provides the Virtuoso Open Source database server. The virtuoso-opensource-utils package provides isql-v, a command-line SQL client that can also run SPARQL queries. The virtuoso-opensource-conductor package provides a convenient web user interface.

    sudo yum install virtuoso-opensource
    sudo yum install virtuoso-opensource-utils
    sudo yum install virtuoso-opensource-conductor

    Now, follow the steps below to configure Virtuoso.

    • Create a config directory for your current user (the user that will run the server). In my case it was /[my home]/virtuoso.
    • Copy the default virtuoso.ini (normally located at /var/lib/virtuoso/db/virtuoso.ini) to this directory, and make sure its permissions allow you to edit it.
    • Modify the following parameters in virtuoso.ini so that the database files live in your config directory. Note that DatabaseFile and TransactionFile appear in two different sections.
      [Database]
      DatabaseFile = [path to user's virtuoso config directory]/virtuoso.db
      ErrorLogFile = [path to user's virtuoso config directory]/virtuoso.log
      LockFile = [path to user's virtuoso config directory]/virtuoso.lck
      TransactionFile = [path to user's virtuoso config directory]/virtuoso.trx
      xa_persistent_file = [path to user's virtuoso config directory]/virtuoso.pxa
      [TempDatabase]
      DatabaseFile = [path to user's virtuoso config directory]/virtuoso-temp.db
      TransactionFile = [path to user's virtuoso config directory]/virtuoso-temp.trx
      [Parameters]
      ;Depending on your memory size, change NumberOfBuffers and MaxDirtyBuffers.
      ;You will find recommended values in the default virtuoso.ini file.
    • Add the directory that contains the geonames.nt file to DirsAllowed in the [Parameters] section, so that Virtuoso is allowed to read it.
      DirsAllowed = ., /usr/share/virtuoso/vad, [directory that contains the geonames.nt]
    • Configure the odbc.ini file as below (create it if it doesn’t exist).
      [Local Virtuoso]
    • Start Virtuoso by running the following command in your Linux shell.
      virtuoso-t -df +configfile /[path to user's config directory]/virtuoso.ini
    • Log in to Virtuoso with the isql client and change the default password ('dba' is the default password of the dba user). If changing the password doesn't work from the isql client, log in to the Conductor web client (http://localhost:8890/conductor) with user dba and password dba, then run the set password command from the Interactive SQL option.
      $ /usr/libexec/virtuoso/isql dba
      SQL> set password 'dba' 'new-password'
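
    If you need to repeat this setup, the virtuoso.ini edits above can be scripted. Below is a minimal sketch, assuming virtuoso.ini uses the [Database], [TempDatabase] and [Parameters] sections as the default file does. It is a plain line-based rewrite rather than configparser, because configparser would drop the explanatory comments (and lowercase the keys) when writing the file back. set_ini_values and geonames_ini_updates are hypothetical helper names, and keys that are missing from the file are not added.

```python
import re

_SECTION = re.compile(r"\[([^\]]+)\]$")


def set_ini_values(text, updates):
    """Return `text` with the values of existing (section, key) entries replaced.

    Comments, blank lines and ordering are kept; keys that are not already
    present are NOT appended.
    """
    section = None
    out = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        stripped = body.strip()
        header = _SECTION.match(stripped)
        if header:
            section = header.group(1).strip()
        elif stripped and not stripped.startswith(";") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if (section, key) in updates:
                body = f"{key} = {updates[(section, key)]}"
        out.append(body + ending)
    return "".join(out)


def geonames_ini_updates(config_dir, nt_dir):
    """Hypothetical helper: the path settings listed above, keyed by (section, key)."""
    d = config_dir.rstrip("/")
    return {
        ("Database", "DatabaseFile"): f"{d}/virtuoso.db",
        ("Database", "ErrorLogFile"): f"{d}/virtuoso.log",
        ("Database", "LockFile"): f"{d}/virtuoso.lck",
        ("Database", "TransactionFile"): f"{d}/virtuoso.trx",
        ("Database", "xa_persistent_file"): f"{d}/virtuoso.pxa",
        ("TempDatabase", "DatabaseFile"): f"{d}/virtuoso-temp.db",
        ("TempDatabase", "TransactionFile"): f"{d}/virtuoso-temp.trx",
        ("Parameters", "DirsAllowed"): f"., /usr/share/virtuoso/vad, {nt_dir}",
    }

# Example (paths are placeholders):
# with open("/home/me/virtuoso/virtuoso.ini", encoding="utf-8") as f:
#     text = f.read()
# updated = set_ini_values(text, geonames_ini_updates("/home/me/virtuoso", "/data/geonames"))
# with open("/home/me/virtuoso/virtuoso.ini", "w", encoding="utf-8") as f:
#     f.write(updated)
```

    Keying the updates by (section, key) is what keeps the two DatabaseFile entries apart.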
  3. Load the converted Geonames N-Triples into Virtuoso
    • Copy the Bulk Loader Procedure and Sub-procedures creation SQL script from the link here, save it as rdfloader.sql in the user's config directory, and modify line 331 from


      DECLARE gr INT;
    • From the isql console, execute the following command.
      SQL> load [path to ]rdfloader.sql;
    • If anything goes wrong, drop the load_list and ldlock tables with the commands below, then load the script again with the previous command.
      SQL> drop table load_list;
      SQL> drop table ldlock;
    • Register the geonames.nt file that you want to load (the third parameter is the name of the graph into which the triples will be loaded).
      SQL> ld_dir ('path to the directory where geonames.nt is located', 'geonames.nt', '');
    • Run the loader (it takes a long time: 9 hours on my computer).
      SQL> rdf_loader_run ();
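
    Because the load runs for hours, scanning geonames.nt for obviously malformed lines before calling ld_dir and rdf_loader_run may save a failed run. The sketch below is a rough, regex-based approximation of the N-Triples line grammar (it does not check \u escapes inside IRIs or other edge cases), not a full validator; bad_lines is a hypothetical name.

```python
import re

# Rough N-Triples line pattern: an IRI or blank-node subject, an IRI predicate,
# and an IRI, blank node, or literal (optionally @lang or ^^<datatype>) object.
_IRI = r'<[^<>"{}|^`\\\s]*>'
_BNODE = r'_:\S+'
_LITERAL = r'"(?:[^"\\\n\r]|\\.)*"(?:@[A-Za-z]+(?:-[A-Za-z0-9]+)*|\^\^' + _IRI + r')?'
_TRIPLE = re.compile(
    rf'^\s*(?:{_IRI}|{_BNODE})\s+{_IRI}\s+(?:{_IRI}|{_BNODE}|{_LITERAL})\s*\.\s*$'
)


def bad_lines(lines):
    """Return the 1-based numbers of lines that do not look like N-Triples."""
    bad = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are allowed
        if not _TRIPLE.match(line):
            bad.append(number)
    return bad

# Example (file name is a placeholder):
# with open("geonames.nt", encoding="utf-8") as f:
#     print(bad_lines(f)[:20])
```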
  4. Test
    You can access the SPARQL endpoint web interface at http://localhost:8890/sparql. In the default dataset URI field, type the graph name you used when loading the triples with ld_dir. We will run a query that returns all the regions of France. Regions of France are represented by the <> relation. The query looks like this:
    select distinct ?uri ?name where {
      ?uri <> <> .
      ?t <> ?uri .
      ?uri <> ?name .
    }

    Type this query into the query text field and press Run Query. It will return a list of region URIs and names.
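
    The same endpoint can also be queried from a script instead of the web form. Below is a minimal sketch using only the Python standard library and the standard SPARQL 1.1 Protocol parameters: query, plus default-graph-uri, which plays the role of the default dataset URI field. It asks for SPARQL JSON results through the Accept header. build_request, parse_bindings and run_query are hypothetical names, and the example query in the comment assumes the data uses the GeoNames ontology namespace http://www.geonames.org/ontology#.

```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint, query, default_graph=None):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    params = {"query": query}
    if default_graph:
        params["default-graph-uri"] = default_graph
    url = endpoint + "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(body, variables):
    """Extract one tuple of values per result row from SPARQL JSON results."""
    data = json.loads(body)
    return [tuple(row.get(v, {}).get("value") for v in variables)
            for row in data["results"]["bindings"]]


def run_query(endpoint, query, variables, default_graph=None):
    """Send the query and return the rows; needs a running endpoint."""
    request = build_request(endpoint, query, default_graph)
    with urllib.request.urlopen(request) as response:
        return parse_bindings(response.read().decode("utf-8"), variables)

# Example, assuming Virtuoso is running locally:
# rows = run_query(
#     "http://localhost:8890/sparql",
#     "SELECT DISTINCT ?uri ?name WHERE "
#     "{ ?uri <http://www.geonames.org/ontology#name> ?name } LIMIT 10",
#     ["uri", "name"],
# )
```

    Unbound variables come back as None, since SPARQL JSON results simply omit them from a row.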

This tutorial has been adapted from the following tutorials: