Tools to create RDF data cubes

Tools to create a DSD (Data Structure Definition)

Unfortunately, as far as we know, there are currently no end-user-friendly tools for creating RDF Data Cube data structure definitions.
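In the absence of such a tool, one option is to generate the DSD with a small script. The following Python sketch prints a minimal DSD in Turtle for the Olympic medals example used later in this section. It is an illustration under stated assumptions, not a complete definition: the DSD URI and the oly: prefix are hypothetical names chosen for this sketch, and attributes, code lists, and labels are left out.

```python
# Minimal sketch: emit a Data Structure Definition (DSD) as Turtle text.
# Only the Python standard library is used; the result is a plain string.

PREFIXES = """\
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix oly: <https://example.org/ns/olympics#> .
"""

# Hypothetical URI for the DSD (an assumption, not taken from the example data).
DSD_URI = "<https://example.org/id/datacube/olympics/dsd>"

# Dimension and measure properties as used by the observations in the example.
DIMENSIONS = [
    "sdmx-dimension:refArea",
    "sdmx-dimension:refPeriod",
    "sdmx-dimension:sex",
    "oly:competition",
    "oly:medaltype",
]
MEASURE = "oly:numberofmedals"


def build_dsd():
    """Return the DSD as a Turtle string with one qb:component per property."""
    components = [
        f"  qb:component [ qb:dimension {dim} ; qb:order {order} ]"
        for order, dim in enumerate(DIMENSIONS, start=1)
    ]
    components.append(f"  qb:component [ qb:measure {MEASURE} ]")
    body = " ;\n".join(components) + " ."
    return "\n".join([PREFIXES, f"{DSD_URI} a qb:DataStructureDefinition ;", body])


dsd_turtle = build_dsd()
print(dsd_turtle)
```

The data set resource would then point to this DSD with qb:structure.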

Tools to create and manage code lists

Tools to generate observations

The input

Consider the following structured CSV file of Olympic medal counts. The file, which we have named input.csv, is an extract from a much larger CSV data set, which we reduced and aggregated (using Excel and some pattern replacements in a text editor) so that it contains only the data required for our example. It covers the 2004, 2008, and 2012 Olympics and gives the number of medals of each type won by male and female athletes from China, Great Britain, and the USA.

A subset is shown below:

Competition,Edition,NOC,Gender,Medal,Value
Olympics,2004,CHN,Male,Bronze,5
Olympics,2004,CHN,Male,Gold,16
Olympics,2004,CHN,Male,Silver,9
Olympics,2004,CHN,Female,Bronze,10
Olympics,2004,CHN,Female,Gold,36
Olympics,2004,CHN,Female,Silver,18
Olympics,2004,GBR,Male,Bronze,8
Olympics,2004,GBR,Male,Gold,12
Olympics,2004,GBR,Male,Silver,15
Olympics,2004,GBR,Female,Bronze,7
Olympics,2004,GBR,Female,Gold,5
Olympics,2004,GBR,Female,Silver,10
Olympics,2004,USA,Male,Bronze,33
Olympics,2004,USA,Male,Gold,51
Olympics,2004,USA,Male,Silver,33
Olympics,2004,USA,Female,Bronze,40
Olympics,2004,USA,Female,Gold,65
Olympics,2004,USA,Female,Silver,42
Olympics,2008,CHN,Male,Bronze,11
Olympics,2008,CHN,Male,Gold,34
Olympics,2008,CHN,Male,Silver,11
Olympics,2008,CHN,Female,Bronze,46
Olympics,2008,CHN,Female,Gold,40
....
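Before converting a file like this, it can help to check that it has the expected columns and that each combination of dimension values occurs only once, since each row is meant to become exactly one observation. The following Python sketch does this for a few rows copied from the extract above. The function names are illustrative, and the inline sample stands in for the real input.csv.

```python
import csv
import io

# A few rows copied from the extract above; the real input.csv has more.
SAMPLE = """\
Competition,Edition,NOC,Gender,Medal,Value
Olympics,2004,CHN,Male,Bronze,5
Olympics,2004,CHN,Male,Gold,16
Olympics,2004,CHN,Male,Silver,9
Olympics,2004,CHN,Female,Bronze,10
"""

EXPECTED_COLUMNS = ["Competition", "Edition", "NOC", "Gender", "Medal", "Value"]


def load_rows(text):
    """Parse CSV text and check the header and the medal counts."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = list(reader)
    for row in rows:
        int(row["Value"])  # raises ValueError if a count is not an integer
    return rows


def dimension_keys(rows):
    """Return the dimension values that identify each observation."""
    return [(r["Edition"], r["NOC"], r["Gender"], r["Medal"]) for r in rows]


rows = load_rows(SAMPLE)
keys = dimension_keys(rows)
# Duplicate keys would mean two rows collapse into the same observation.
if len(keys) != len(set(keys)):
    raise ValueError("duplicate dimension combination found")
print(f"{len(rows)} rows, all dimension combinations unique")
```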

Generating the observations using Tarql

Tarql is a command-line tool for converting CSV files such as the one above to RDF using SPARQL 1.1 syntax. More information can be found at http://tarql.github.io/.

The SPARQL query (olympics.sparql) that generates the triples for our example is as follows:

prefix owl: <http://www.w3.org/2002/07/owl#>
prefix void: <http://rdfs.org/ns/void#>
prefix dcterms: <http://purl.org/dc/terms/>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix dcat: <http://www.w3.org/ns/dcat#>
prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#>
prefix qb: <http://purl.org/linked-data/cube#>
prefix skos: <http://www.w3.org/2004/02/skos/core#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix sdmx-concept: <http://purl.org/linked-data/sdmx/2009/concept#>

CONSTRUCT
{?URI
  rdf:type qb:Observation ;
  qb:dataSet <https://example.org/id/datacube/olympics> ;
  qb:measureType <https://example.org/ns/olympics#numberofmedals> ;
  sdmx-dimension:refArea ?refArea ;
  sdmx-dimension:refPeriod ?refPeriod ;
  sdmx-dimension:sex ?sex ;
  <https://example.org/ns/olympics#competition> ?competition ;
  <https://example.org/ns/olympics#medaltype> ?medal ;
  <https://example.org/ns/olympics#numberofmedals> ?number ;
.}
FROM <file:input.csv>
WHERE
{
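# Map the Gender column to the SDMX sex codes (sex-U, "undefined", as a fallback)
BIND
  (IF(?Gender = "Male", "sex-M", IF(?Gender = "Female", "sex-F", "sex-U")) as ?sexCode)
# Observation URI: one path segment per dimension, followed by the measure ("count")
BIND
  (URI(CONCAT("https://example.org/id/observation/",LCASE(?Edition),"/",
LCASE(?NOC),"/",LCASE(?sexCode),"/",LCASE(?Medal),"/count")) as ?URI)
BIND
  (URI(CONCAT("http://publications.europa.eu/resource/authority/country/",?NOC)) as ?refArea)
# Alternative: represent the period as a typed literal instead of a URI
#BIND (STRDT(STR(?Edition),xsd:gYear) as ?refPeriod)
BIND
  (URI(CONCAT("http://reference.data.gov.uk/id/year/",STR(?Edition))) as ?refPeriod)
BIND
  (URI(CONCAT("http://purl.org/linked-data/sdmx/2009/code#",?sexCode)) as ?sex)
BIND
  (URI(CONCAT("https://example.org/id/concept/",LCASE(?Competition))) as ?competition)
BIND
  (URI(CONCAT("https://example.org/id/concept/",LCASE(?Medal),"medal")) as ?medal)
# The medal count becomes an xsd:integer literal
BIND
  (STRDT(STR(?Value),xsd:integer) as ?number)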
}

To convert our CSV data, we run the following command line instruction:

     tarql olympics.sparql > olympics.ttl
    

Sample output for one observation looks as follows:

<https://example.org/id/observation/2004/chn/sex-m/bronze/count>
  rdf:type qb:Observation ;
  qb:dataSet <https://example.org/id/datacube/olympics> ;
  qb:measureType <https://example.org/ns/olympics#numberofmedals> ;
  sdmx-dimension:refArea
    <http://publications.europa.eu/resource/authority/country/CHN> ;
  sdmx-dimension:refPeriod <http://reference.data.gov.uk/id/year/2004> ;
  sdmx-dimension:sex <http://purl.org/linked-data/sdmx/2009/code#sex-M> ;
  <https://example.org/ns/olympics#competition>
    <https://example.org/id/concept/olympics> ;
  <https://example.org/ns/olympics#medaltype>
    <https://example.org/id/concept/bronzemedal> ;
  <https://example.org/ns/olympics#numberofmedals> 5 .
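
To make the mapping from a CSV row to an observation like the one above easier to follow, here is a minimal Python sketch of the same string operations. It is only an illustration, not part of the Tarql workflow. It assumes, in line with the sample output, that the Gender values Male and Female map to the SDMX codes sex-M and sex-F and that the last URI segment is the fixed string "count"; the names SDMX_SEX and observation are made up for this sketch.

```python
# Assumed mapping from the Gender column to SDMX sex codes.
SDMX_SEX = {"Male": "sex-M", "Female": "sex-F"}


def observation(row):
    """Build the observation URI and values for one CSV row (a dict of strings)."""
    sex_code = SDMX_SEX[row["Gender"]]
    # One lower-cased path segment per dimension, then the fixed measure segment.
    segments = [
        row["Edition"].lower(),
        row["NOC"].lower(),
        sex_code.lower(),
        row["Medal"].lower(),
        "count",
    ]
    return {
        "uri": "https://example.org/id/observation/" + "/".join(segments),
        "refArea": "http://publications.europa.eu/resource/authority/country/" + row["NOC"],
        "refPeriod": "http://reference.data.gov.uk/id/year/" + row["Edition"],
        "sex": "http://purl.org/linked-data/sdmx/2009/code#" + sex_code,
        "competition": "https://example.org/id/concept/" + row["Competition"].lower(),
        "medal": "https://example.org/id/concept/" + row["Medal"].lower() + "medal",
        "number": int(row["Value"]),
    }


# First data row of the CSV extract.
first_row = {
    "Competition": "Olympics",
    "Edition": "2004",
    "NOC": "CHN",
    "Gender": "Male",
    "Medal": "Bronze",
    "Value": "5",
}
obs = observation(first_row)
print(obs["uri"])
```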
  

Generating the observations with table2qb

Table2qb (pronounced “table to cube”) is a tool that can be used to convert structured CSV data into RDF data cubes. It is aimed at users who understand statistical data and are comfortable with common data processing tools, but it does not require programming skills or detailed knowledge of RDF.

We have a fully worked-out example using the Olympics dataset.