An example observation

Previously we pinpointed a particular observation in our sample Olympics data: the number of Olympic gold medals won by athletes from the USA at the 2012 Olympics. We hinted that this number could actually be an aggregate of the medals won by male and female athletes combined, so let's narrow the gender down to female athletes only.

How do we translate this observation from a list of values into triples?

Let's begin by assigning a URI to identify the observation. As mentioned before, it is common and recommended practice to use http(s) URIs whose wording is reasonably meaningful to a human reader. Since our observation is for gold medals won by female athletes from the USA at the 2012 Olympics, we could go with something like this:

     <https://example.org/id/observation/2012/usa/sex-f/gold/count>

Important: several governments have established URI guidelines. Please consult your standards body for further guidance.
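To illustrate how this naming pattern extends, here is a sketch of URIs for a few neighbouring observations, each varying one path segment per dimension value. Only the first URI comes from the text above; the sex-m (male athletes), sex-t (male and female combined), and silver segments are hypothetical values that simply follow the same convention. In Turtle, a is shorthand for rdf:type.

     @prefix qb: <http://purl.org/linked-data/cube#> .

     # One URI per combination of dimension values.
     # Only the first line is from the text; the others are hypothetical siblings.
     <https://example.org/id/observation/2012/usa/sex-f/gold/count> a qb:Observation .
     <https://example.org/id/observation/2012/usa/sex-m/gold/count> a qb:Observation .
     <https://example.org/id/observation/2012/usa/sex-t/gold/count> a qb:Observation .
     <https://example.org/id/observation/2012/usa/sex-f/silver/count> a qb:Observation .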

Since this is an observation, its type is qb:Observation from the RDF Data Cube vocabulary:

     rdf:type qb:Observation ;

Note that we will have to declare the prefixes rdf: and qb: at the top of our Turtle file, together with every other prefix we end up using.
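A minimal sketch of such a prefix block, covering the vocabularies used in this section. The namespace IRIs for rdf, rdfs, xsd, qb, and the SDMX vocabularies are the published ones; the olympics: prefix for our own namespace is a hypothetical shorthand introduced here for convenience.

     # Prefix declarations at the top of the Turtle file.
     @prefix rdf:            <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
     @prefix rdfs:           <http://www.w3.org/2000/01/rdf-schema#> .
     @prefix xsd:            <http://www.w3.org/2001/XMLSchema#> .
     @prefix qb:             <http://purl.org/linked-data/cube#> .
     @prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
     @prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .
     @prefix sdmx-code:      <http://purl.org/linked-data/sdmx/2009/code#> .
     # Hypothetical shorthand for our own namespace.
     @prefix olympics:       <https://example.org/ns/olympics#> .

With these declarations in place, <https://example.org/ns/olympics#medaltype> could be written as olympics:medaltype, and the sex code as sdmx-code:sex-F.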

Let's also add a label for our observation, since it is good practice to attach human-readable labels (using rdfs:label) to all entities:

     rdfs:label "gold medals won by female athletes from the USA at the 2012 Olympics" ;

The value of the year dimension for our observation is fixed at 2012. Because dates, times, and periods are so often used as dimensions in statistical data, the SDMX dimension vocabulary already provides a property for them:

     sdmx-dimension:refPeriod "2012"^^xsd:gYear ;

Here we use the refPeriod property from the sdmx-dimension vocabulary and give it the value 2012, typed as gYear (the Gregorian year datatype defined in XML Schema Definition, xsd). Another option, and the one we will use because the tooling in later examples expects it, is to take a value from the controlled vocabularies published by the UK government:

     sdmx-dimension:refPeriod <http://reference.data.gov.uk/id/year/2012>;

The gender dimension is also frequently used in statistical data, and again we can reuse an existing property as well as a value from an existing codelist:

     sdmx-dimension:sex <http://purl.org/linked-data/sdmx/2009/code#sex-F> ;

Countries are also frequently used as a dimension (the “reference area”). We reuse the existing refArea property from SDMX, and for the value we use the code from the country codelist managed and used by the EC:

     sdmx-dimension:refArea <http://publications.europa.eu/resource/authority/country/USA> ;

So we are left with two more dimensions to specify: our observation is for the competition identified as the Olympics, and we are only interested in gold medals. In other words, we need two more triples:

     <https://example.org/ns/olympics#competition>  <https://example.org/id/concept/olympics> ;   
     <https://example.org/ns/olympics#medaltype> <https://example.org/id/concept/goldmedal> ;

Since we don't know of any existing properties or values for these, the triples assume that we have defined our own vocabulary: codes describing the types of Olympic games and the types of medals, plus the corresponding properties that link our observation to those codes. Such a vocabulary is defined with SKOS, in which the individual types of games and medals are concepts that belong to concept schemes and can therefore be used as code lists.
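A minimal sketch of what such a vocabulary could look like for the medal types. The concept scheme IRI https://example.org/id/conceptscheme/medaltype and all labels are hypothetical; the competition codes and the competition property would follow the same pattern, and the annotated sample cube may model them differently. The property is declared as a qb:DimensionProperty and tied to its code list through qb:codeList.

     @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
     @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
     @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
     @prefix qb:   <http://purl.org/linked-data/cube#> .

     # The code list: a SKOS concept scheme (hypothetical IRI and label).
     <https://example.org/id/conceptscheme/medaltype> a skos:ConceptScheme ;
         skos:prefLabel "types of Olympic medals"@en ;
         skos:hasTopConcept <https://example.org/id/concept/goldmedal> .

     # One code in that list; silver and bronze would be added the same way.
     <https://example.org/id/concept/goldmedal> a skos:Concept ;
         skos:prefLabel "gold medal"@en ;
         skos:inScheme <https://example.org/id/conceptscheme/medaltype> .

     # The dimension property linking an observation to a medal-type code.
     <https://example.org/ns/olympics#medaltype> a rdf:Property, qb:DimensionProperty ;
         rdfs:label "medal type"@en ;
         rdfs:range skos:Concept ;
         qb:codeList <https://example.org/id/conceptscheme/medaltype> .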

More details on exactly what this implies can be found in the annotated sample cube.

At this point, we have identified our observation, we have typed it as an observation and have given it a label, and we have fixed the values of the year, gender, country, competition, and medal dimensions. What is left to specify is the actual measure itself, the unit in which that measure is expressed, and the data set that this observation belongs to.

The number of medals won was 105. For this measure we decide to define our own property. We put it in a namespace that we manage and control ourselves, namely <https://example.org/ns/olympics#>, and within this namespace we define the property numberofmedals.

     <https://example.org/ns/olympics#numberofmedals> 105 ;
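As a sketch, the measure property itself could be declared in our namespace as follows. The label and the choice of xsd:integer as its range are assumptions made for illustration, not taken from the annotated sample cube.

     @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
     @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
     @prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
     @prefix qb:   <http://purl.org/linked-data/cube#> .

     # Our own measure property (label and range are assumptions).
     <https://example.org/ns/olympics#numberofmedals> a rdf:Property, qb:MeasureProperty ;
         rdfs:label "number of medals"@en ;
         rdfs:range xsd:integer .

In Turtle, a bare number such as 105 is shorthand for "105"^^xsd:integer, so the value in our observation is consistent with this range.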

The unit in which this number is expressed is Number:

     sdmx-attribute:unitMeasure <http://qudt.org/vocab/unit#Number> ;

And finally, we identify the data set that this observation belongs to:

     qb:dataSet <https://example.org/id/datacube/olympics> ;
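The observation only points to its data set. As a hedged sketch of the other side of that link, the data set can refer to a data structure definition that lists the components used above. The structure IRI https://example.org/id/dsd/olympics and the data set label are hypothetical, and the annotated sample cube may organise these components differently.

     @prefix rdfs:           <http://www.w3.org/2000/01/rdf-schema#> .
     @prefix qb:             <http://purl.org/linked-data/cube#> .
     @prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
     @prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .

     # The data set that our observation points to via qb:dataSet.
     <https://example.org/id/datacube/olympics> a qb:DataSet ;
         rdfs:label "Olympic medals"@en ;
         qb:structure <https://example.org/id/dsd/olympics> .

     # Hypothetical structure: one component per dimension, measure, and attribute.
     <https://example.org/id/dsd/olympics> a qb:DataStructureDefinition ;
         qb:component
             [ qb:dimension sdmx-dimension:refArea ] ,
             [ qb:dimension sdmx-dimension:refPeriod ] ,
             [ qb:dimension sdmx-dimension:sex ] ,
             [ qb:dimension <https://example.org/ns/olympics#competition> ] ,
             [ qb:dimension <https://example.org/ns/olympics#medaltype> ] ,
             [ qb:measure <https://example.org/ns/olympics#numberofmedals> ] ,
             [ qb:attribute sdmx-attribute:unitMeasure ] .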

After some reordering of the triples, we obtain this end result for our observation:

     @prefix rdf:            <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
     @prefix rdfs:           <http://www.w3.org/2000/01/rdf-schema#> .
     @prefix qb:             <http://purl.org/linked-data/cube#> .
     @prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
     @prefix sdmx-attribute: <http://purl.org/linked-data/sdmx/2009/attribute#> .

     <https://example.org/id/observation/2012/usa/sex-f/gold/count> rdf:type qb:Observation ;
         qb:dataSet <https://example.org/id/datacube/olympics> ;
         sdmx-attribute:unitMeasure <http://qudt.org/vocab/unit#Number> ;
         sdmx-dimension:refArea <http://publications.europa.eu/resource/authority/country/USA> ;
         sdmx-dimension:refPeriod <http://reference.data.gov.uk/id/year/2012> ;
         sdmx-dimension:sex <http://purl.org/linked-data/sdmx/2009/code#sex-F> ;
         rdfs:label "gold medals won by female athletes from the USA at the 2012 Olympics" ;
         <https://example.org/ns/olympics#competition> <https://example.org/id/concept/olympics> ;
         <https://example.org/ns/olympics#medaltype> <https://example.org/id/concept/goldmedal> ;
         <https://example.org/ns/olympics#numberofmedals> 105 .