Validating data cubes

Using SHACL

SHACL (Shapes Constraint Language) is an RDF-based data modeling language for defining constraints on the structure and content of a graph, and therefore also on RDF data cubes.

The RDF data cube specification itself contains 21 constraints, which can be formalised using SHACL.

For example, IC-1 (integrity constraint 1) of the specification states that every qb:Observation has exactly one associated qb:DataSet.

Let's have a look at how this translates into SHACL.

qb:ObservationShape
  rdf:type sh:NodeShape ;
  rdfs:label "Observation shape" ;
  sh:property [
      sh:path qb:dataSet ;
      rdfs:comment "IC-1. Unique DataSet" ;
      sh:maxCount 1 ;
      sh:message "Every qb:Observation has exactly one associated qb:DataSet." ;
      sh:minCount 1 ;
    ] ;
    ...
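The effect of combining sh:minCount 1 and sh:maxCount 1 can be sketched in plain Python. This is not a SHACL engine, only an illustration of the counting rule behind IC-1. It assumes that the shape targets instances of qb:Observation (the target declaration is not shown in the excerpt above), and the ex: identifiers and the check_ic1 helper are hypothetical names made up for this example.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
QB_OBSERVATION = "qb:Observation"
QB_DATASET = "qb:dataSet"


def check_ic1(triples):
    """Return {observation: number_of_datasets} for every observation
    that does not have exactly one qb:dataSet value."""
    observations = set()
    datasets = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE and o == QB_OBSERVATION:
            observations.add(s)
        elif p == QB_DATASET:
            datasets[s].add(o)
    return {
        obs: len(datasets[obs])
        for obs in sorted(observations)
        if len(datasets[obs]) != 1
    }


# Hypothetical sample data, not taken from olympics.ttl.
triples = [
    ("ex:obs1", RDF_TYPE, QB_OBSERVATION),
    ("ex:obs1", QB_DATASET, "ex:olympics"),
    ("ex:obs2", RDF_TYPE, QB_OBSERVATION),   # no data set: breaks sh:minCount 1
    ("ex:obs3", RDF_TYPE, QB_OBSERVATION),
    ("ex:obs3", QB_DATASET, "ex:olympics"),
    ("ex:obs3", QB_DATASET, "ex:other"),     # two data sets: breaks sh:maxCount 1
]

violations = check_ic1(triples)
print(violations)
```

An empty result means that every observation in the sample satisfies IC-1; a real SHACL validator would additionally report the sh:message text for each violating node.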

To show how SHACL validation works in practice, we will use TopBraid Composer (the Free Edition is sufficient):

  1. Open TopBraid Composer
  2. Create a new Project
  3. Import the generated Turtle files into the Project:
    • olympics.ttl
    • components.ttl
    • competitions.ttl
    • medaltypes.ttl
  4. Import supporting vocabularies:
    • RDF Data Cube
    • SDMX
    • SKOS
    • XKOS
  5. Open 'olympics.ttl'
  6. Import all the other relevant RDF files (see above) into 'olympics.ttl'
  7. Also import the 'datacube.shapes.ttl' file, which can be found in /Topbraid/Common. This file contains the SHACL translations of the 21 constraints defined in the RDF data cube specification.
  8. Check the import view: it should now list all the files imported above.
  9. Go to the SHACL validation pane and run the validator (blue or green button).
  10. Any errors are then listed in the validation results.

The validation can also be run from the command line, for example with the SHACL batch validator.

Using the RDF validator

The OGI project developed a validation test set and tool.

The test set has two parts, described in the sections below.

Validating against RDF data cube constraints

The tests for validation against RDF data cube constraints can be found at the following URL: https://github.com/Swirrl/rdf-validator/tree/master/queries.

Download these to a queries folder on your system, and then download the validator itself from https://github.com/Swirrl/rdf-validator/releases.

Run the validation as follows:

java -jar ./rdf-validator-0.2.0-standalone.jar \
  --endpoint olympics.ttl \
  --suite ./queries

where the endpoint parameter refers to the triples file and the suite parameter to the folder where the tests have been downloaded.
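If the validation has to be scripted, for example to run it over several cubes, the invocation above can be assembled programmatically. The following is a minimal sketch that only builds and prints the command; build_validator_command is a hypothetical helper name, and the sketch uses nothing beyond the --endpoint and --suite parameters shown above.

```python
import shlex


def build_validator_command(jar, endpoint, suite):
    """Build the argument list for the rdf-validator invocation shown above."""
    return ["java", "-jar", jar, "--endpoint", endpoint, "--suite", suite]


command = build_validator_command(
    "./rdf-validator-0.2.0-standalone.jar",  # jar name as used above
    "olympics.ttl",                          # triples file to validate
    "./queries",                             # folder with the downloaded tests
)

# Print a copy-pasteable shell command. To actually run it, pass `command`
# to subprocess.run(command) on a machine where Java is installed.
print(shlex.join(command))
```

Passing the arguments as a list rather than as one string avoids quoting problems when a file or folder name contains spaces.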

The results we get on our data are:

 Creating SPARQL repository: http://localhost:3030/onecube/sparql
1 /queries/14_SELECT_Dataset_Without_MeasureDimension_Obs_Has_Value.sparql: PASSED
2 /queries/09_SELECT_SliceKey_Has_At_Most_1_SliceStructure.sparql: PASSED
3 /queries/16_SELECT_Dataset_With_MeasureDimension_Obs_Has_Exactly_1_Value.sparql: PASSED
4 /queries/03_SELECT_DSD_Has_At_Least_1_Measure.sparql: PASSED
5 /queries/05_SELECT_Concept_Dimension_Has_At_Least_1_CodeList.sparql: PASSED
6 /queries/01_SELECT_Observation_Has_At_Most_1_Dataset.sparql: PASSED
7 /queries/01_SELECT_Observation_Has_At_Least_1_Dataset.sparql: PASSED
8 /queries/02_SELECT_Dataset_Has_At_Most_1_DSD.sparql: PASSED
9 /queries/15_SELECT_Dataset_With_MeasureDimension_Obs_Has_Value_For_Measure.sparql: PASSED
10 /queries/19_SELECT_ConceptScheme_Values_Must_Be_From_CodeList.sparql: PASSED
11 /queries/08_SELECT_SliceKey_ComponentProperty_Not_Declared_As_Component_On_DSD.sparql: PASSED
12 /queries/02_SELECT_Dataset_Has_At_Least_1_DSD.sparql: PASSED
13 /queries/10_SELECT_Slice_Dimensions_Have_Value.sparql: PASSED
14 /queries/11_SELECT_Obs_Dimensions_Have_Value.sparql: PASSED
15 /queries/19_SELECT_Collection_Values_Must_Be_From_CodeList.sparql: PASSED
16 /queries/06_SELECT_Invalid_DSD_Component_Marked_As_Optional.sparql: PASSED
17 /queries/13_SELECT_Obs_Required_Attributes_Have_Value.sparql: PASSED
18 /queries/09_SELECT_SliceKey_Has_At_Least_1_SliceStructure.sparql: PASSED
19 /queries/12_SELECT_Duplicate_Observations.sparql: PASSED
20 /queries/18_SELECT_Observations_In_Slices_Must_Be_In_Slices_Dataset.sparql: PASSED
21 /queries/17_SELECT_All_Measures_Present_In_Measures_Dimension_Cube.sparql: PASSED
22 /queries/04_SELECT_Dimension_Has_At_Least_1_Range.sparql: PASSED
23 /queries/07_SELECT_SliceKey_Has_At_Least_1_DSD.sparql: PASSED
Passed 23 Failed 0 Errored 0 Ignored 0         
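With 23 tests the report is easy to scan by eye, but in an automated pipeline it can be convenient to tally it. The sketch below parses lines of the form shown above ("<number> <test file>: <STATUS>"). The sample is an excerpt of the report above; only the PASSED status appears in that report, so the assumption that other outcomes are printed as a single upper-case word in the same position is untested, and summarise is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as "6 /queries/01_....sparql: PASSED".
RESULT_LINE = re.compile(r"^\s*(\d+)\s+(\S+):\s+([A-Z]+)\s*$")


def summarise(report):
    """Return (Counter of statuses, list of test files that did not pass)."""
    statuses = Counter()
    not_passed = []
    for line in report.splitlines():
        match = RESULT_LINE.match(line)
        if not match:
            continue  # skips the header line and the final summary line
        _, test_file, status = match.groups()
        statuses[status] += 1
        if status != "PASSED":
            not_passed.append(test_file)
    return statuses, not_passed


# Excerpt of the report above (header, first three tests, summary line).
sample = """\
 Creating SPARQL repository: http://localhost:3030/onecube/sparql
1 /queries/14_SELECT_Dataset_Without_MeasureDimension_Obs_Has_Value.sparql: PASSED
2 /queries/09_SELECT_SliceKey_Has_At_Most_1_SliceStructure.sparql: PASSED
3 /queries/16_SELECT_Dataset_With_MeasureDimension_Obs_Has_Exactly_1_Value.sparql: PASSED
Passed 23 Failed 0 Errored 0 Ignored 0
"""

statuses, not_passed = summarise(sample)
print(dict(statuses), not_passed)
```

An empty not_passed list corresponds to the "Failed 0 Errored 0" summary printed by the validator.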

Validating against CubiQL constraints

Important: These constraints are only relevant if you plan to use the CubiQL API.

The tests for validation against CubiQL constraints can be found at the following URL: https://github.com/Swirrl/graphql-qb/tree/issue_127/validation.

The validation procedure is similar to the one for validating against the RDF data cube constraints: run the command-line validator, pointing it to the triples and to the folder with the constraints. In addition, you need to supply a parameter pointing to a configuration file, as documented at https://github.com/Swirrl/graphql-qb/blob/master/doc/table2qb-cubiql.md.