Create your first Custom Metadata Model#
Experimental Feature
The described features, configuration and APIs are work in progress.
This tutorial will guide you through the process of creating your first custom metadata model for piveau. You will use piveau Profile and a custom SHACL file to do this. We will call our model DCAT-AP-Simple, a trimmed version of DCAT-AP with only essential properties. When you have finished this tutorial, you will have a working piveau backend that can be used to build Open Data portals with a very simple and compact metadata schema.
Prerequisites#
- Basic knowledge of piveau, SHACL and DCAT
- A running development instance of piveau hub-repo and hub-search (see Quick Start)
Setup and Configuration#
- To get started, have your local hub-repo and hub-search setup ready. Make sure you start with a vanilla project with empty Elasticsearch and Virtuoso instances. For now, shut down the services.
- Create a new directory dcat-ap-simple outside of the working directories of piveau. For this tutorial we assume the path is: /home/alice/dcat-ap-simple
- Set the following configuration for hub-repo AND hub-search (usually in the config.json or via environment variables):
{
  "PIVEAU_FEATURE_FLAGS": {
    "piveau_profile": true
  },
  "PIVEAU_PROFILE": {
    "type": "directory",
    "path": "/home/alice/dcat-ap-simple"
  }
}
- The feature flag piveau_profile activates the feature.
- In PIVEAU_PROFILE you declare that you will provide the profile in a directory on your disk, and you pass the path to that directory.
Info
If you already have other feature flags enabled, you need to merge them with the settings above into a single JSON object.
- For now there is nothing more to do in your piveau installation.
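Merging the profile settings into an existing configuration can be sketched in Python as follows. This is only an illustration: the flag some_other_flag and the path /path/to/dcat-ap-simple are placeholders, and the function name merge_profile_settings is hypothetical and not part of piveau.

```python
import copy
import json


def merge_profile_settings(existing: dict, profile_dir: str) -> dict:
    """Return a copy of `existing` with the piveau Profile settings merged in."""
    merged = copy.deepcopy(existing)
    # Keep feature flags that are already enabled and add piveau_profile.
    flags = dict(merged.get("PIVEAU_FEATURE_FLAGS", {}))
    flags["piveau_profile"] = True
    merged["PIVEAU_FEATURE_FLAGS"] = flags
    # Declare that the profile is provided as a directory on disk.
    merged["PIVEAU_PROFILE"] = {"type": "directory", "path": profile_dir}
    return merged


# Placeholder configuration: "some_other_flag" is a made-up flag name.
existing_config = {"PIVEAU_FEATURE_FLAGS": {"some_other_flag": True}}
merged_config = merge_profile_settings(existing_config, "/path/to/dcat-ap-simple")
print(json.dumps(merged_config, indent=2))
```

The existing flags are preserved and only the two profile-related keys are added or overwritten.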
Create the SHACL File#
A SHACL file is the single source of truth and the core of your custom data model.
Adding Prefixes and Meta Information#
- Browse to the directory dcat-ap-simple, create a Turtle file dcat-ap-simple.ttl and open it in your favorite IDE. (piveau Profile currently supports only Turtle as RDF format.)
- Add the prefixes and some meta information to the file (you will find detailed information inline):
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dcatap: <http://data.europa.eu/r5r/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix pv: <https://piveau.eu/ns/voc#> . # (1)!
pv:DCAT_AP_Simple # (2)!
a pv:PiveauProfile ;
pv:profileVersion "1" ; # (3)!
dct:title "A simple DCAT-AP profile for piveau"@en ;
dct:description "Based and inspired by DCAT-AP 2.1.1"@en ;
foaf:maker [
foaf:mbox <mailto:alice@wonderland.com> ;
foaf:name "Alice" ;
] ;
owl:versionInfo "0.0.1" . # (4)!
- Always use this namespace for piveau.
- This meta information is important for helping others to understand the purpose of your profile.
- Currently only version 1 of the piveau Profile feature exists.
- Here you can set the version of your profile. This helps you keep track of changes.
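Every prefixed name in the file is shorthand for a full URI built from the declared namespace. The following Python sketch shows that expansion for two names from the snippet above. It uses a simple regular expression rather than a real Turtle parser, so it only handles plain @prefix lines. The helper names parse_prefixes and expand are hypothetical.

```python
import re

# Matches lines such as: @prefix pv: <https://piveau.eu/ns/voc#> .
PREFIX_LINE = re.compile(r"@prefix\s+([A-Za-z][\w-]*)?:\s*<([^>]*)>\s*\.")


def parse_prefixes(turtle: str) -> dict:
    """Collect prefix -> namespace pairs from @prefix declarations."""
    return {m.group(1) or "": m.group(2) for m in PREFIX_LINE.finditer(turtle)}


def expand(prefixed_name: str, prefixes: dict) -> str:
    """Turn a prefixed name such as 'dct:title' into its full URI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"undeclared prefix: {prefix!r}")
    return prefixes[prefix] + local


header = """
@prefix dct: <http://purl.org/dc/terms/> .
@prefix pv: <https://piveau.eu/ns/voc#> . # (1)!
"""
prefixes = parse_prefixes(header)
print(expand("pv:DCAT_AP_Simple", prefixes))
print(expand("dct:title", prefixes))
```

Using an undeclared prefix raises a KeyError, which mirrors the kind of error a Turtle parser would report for a missing @prefix line.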
Adding the Catalog Shape#
- Now you add a shape for a catalog to dcat-ap-simple.ttl. The catalog is one of the core classes of DCAT-AP.
- A shape is more or less a list of properties you want to store and how to manage each of these properties.
- Your minimal catalog metadata will allow you to store the title, description (both in multiple languages) and publisher of the catalog.
- Please refer to the inline comments for detailed explanations.
dcatap:Catalog_Shape # (1)!
a sh:NodeShape ;
sh:name "Catalog"@en ;
sh:property [ # (2)!
pv:mappingClass "SimpleMultiLangTitle" ; # (3)!
pv:mappingName "title" ; # (4)!
sh:minCount 1 ; # (5)!
sh:nodeKind sh:Literal ;
sh:path dct:title ; # (6)!
sh:severity sh:Violation ;
] ;
sh:property [
pv:mappingClass "SimpleMultiLang" ;
pv:mappingName "description" ;
sh:minCount 1 ;
sh:nodeKind sh:Literal ; # (7)!
sh:path dct:description ;
sh:severity sh:Violation ; # (8)!
] ;
sh:property [
pv:mappingClass "Agent" ; # (9)!
pv:mappingName "publisher" ;
sh:maxCount 1 ; # (11)!
sh:minCount 1 ;
sh:path dct:publisher ;
sh:severity sh:Violation ;
] ;
sh:property [
sh:path dcat:dataset ;
sh:severity sh:Violation ;
sh:description "Required property for piveau base functionality." ; # (10)!
] ;
sh:property [
pv:mappingClass "StandardText" ;
pv:mappingName "id" ;
sh:maxCount 1 ;
sh:description "Required property for piveau base functionality." ;
] ;
sh:targetClass dcat:Catalog .
- This is the URI of your shape. You will need the full form (http://data.europa.eu/r5r/Catalog_Shape) later as a reference to it.
- sh:property adds a property to your model.
- pv:mappingClass defines how the property is processed and indexed. In this case, SimpleMultiLangTitle is a field that supports multiple languages and autocomplete.
- This is the name of the property in the search service.
- If you set this, you make this property mandatory.
- This is the source of this property in RDF. So the value of this property comes from dct:title.
- The sh:nodeKind value is currently not used and only informative.
- The sh:severity value is currently not used and only informative.
- The Agent class allows you to parse and index foaf:Agent.
- Some properties are mandatory and required for correct functionality.
- If you set sh:maxCount to 1, the property will not be an array.
You can find detailed information about the properties here.
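The two cardinality rules from the comments (sh:minCount makes a property mandatory, sh:maxCount 1 turns it into a single value) can be summarised in a small Python sketch. The function describe_cardinality is hypothetical and only restates those rules. Treating every property without sh:maxCount 1 as an array is an assumption derived from the note above, not a documented guarantee.

```python
from typing import Optional


def describe_cardinality(min_count: Optional[int] = None,
                         max_count: Optional[int] = None) -> dict:
    """Summarise how sh:minCount and sh:maxCount are described in this tutorial."""
    return {
        # A minimum count of at least 1 makes the property mandatory.
        "mandatory": min_count is not None and min_count >= 1,
        # Assumption: only sh:maxCount 1 yields a single value, anything else an array.
        "array": max_count != 1,
    }


# Cardinalities taken from the catalog shape above.
print("title    ", describe_cardinality(min_count=1))
print("publisher", describe_cardinality(min_count=1, max_count=1))
print("id       ", describe_cardinality(max_count=1))
```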
Adding the Dataset Shape#
- Now you add another shape to define the metadata model of a dataset.
- As you see, the structure repeats - some details are highlighted in the comments:
dcatap:Dataset_Shape
a sh:NodeShape ;
sh:name "Dataset"@en ;
sh:property [
pv:mappingClass "SimpleMultiLangTitle" ;
pv:mappingName "title" ;
sh:minCount 1 ;
sh:nodeKind sh:Literal ;
sh:path dct:title ;
sh:severity sh:Violation ;
] ;
sh:property [
pv:mappingClass "SimpleMultiLang" ;
pv:mappingName "description" ;
sh:minCount 1 ;
sh:nodeKind sh:Literal ;
sh:path dct:description ;
sh:severity sh:Violation ;
] ;
sh:property [
pv:mappingClass "Keywords" ; # (1)!
pv:mappingName "keywords" ;
sh:nodeKind sh:Literal ;
sh:path dcat:keyword ;
sh:severity sh:Violation ;
] ;
sh:property [
pv:mappingClass "Language" ;
pv:mappingName "language" ;
sh:path dct:language ;
sh:severity sh:Violation ;
] ;
sh:property [
pv:mappingClass "Agent" ;
pv:mappingName "creator" ;
sh:maxCount 1 ;
sh:path dct:creator ;
sh:severity sh:Violation ;
] ;
sh:property [
pv:mappingClass "ContactPoint" ;
pv:mappingName "contact_point" ;
sh:path dcat:contactPoint ;
sh:severity sh:Violation ;
] ;
sh:property [
pv:mappingClass "DateTime" ;
pv:mappingName "issued" ;
sh:maxCount 1 ;
sh:path dct:issued ;
sh:severity sh:Violation ;
sh:shape dcatap:DateOrDateTimeDataType_Shape ;
] ;
sh:property [
pv:mappingClass "Nested" ; # (2)!
pv:mappingLink dcatap:Distribution_Shape ; # (3)!
pv:mappingName "distributions" ;
sh:path dcat:distribution ;
sh:severity sh:Violation ;
] ;
sh:property [
pv:mappingClass "Theme" ;
pv:mappingName "categories" ;
sh:path dcat:theme ;
sh:severity sh:Violation ;
] ;
sh:property [
pv:mappingClass "StandardText" ;
pv:mappingName "id" ;
sh:maxCount 1 ;
sh:description "Required property for piveau base functionality." ;
] ;
sh:property [
pv:mappingClass "Nested" ;
pv:mappingName "catalog_record" ;
pv:mappingProperty [
pv:mappingClass "DateTime" ;
pv:mappingName "issued" ;
sh:maxCount 1 ;
] ;
pv:mappingProperty [
pv:mappingClass "DateTime" ;
pv:mappingName "modified" ;
sh:maxCount 1 ;
] ;
sh:maxCount 1 ;
sh:description "Required property for piveau base functionality." ;
] ;
sh:property [
pv:mappingClass "Nested" ;
pv:mappingName "catalog" ;
pv:mappingProperty [
pv:mappingClass "StandardText" ;
pv:mappingName "id" ;
sh:maxCount 1 ;
sh:minCount 1 ;
] ;
sh:maxCount 1 ;
sh:description "Required property for piveau base functionality." ;
] ;
sh:property [
sh:maxCount 1 ;
pv:mappingClass "SpatialResource" ;
pv:mappingName "country" ;
sh:description "Required property for piveau base functionality." ;
] ;
sh:property [
pv:mappingClass "Metrics" ;
pv:mappingName "quality_meas" ;
sh:maxCount 1 ;
sh:description "Required property for piveau base functionality." ;
] ;
sh:targetClass dcat:Dataset .
- The mapping class Keywords allows you to index keywords.
- Nested is a very important class. It allows you to link to other shapes. Here it links to the distribution shape, which is introduced in the next section.
- In pv:mappingLink the connection is defined. In the search service the referenced model will be embedded.
Adding the Distribution Shape#
- You finalize the model with the shape for distributions.
- This shape was referenced in the dataset shape above.
dcatap:Distribution_Shape
a sh:NodeShape ;
sh:property [
pv:mappingClass "StandardText" ;
pv:mappingName "id" ;
sh:maxCount 1 ;
sh:description "Required property for piveau base functionality." ;
] ;
sh:property [
pv:mappingClass "License" ;
pv:mappingName "license" ;
sh:maxCount 1 ;
sh:path dct:license ;
sh:severity sh:Violation ;
] ;
sh:property [
pv:mappingClass "Format" ;
pv:mappingName "format" ;
sh:maxCount 1 ;
sh:path dct:format ;
sh:severity sh:Violation ;
] ;
sh:property [
pv:mappingClass "StandardDisabled" ;
pv:mappingName "access_url" ;
sh:minCount 1 ;
sh:nodeKind sh:BlankNodeOrIRI ;
sh:path dcat:accessURL ;
sh:severity sh:Violation ;
] ;
sh:property [
pv:mappingClass "SimpleMultiLang" ;
pv:mappingName "title" ;
sh:nodeKind sh:Literal ;
sh:path dct:title ;
sh:severity sh:Violation ;
] ;
sh:property [
pv:mappingClass "StandardDisabled" ;
pv:mappingName "download_url" ;
sh:nodeKind sh:BlankNodeOrIRI ;
sh:path dcat:downloadURL ;
sh:severity sh:Violation ;
] ;
sh:property [
pv:mappingClass "SimpleMultiLang" ;
pv:mappingName "description" ;
sh:nodeKind sh:Literal ;
sh:path dct:description ;
sh:severity sh:Violation ;
] ;
sh:targetClass dcat:Distribution .
Create the piveau.json#
Now you create the entry point of your simple profile.
- Create a file in the dcat-ap-simple directory with the name piveau.json and the following content:
{
  "version": "1",
  "id": "dcat-ap-simple",
  "core": [
    {
      "id": "dataset",
      "description": "A simple representation of a dataset",
      "path": "dcat-ap-simple.ttl",
      "name": "dataset",
      "shapeUri": "http://data.europa.eu/r5r/Dataset_Shape"
    },
    {
      "id": "catalog",
      "description": "A simple representation of a catalogue",
      "path": "dcat-ap-simple.ttl",
      "name": "catalog",
      "shapeUri": "http://data.europa.eu/r5r/Catalog_Shape"
    }
  ]
}
- This file connects your SHACL file and the shapes it contains with the core entities (catalogs and datasets) of piveau.
- The value of the path key needs to be set to your SHACL file.
- The shapeUri has to match the full URI of the shape in the SHACL file. Do not use the prefix here.
- It is also possible to separate the shapes into multiple SHACL files.
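Before starting the services, the rules above can be checked with a short script. The following Python sketch is a hypothetical helper, not a piveau tool. It assumes that the five keys shown in the example are all required, verifies that each path points to an existing file in the profile directory, and flags any shapeUri that is not a full URI. The demonstration runs against a temporary directory with one deliberately wrong entry.

```python
import json
import tempfile
from pathlib import Path

# Assumption: the keys used in the example above are treated as required.
REQUIRED_KEYS = {"id", "description", "path", "name", "shapeUri"}


def check_profile(profile_dir: Path) -> list:
    """Return a list of human-readable problems found in piveau.json."""
    problems = []
    config = json.loads((profile_dir / "piveau.json").read_text(encoding="utf-8"))
    for entry in config.get("core", []):
        label = entry.get("id", "<no id>")
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"{label}: missing keys {sorted(missing)}")
            continue
        if not entry["shapeUri"].startswith(("http://", "https://")):
            problems.append(f"{label}: shapeUri must be a full URI, not a prefixed name")
        if not (profile_dir / entry["path"]).is_file():
            problems.append(f"{label}: SHACL file {entry['path']} not found")
    return problems


# Demonstration in a temporary directory; the catalog entry is wrong on purpose.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / "dcat-ap-simple.ttl").write_text("", encoding="utf-8")
    (demo_dir / "piveau.json").write_text(json.dumps({
        "version": "1",
        "id": "dcat-ap-simple",
        "core": [
            {"id": "dataset", "description": "ok", "path": "dcat-ap-simple.ttl",
             "name": "dataset", "shapeUri": "http://data.europa.eu/r5r/Dataset_Shape"},
            {"id": "catalog", "description": "prefixed on purpose", "path": "dcat-ap-simple.ttl",
             "name": "catalog", "shapeUri": "dcatap:Catalog_Shape"},
        ],
    }), encoding="utf-8")
    problems = check_profile(demo_dir)

print(problems)
```

To check your own profile, call check_profile with the path of your dcat-ap-simple directory. An empty list means none of these basic problems were found.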
Start the Services#
Now you are all set up and ready to start hub-repo and hub-search and test your data model.
- Start your databases, hub-repo and hub-search.
- You will notice log entries similar to the following in both services:
INFO io.piveau.profile.ProfileLoader - Loaded piveau profile 'dcat-ap-minimal'
INFO i.p.h.search.util.index.IndexManager - Loaded shape successfully for dataset
INFO i.p.h.search.util.index.IndexManager - Loaded shape successfully for catalogue
- After successfully launching the services, you can browse to http://localhost:8081/profile (both services, hub-repo and hub-search, offer this endpoint) to get information about the installed profile.
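The profile endpoint can also be queried from a script. The Python sketch below uses only the standard library and prints the raw response, because the response format is not described here. The hub-search address http://localhost:8083 is an assumption based on the port used later in this tutorial, and the function names are hypothetical.

```python
import urllib.error
import urllib.request

# Assumed local addresses; adjust them to your own setup.
SERVICES = {
    "hub-repo": "http://localhost:8081",
    "hub-search": "http://localhost:8083",
}


def profile_url(base_url: str) -> str:
    """Build the URL of the profile endpoint for a service."""
    return base_url.rstrip("/") + "/profile"


def fetch_profile_info(base_url: str, timeout: float = 5.0) -> str:
    """GET the profile endpoint and return the raw response body as text."""
    with urllib.request.urlopen(profile_url(base_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def print_profiles() -> None:
    """Print the profile information of every configured service."""
    for name, base_url in SERVICES.items():
        try:
            print(name, fetch_profile_info(base_url))
        except urllib.error.URLError as error:
            print(name, "not reachable:", error)
```

Call print_profiles() once both services are running.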
Testing Everything#
It is now time to test how the new profile affects your piveau instance.
The Search Schema#
- First, you can go to Elasticsearch (http://localhost:9200/_mapping) to see how the SHACL file was transformed into a mapping.
- There you will find all your properties with the names defined in pv:mappingName.
- If you have a look at the OpenAPI of hub-search, you will find that the dataset model also matches our profile.
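To compare the Elasticsearch mapping with the names set in pv:mappingName, it helps to flatten the mapping into a list of field paths. The Python sketch below assumes the typeless response layout of Elasticsearch 7 and later, where each index has a "mappings" object with a "properties" tree. The sample response, including the index name and field types, is invented for illustration and is not the mapping piveau generates.

```python
def field_paths(properties: dict, prefix: str = "") -> list:
    """Flatten an Elasticsearch 'properties' tree into dotted field paths."""
    paths = []
    for name, definition in sorted(properties.items()):
        path = prefix + name
        paths.append(path)
        nested = definition.get("properties")
        if nested:
            # Object and nested fields carry their own 'properties' tree.
            paths.extend(field_paths(nested, path + "."))
    return paths


def fields_per_index(mapping_response: dict) -> dict:
    """Map each index name to the list of its field paths."""
    return {
        index: field_paths(body.get("mappings", {}).get("properties", {}))
        for index, body in mapping_response.items()
    }


# Invented sample in the shape of a GET /_mapping response.
sample_response = {
    "dataset": {
        "mappings": {
            "properties": {
                "id": {"type": "keyword"},
                "title": {"properties": {"en": {"type": "text"}}},
            }
        }
    }
}
print(fields_per_index(sample_response))
```

With a running instance, load the JSON returned by http://localhost:9200/_mapping and pass it to fields_per_index.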
Info
Since hub-repo is schema agnostic, you will not find changes there.
Creating Test Data#
Below you find concrete instances and requests to create a catalog and a dataset that match the defined data model.
Simple Catalog Example#
- Create a catalog like this:
PUT http://localhost:8081/catalogues/simple-catalog
Content-Type: text/turtle
X-API-Key: {{api-key}}
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<https://piveau.io/id/catalogue/dcat-ap-simple-catalog>
a dcat:Catalog ;
dct:type "dcat-ap" ;
dct:title "DCAT-AP simple Example Catalogue"@en ;
dct:description "This catalog holds examples DCAT-AP simple Datasets"@en ;
dct:publisher <https://piveau.io/def/publisher/piveau> .
<https://piveau.io/def/publisher/piveau>
a foaf:Organization ;
foaf:homepage <https://piveau.io> ;
foaf:mbox <mailto:info@piveau.de> ;
foaf:name "Piveau" .
- If you now query hub-search (http://localhost:8083/catalogues/simple-catalog), you will see an actual instance of the simple metadata model.
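The catalog request shown above can also be sent from a script. The following Python sketch builds the same PUT request with the standard library. The API key value is a placeholder, the function name build_catalog_request is hypothetical, and the request is only constructed here, not sent.

```python
import urllib.request

# Catalog description from the example above.
CATALOG_TURTLE = """\
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<https://piveau.io/id/catalogue/dcat-ap-simple-catalog>
    a dcat:Catalog ;
    dct:type "dcat-ap" ;
    dct:title "DCAT-AP simple Example Catalogue"@en ;
    dct:description "This catalog holds examples DCAT-AP simple Datasets"@en ;
    dct:publisher <https://piveau.io/def/publisher/piveau> .
<https://piveau.io/def/publisher/piveau>
    a foaf:Organization ;
    foaf:name "Piveau" .
"""


def build_catalog_request(base_url: str, catalog_id: str,
                          turtle: str, api_key: str) -> urllib.request.Request:
    """Build the PUT request that creates or updates a catalog in hub-repo."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/catalogues/{catalog_id}",
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle", "X-API-Key": api_key},
    )


# "your-api-key" is a placeholder for the key of your own installation.
request = build_catalog_request(
    "http://localhost:8081", "simple-catalog", CATALOG_TURTLE, "your-api-key"
)
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen(request) while hub-repo is running.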
Simple Dataset Example#
- In the same way, you can also create a dataset instance:
PUT http://localhost:8081/catalogues/simple-catalog/datasets/origin?originalId=simple-dataset
Content-Type: text/turtle
X-API-Key: {{api-key}}
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcatap: <http://data.europa.eu/r5r/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<https://piveau.io/set/data/dcat-ap-simple-dataset>
a dcat:Dataset ;
dct:title "This is a DCAT-AP simple dataset"@en ;
dct:description "This is a description of a DCAT-AP simple dataset"@en ;
dct:language <http://publications.europa.eu/resource/authority/language/ENG> ;
dct:creator <https://piveau.io/def/creator/piveau> ;
dcat:theme <http://publications.europa.eu/resource/authority/data-theme/TECH> ;
dct:issued "2024-07-31T00:00:00"^^xsd:dateTime ;
dcat:contactPoint [ a vcard:Individual ;
vcard:hasEmail <mailto:john@doe.de> ;
vcard:fn "John Doe" ;
vcard:hasAddress [
vcard:street-address "John Doe Str." ;
vcard:locality "Berlin" ;
vcard:postal-code "12345" ;
vcard:country-name "Germany"
] ;
vcard:hasTelephone "0049123456789" ;
vcard:hasURL <http://www.johndoe.de> ;
vcard:hasOrganizationName "John Doe Inc." ] ;
dcat:distribution <https://piveau.io/set/distribution/1> ;
dcat:keyword "piveau"@en, "opendata"@en .
<https://piveau.io/set/distribution/1>
a dcat:Distribution ;
dct:title "Example Distribution "@en ;
dct:description "Example Distribution Description"@en ;
dcat:accessURL <https://myactualdata.com/file> ;
dcat:downloadURL <https://myactualdata.com/downdload/file.csv> ;
dct:license [
a dct:LicenseDocument ;
skos:prefLabel "My License" ;
dct:title "This is my custom License" ;
skos:exactMatch "my-license"
] ;
dct:format <http://publications.europa.eu/resource/authority/file-type/CSV> .
<https://piveau.io/def/creator/piveau>
a foaf:Organization ;
foaf:homepage <https://piveau.io> ;
foaf:mbox <mailto:info@piveau.de> ;
foaf:name "Piveau" .
- You can query it from here: http://localhost:8083/datasets/simple-dataset
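If you create many datasets, building these URLs by hand becomes error-prone. The Python sketch below assembles the two URLs used in this section with proper escaping. The function names are hypothetical, and it assumes, as in the example above, that the dataset can be queried in hub-search under the value passed as originalId.

```python
from urllib.parse import quote, urlencode


def dataset_put_url(repo_base: str, catalog_id: str, original_id: str) -> str:
    """URL for creating a dataset in hub-repo via its original ID."""
    query = urlencode({"originalId": original_id})
    catalog = quote(catalog_id, safe="")
    return f"{repo_base.rstrip('/')}/catalogues/{catalog}/datasets/origin?{query}"


def dataset_search_url(search_base: str, dataset_id: str) -> str:
    """URL for querying a dataset in hub-search."""
    return f"{search_base.rstrip('/')}/datasets/{quote(dataset_id, safe='')}"


put_url = dataset_put_url("http://localhost:8081", "simple-catalog", "simple-dataset")
search_url = dataset_search_url("http://localhost:8083", "simple-dataset")
print(put_url)
print(search_url)
```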
Conclusion#
Now you are able to create your own metadata model for piveau, install it and test it out. You can use the DCAT-AP-Simple example as a starting point for a bigger and more complex model. For further inspiration you can also have a look at the built-in SHACL file that powers piveau by default.