piveau RDF Vocabularies#

Introduction#

Vocabularies play a crucial role in structuring and standardizing data within the piveau open data portal. They help ensure interoperability, reusability, and meaningful indexing of datasets. This document explains what vocabularies are, how they are used in piveau, and provides detailed technical instructions on configuring and managing them.

What Are Vocabularies?#

A vocabulary is a structured set of terms and concepts used to define and categorize data in a machine-readable format. In piveau, vocabularies help to:
- Standardize dataset properties (e.g., dcat:theme linking to predefined categories).
- Enable efficient data retrieval and enrichment.
- Improve search capabilities through structured indexing.

For more information on vocabularies in the semantic web, refer to the W3C ontology standards.

Vocabularies Used in piveau#

Linking Data to Vocabularies#

The RDF schema in piveau recommends using specific vocabularies for certain RDF properties. For example, dcat:theme links datasets to categories defined in the EU Data-Theme Vocabulary.

Example: Assigning a Theme to a Dataset

@prefix dcat: <http://www.w3.org/ns/dcat#> .

<http://data.europa.eu/88u/dataset/simple-dataset> dcat:theme <http://publications.europa.eu/resource/authority/data-theme/AGRI> .

Explanation: Here, the dataset is categorized under "Agriculture" (AGRI) without needing additional metadata, as the vocabulary already defines the term.
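The same link can be produced programmatically. The following Python sketch composes the theme URI from a short code and emits the statement as a single N-Triples line. The helper name `theme_triple` is hypothetical and not part of piveau; only the URIs are taken from the example above.

```python
DCAT_THEME = "http://www.w3.org/ns/dcat#theme"
DATA_THEME_BASE = "http://publications.europa.eu/resource/authority/data-theme/"


def theme_triple(dataset_uri: str, theme_code: str) -> str:
    """Return one N-Triples statement linking a dataset to a data-theme concept."""
    theme_uri = DATA_THEME_BASE + theme_code.upper()
    return f"<{dataset_uri}> <{DCAT_THEME}> <{theme_uri}> ."


print(theme_triple("http://data.europa.eu/88u/dataset/simple-dataset", "AGRI"))
```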

Querying Vocabulary-Linked Data#

Using SPARQL, users can retrieve structured information about a dataset's assigned vocabulary terms:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?theme ?prefLabel
WHERE {
  <http://data.europa.eu/88u/dataset/simple-dataset> dcat:theme ?theme .
  ?theme skos:prefLabel ?prefLabel .
}

Explanation: This query retrieves the dataset's theme along with its human-readable label.
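To run this query for arbitrary datasets, the dataset URI can be filled into a template and sent to a SPARQL endpoint. The Python sketch below only builds the query text and a GET URL using the `query` parameter defined by the SPARQL 1.1 Protocol. The function names and the endpoint `https://example.org/sparql` are assumptions for illustration; use the endpoint of your own deployment.

```python
from urllib.parse import urlencode

THEME_QUERY_TEMPLATE = """\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?theme ?prefLabel
WHERE {{
  <{dataset}> dcat:theme ?theme .
  ?theme skos:prefLabel ?prefLabel .
}}
"""


def build_theme_query(dataset_uri: str) -> str:
    """Fill the dataset URI into the theme query, rejecting unsafe characters."""
    if any(ch in dataset_uri for ch in '<>" {}|\\^`'):
        raise ValueError(f"not a safe IRI: {dataset_uri!r}")
    return THEME_QUERY_TEMPLATE.format(dataset=dataset_uri)


def build_query_url(endpoint: str, dataset_uri: str) -> str:
    """Encode the query as a GET URL with the standard 'query' parameter."""
    return endpoint + "?" + urlencode({"query": build_theme_query(dataset_uri)})


# Hypothetical endpoint, for illustration only.
url = build_query_url(
    "https://example.org/sparql",
    "http://data.europa.eu/88u/dataset/simple-dataset",
)
```

Rejecting characters that cannot appear in an IRI keeps a malformed input from changing the structure of the query.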

Warning

The piveau triplestore does not resolve external URIs automatically.
Vocabulary graphs must be stored locally in the triplestore to enable querying and indexing.

Vocabulary storage structure#

Piveau organizes vocabularies using a three-layer graph structure in the triplestore, ensuring clear separation and efficient management of different types of vocabulary-related information.

graph LR
    subgraph "Catalogue Graph"
        Cat["Vocabularies Catalogue<br/>(dcat:Catalog)<br/>graph: https://piveau.io/id/catalogue/vocabularies"]
    end

    subgraph "Vocabulary Dataset Graphs"
        DS1["Dataset: theme-vocabulary<br/>(dcat:Dataset)<br/>graph: https://piveau.io/set/data/theme-vocabulary"]
        DS2["Dataset: language-vocabulary<br/>(dcat:Dataset)<br/>graph: https://piveau.io/set/data/language-vocabulary"]
        DS3["Dataset: other-vocabulary<br/>(dcat:Dataset)<br/>graph: https://piveau.io/set/data/other-vocabulary"]
    end

    subgraph "Vocabulary Content Graphs"
        V1["Theme Vocabulary Content<br/>(skos:ConceptScheme)<br/>graph: http://publications.europa.eu/resource/authority/data-theme"]
        V2["Language Vocabulary Content<br/>(skos:ConceptScheme)<br/>graph: http://publications.europa.eu/resource/authority/language"]
        V3["Other Vocabulary Content<br/>(skos:ConceptScheme)<br/>graph: http://example.org/vocabulary/other"]
    end

    Cat -->|dcat:dataset| DS1
    Cat -->|dcat:dataset| DS2
    Cat -->|dcat:dataset| DS3

    DS1 -->|dcat:distribution<br/>accessURL| V1
    DS2 -->|dcat:distribution<br/>accessURL| V2
    DS3 -->|dcat:distribution<br/>accessURL| V3

    classDef catalogStyle fill:#f9f,stroke:#333,stroke-width:2px
    classDef datasetStyle fill:#bbf,stroke:#333,stroke-width:2px
    classDef vocabStyle fill:#bfb,stroke:#333,stroke-width:2px

    class Cat catalogStyle
    class DS1,DS2,DS3 datasetStyle
    class V1,V2,V3 vocabStyle

Layer 1: Vocabularies Catalogue

Graph: https://piveau.io/id/catalogue/vocabularies
Type: dcat:Catalog
Purpose: Serves as the main entry point for the internal piveau vocabulary management. Should not be modified manually.
Content: Contains metadata about all vocabulary datasets
Visibility: Marked as hidden to exclude from general dataset indexing

Layer 2: Vocabulary Dataset Graphs

Each vocabulary has its own dataset representation stored in a dedicated graph:

Graph Pattern: https://piveau.io/set/data/{vocabulary-id} (if following the standard piveau URI schema)
Type: dcat:Dataset
Purpose: Stores metadata about specific vocabularies for the internal piveau vocabulary management. Should not be modified manually.
Key Information:
- Vocabulary hash (stored as dct:identifier)
- Access URL to vocabulary content
- Version information
- Update timestamps
- Description and documentation

Layer 3: Vocabulary Content Graphs

The actual vocabulary definitions are stored in separate graphs, typically using their original URIs:

Graph Pattern: Original vocabulary URI (e.g., http://publications.europa.eu/resource/authority/data-theme)
Type: skos:ConceptScheme
Purpose: Contains the actual vocabulary terms and relationships
Content:
- SKOS concepts and their relationships
- Labels in multiple languages
- Hierarchical structures
- Mappings to other vocabularies

Relationships Between Layers

Catalogue to Datasets:

# In default graph
<https://piveau.io/id/catalogue/vocabularies>
    a dcat:Catalog ;
    dcat:dataset <https://piveau.io/set/data/theme-vocabulary> .

Datasets to Content:

# In dataset graph
<https://piveau.io/set/data/theme-vocabulary>
    a dcat:Dataset ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:accessURL <http://publications.europa.eu/resource/authority/data-theme>
    ] .
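As a small Python sketch of how the three layers relate, the helper below derives the named graph for each layer of a single vocabulary. The function name `vocabulary_graphs` is hypothetical. The dataset graph pattern assumes the standard piveau URI schema described above, and the content graph is simply the vocabulary's original URI.

```python
CATALOGUE_GRAPH = "https://piveau.io/id/catalogue/vocabularies"
DATASET_GRAPH_PATTERN = "https://piveau.io/set/data/{vocabulary_id}"


def vocabulary_graphs(vocabulary_id: str, content_uri: str) -> dict:
    """Return the named graph used on each of the three layers."""
    return {
        "catalogue": CATALOGUE_GRAPH,  # layer 1: shared by all vocabularies
        "dataset": DATASET_GRAPH_PATTERN.format(vocabulary_id=vocabulary_id),  # layer 2
        "content": content_uri,  # layer 3: original vocabulary URI
    }


graphs = vocabulary_graphs(
    "theme-vocabulary",
    "http://publications.europa.eu/resource/authority/data-theme",
)
```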

Vocabularies in different piveau services#

Vocabulary storage and management with Repo:

Vocabularies are stored in a catalogue structure.
They are hidden from indexing as normal datasets to keep the system organized.
Each dataset contains metadata linking to the vocabulary.
A hash comparison prevents unnecessary updates during import.
Only the /vocabulary endpoints should be used to manage the vocabularies manually.
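The hash comparison can be pictured with the following Python sketch. It is an assumption-laden illustration, not the piveau implementation: the source does not say which hash algorithm is used or what exactly is hashed, so SHA-256 over the raw bytes and the names `content_hash` and `needs_update` are hypothetical.

```python
import hashlib
from typing import Optional


def content_hash(rdf_bytes: bytes) -> str:
    """Hash the serialized vocabulary (assumed: SHA-256 over the raw bytes)."""
    return hashlib.sha256(rdf_bytes).hexdigest()


def needs_update(stored_hash: Optional[str], new_content: bytes) -> bool:
    """True if no hash is stored yet or the new content hashes differently."""
    return stored_hash != content_hash(new_content)


stored = content_hash(b"<rdf/>")  # would come from dct:identifier on the dataset
print(needs_update(stored, b"<rdf/>"))
```

With a byte-level hash like this one, a re-serialized but semantically identical vocabulary would still count as changed.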

Vocabulary usage in the Data Provider Interface:

Users can assign themes to datasets without handling raw URIs.
Labels are resolved for improved usability.

Vocabulary usage in search & indexing:

Before indexing a dataset, vocabulary properties are resolved to enhance search results.
Each vocabulary term is stored as a searchable instance within an index in the search service.
The search service indexes each vocabulary separately (vocabulary_* naming convention).

Adding and managing vocabularies#

On first start#

piveau offers a manual command for loading the vocabularies used in DCAT-AP and DCAT-AP.de. The command is called installVocabularies, and its -h flag prints more information about it. It is available through the hub repo shell and will soon also be available via the hub repo action API. You can find more information about this process here.

Danger

This process can take several minutes, depending on the deployment, the Virtuoso configuration, and the number of vocabularies to be installed.

Note

The hub repo shell can be enabled via the PIVEAU_HUB_SHELL_CONFIG environment variable, as is done in the sample config. It is then accessible at repo-url/shell.html. More information is available in the CLI reference.

Non-SKOS vocabularies#

Some vocabularies are not available as SKOS vocabularies but are still needed by the frontend. They are not stored as RDF in hub repo and can be imported directly via the hub search CLI with the following command:

indexXmlVocabularies

Vocabulary Management API#

The piveau API provides endpoints for managing vocabularies through standard HTTP operations. Each vocabulary is identified by a unique vocabularyId.

Please refer to the OpenAPI description for the most up-to-date reference.

Danger

Use only these vocabulary endpoints to manage vocabularies. The vocabulary catalogue and the vocabulary datasets are needed for the internal management of the vocabularies and should not be modified manually through other API endpoints. Doing so could lead to internal inconsistencies.

Authentication#

For write operations (PUT, DELETE), authentication is required using either:

API Key: Provided in the X-API-Key header
Bearer Token: Provided in the Authorization header with the Bearer prefix (the user must be a member of the vocabularies group in Keycloak)
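The two options map to request headers as in this minimal Python sketch. The helper name `auth_headers` is hypothetical, and rejecting the case where both credentials are passed is a design choice of the sketch, not a documented API rule.

```python
from typing import Optional


def auth_headers(api_key: Optional[str] = None,
                 bearer_token: Optional[str] = None) -> dict:
    """Build the header for exactly one of the two authentication methods."""
    if (api_key is None) == (bearer_token is None):
        raise ValueError("provide exactly one of api_key or bearer_token")
    if api_key is not None:
        return {"X-API-Key": api_key}
    return {"Authorization": f"Bearer {bearer_token}"}


print(auth_headers(api_key="your_api_key"))
```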

Endpoints#

1. Retrieve a Vocabulary#

GET /vocabularies/{vocabularyId}

Retrieves a vocabulary in RDF format.

Example request:

curl -X GET \
  'https://piveau.io/api/hub/repo/vocabularies/data-theme' \
  -H 'Accept: application/rdf+xml'

Response (200 OK):

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
    <skos:ConceptScheme rdf:about="http://publications.europa.eu/resource/authority/data-theme">
        <!-- Vocabulary content -->
    </skos:ConceptScheme>
</rdf:RDF>
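A client can inspect such a response with the Python standard library. The sketch below extracts the URI of every skos:ConceptScheme from an RDF/XML body. The function name `concept_scheme_uris` is hypothetical, and the sketch only handles the typed-element form shown in the response above; a general-purpose client would use a full RDF parser instead.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def concept_scheme_uris(rdf_xml: bytes) -> list:
    """Return the rdf:about URI of every skos:ConceptScheme element."""
    root = ET.fromstring(rdf_xml)
    return [
        scheme.get(f"{{{RDF_NS}}}about")
        for scheme in root.iter(f"{{{SKOS_NS}}}ConceptScheme")
    ]


SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
    <skos:ConceptScheme rdf:about="http://publications.europa.eu/resource/authority/data-theme">
        <!-- Vocabulary content -->
    </skos:ConceptScheme>
</rdf:RDF>
"""

print(concept_scheme_uris(SAMPLE))
```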

2. Check Vocabulary Existence#

HEAD /vocabularies/{vocabularyId}

Checks if a vocabulary exists without retrieving its content.

Example request:

curl -I \
  'https://piveau.io/api/hub/repo/vocabularies/data-theme'

Response (200 OK):

HTTP/1.1 200 OK
Content-Type: application/rdf+xml
Last-Modified: Mon, 10 Feb 2025 12:00:00 GMT

3. Create or Update a Vocabulary#

PUT /vocabularies/{vocabularyId}

Creates a new vocabulary or updates an existing one. Requires authentication. If they do not exist yet, the vocabulary catalogue and dataset are created automatically.

Example request for creating/updating:

curl -X PUT \
  'https://piveau.io/api/hub/repo/vocabularies/custom-theme' \
  -H 'X-API-Key: your_api_key' \
  -H 'Content-Type: application/rdf+xml' \
  --data-binary @vocabulary.rdf

Possible responses:

201 Created (new vocabulary)

HTTP/1.1 201 Created
Location: https://piveau.io/api/hub/repo/vocabularies/custom-theme

204 No Content (updated existing vocabulary)
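The same request can be prepared from Python with the standard library. This sketch only builds the request object and maps the two success codes to an outcome; the function names are hypothetical, and the base URL and API key are placeholders taken from the curl example.

```python
import urllib.request


def build_put_request(base_url: str, vocabulary_id: str,
                      rdf_bytes: bytes, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a PUT request for a vocabulary."""
    url = f"{base_url.rstrip('/')}/vocabularies/{vocabulary_id}"
    return urllib.request.Request(
        url,
        data=rdf_bytes,
        method="PUT",
        headers={"X-API-Key": api_key, "Content-Type": "application/rdf+xml"},
    )


def describe_put_status(status: int) -> str:
    """Translate the documented success codes into an outcome."""
    return {201: "created", 204: "updated"}.get(status, "unexpected")


request = build_put_request(
    "https://piveau.io/api/hub/repo", "custom-theme", b"<rdf/>", "your_api_key"
)
```

Passing `request` to `urllib.request.urlopen` would send it; `urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses.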

4. Delete a Vocabulary#

DELETE /vocabularies/{vocabularyId}

Permanently removes a vocabulary. Requires authentication.

Example request:

curl -X DELETE \
  'https://piveau.io/api/hub/repo/vocabularies/custom-theme' \
  -H 'X-API-Key: your_api_key'

Response (204 No Content) if successful.

Error Handling#

Common error responses:

Authentication Errors
- 401 Unauthorized: Missing or invalid credentials
- 403 Forbidden: Valid credentials but insufficient permissions
Resource Errors
- 404 Not Found: Vocabulary doesn't exist
- 400 Bad Request: Invalid RDF data in PUT request
Database Errors
- 500 SPARQL Request Failed: This may be caused by requesting very large graphs. It can be circumvented by setting the environment variable VIRT_SPARQL_MAXCONSTRUCTTRIPLES=50000.
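A client might translate these status codes into actionable hints, as in the following Python sketch. The table restates the list above; the names `ERROR_HINTS` and `explain_error` are hypothetical.

```python
ERROR_HINTS = {
    400: "Bad Request: the RDF data in the PUT request is invalid.",
    401: "Unauthorized: credentials are missing or invalid.",
    403: "Forbidden: credentials are valid but permissions are insufficient.",
    404: "Not Found: the vocabulary does not exist.",
    500: ("SPARQL Request Failed: possibly a very large graph; consider "
          "setting VIRT_SPARQL_MAXCONSTRUCTTRIPLES=50000."),
}


def explain_error(status: int) -> str:
    """Return a hint for a documented status code, or a generic fallback."""
    return ERROR_HINTS.get(status, f"Unhandled status {status}.")


print(explain_error(404))
```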

Info

This list is continually extended with possible problems you might encounter, and their solutions. If you see an error that is not yet listed here, you can open a ticket in our issue tracker.

Vocabulary Enrichment in hub-search#

Vocabulary references in datasets are enriched during indexing in hub-repo or hub-search. The fields to be indexed can be configured with the piveau profile. The enrichment process itself can be configured via the Elasticsearch configuration in piveau hub search.

Example: Vocabulary Enrichment#

Dataset with a Vocabulary-Defined Type#

@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .

<https://piveau.io/set/data/simple-dataset>
    a                              dcat:Dataset ;  # (1)!
    dct:type                       <http://publications.europa.eu/resource/authority/dataset-type/TEST_DATA> .  # (2)!

1. Defines the resource as a dcat:Dataset.
2. Assigns a dataset type from a controlled vocabulary.

Enriched Representation in Search Index#

{
    "id": "simple-dataset",
    "type": {
       "id": "TEST_DATA",  // (1)!
       "label": "Test data",  // (2)!
       "resource": "http://publications.europa.eu/resource/authority/dataset-type/TEST_DATA"  // (3)!
    }
}

1. Retains the dataset type identifier.
2. Adds a human-readable label for improved usability.
3. Stores the original resource URI.

Configuring Enrichment in hub-search#

To enable enrichment, configure the vocabulary in the hub-search Elasticsearch configuration:

{
  "PIVEAU_HUB_SEARCH_ES_CONFIG": {
    "vocabulary": {
      "dataset-type": {
        "fields": ["type"],  // (1)!
        "excludes": ["distributions"],  // (2)!
        "replacements": [
          "id:id",  // (3)!
          "label:pref_label.en",  // (4)!
          "resource:resource"  // (5)!
        ]
      }
    }
  }
}

1. Specifies which fields to enrich (in this case, type).
2. Excludes distributions.type from enrichment to prevent conflicts.
3. Maps the type.id field to the vocabulary ID.
4. Replaces type.label with the preferred English label from the vocabulary.
5. Ensures type.resource retains the original URI.

Configuration Details#

Vocabulary Identifier (dataset-type):
- Must match the vocabulary's identifier in the system
- Used to locate the correct vocabulary data
Fields Array (fields):
- Lists all dataset properties that should be enriched using this vocabulary
- Example: "type" enriches any property named "type"
Excludes Array (excludes):
- Specifies JSON paths where enrichment should not occur
- Example: "distributions" prevents enriching "type" fields within distribution objects
- Optional: Use "includes" instead to specify only where enrichment should occur
Replacements Array (replacements):
- Format: "target:source"
- Target: Field name in the enriched output
- Source: Field name in the vocabulary data
- Common mappings:
  - "id:id" - Short identifier
  - "label:pref_label.en" - English display label
  - "resource:resource" - Full URI
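To make the interplay of fields, excludes, and replacements concrete, here is a simplified Python sketch of the enrichment step. It is not the hub-search implementation. It assumes that an un-enriched field is an object carrying a "resource" URI, that vocabulary entries are available in a dictionary keyed by that URI, and that excludes can be treated as plain key names rather than full JSON paths. All function names are hypothetical.

```python
def resolve(data, dotted_path):
    """Follow a dotted path such as 'pref_label.en' through nested dicts."""
    current = data
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def apply_replacements(vocabulary_entry, replacements):
    """Build the enriched object from 'target:source' rules."""
    enriched = {}
    for rule in replacements:
        target, source = rule.split(":", 1)
        value = resolve(vocabulary_entry, source)
        if value is not None:
            enriched[target] = value
    return enriched


def enrich(node, config, entries_by_resource):
    """Recursively enrich configured fields, leaving excluded keys untouched."""
    if isinstance(node, list):
        return [enrich(item, config, entries_by_resource) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if key in config.get("excludes", []):
            result[key] = value  # excluded subtree is copied as-is
        elif (key in config["fields"] and isinstance(value, dict)
              and value.get("resource") in entries_by_resource):
            entry = entries_by_resource[value["resource"]]
            result[key] = apply_replacements(entry, config["replacements"])
        else:
            result[key] = enrich(value, config, entries_by_resource)
    return result


URI = "http://publications.europa.eu/resource/authority/dataset-type/TEST_DATA"
CONFIG = {
    "fields": ["type"],
    "excludes": ["distributions"],
    "replacements": ["id:id", "label:pref_label.en", "resource:resource"],
}
# Assumed shape of an indexed vocabulary entry.
ENTRIES = {URI: {"id": "TEST_DATA", "pref_label": {"en": "Test data"}, "resource": URI}}
DOCUMENT = {
    "id": "simple-dataset",
    "type": {"resource": URI},
    "distributions": [{"type": {"resource": URI}}],
}

enriched = enrich(DOCUMENT, CONFIG, ENTRIES)
```

In this sketch the top-level type is replaced by the id, label, and resource triple, while the type inside distributions is left as it was.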

Future Enhancements#

Future improvements will need to:

Index additional metadata for specific vocabularies (e.g., corporate bodies).
Extend the existing schema for more flexible enrichment strategies.
Reference additional dataset attributes beyond standard vocabularies.

Relevant Code References:
- VocabularyHelper.kt
- Enrichment Implementation
- Specific Enrichment Logic (Line 191)

Understanding SKOS#

SKOS (Simple Knowledge Organization System) is a W3C standard used to represent controlled vocabularies, taxonomies, and thesauri in RDF. Think of it as a way to organize concepts and their relationships, similar to how you might organize items in a library catalog or product categories in an online store.

Key SKOS Concepts#

Concepts

The basic building blocks of SKOS
Represent ideas, meanings, or categories

Example: "Agriculture" as a dataset theme

<http://publications.europa.eu/resource/authority/data-theme/AGRI>
    a skos:Concept ;
    skos:prefLabel "Agriculture"@en ;
    skos:prefLabel "Landwirtschaft"@de .

Labels

Ways to name concepts in different languages:
- prefLabel: The main label (only one per language)
- altLabel: Alternative labels or synonyms
- hiddenLabel: Labels for search matching

```
<http://example.org/concepts/farming>
    skos:prefLabel "Agriculture"@en ;
    skos:altLabel "Farming"@en ;
    skos:altLabel "Agronomy"@en .
```
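The different label types can be used for display and for search matching, as this Python sketch suggests. The dictionary layout of a concept and the function names are assumptions made for illustration; they do not reflect how piveau stores labels internally.

```python
def preferred_label(concept: dict, language: str, fallback: str = "en"):
    """Pick the prefLabel for a language, falling back to a default language."""
    labels = concept.get("prefLabel", {})
    return labels.get(language) or labels.get(fallback)


def matches(concept: dict, term: str) -> bool:
    """True if the term equals any pref, alt, or hidden label (case-insensitive)."""
    needle = term.casefold()
    candidates = list(concept.get("prefLabel", {}).values())
    for key in ("altLabel", "hiddenLabel"):
        for labels in concept.get(key, {}).values():
            candidates.extend(labels)
    return any(needle == label.casefold() for label in candidates)


FARMING = {
    "prefLabel": {"en": "Agriculture", "de": "Landwirtschaft"},  # one per language
    "altLabel": {"en": ["Farming", "Agronomy"]},  # any number per language
}

print(preferred_label(FARMING, "de"), matches(FARMING, "farming"))
```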

Hierarchical Relationships

broader: Links to more general concepts

narrower: Links to more specific concepts

<http://example.org/concepts/dairy-farming>
    skos:broader <http://example.org/concepts/farming> .
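Following broader links repeatedly yields all more general concepts, which corresponds to skos:broaderTransitive in SKOS. The Python sketch below walks such links over a plain dictionary. The function name, the short concept identifiers, and the "land-use" concept are hypothetical.

```python
def ancestors(concept: str, broader: dict) -> list:
    """Return all transitively broader concepts, nearest first, tolerating cycles."""
    seen = []
    queue = list(broader.get(concept, []))
    while queue:
        current = queue.pop(0)
        if current in seen or current == concept:
            continue  # already visited, or a cycle back to the start
        seen.append(current)
        queue.extend(broader.get(current, []))
    return seen


BROADER = {
    "dairy-farming": ["farming"],
    "farming": ["land-use"],  # hypothetical parent concept
}

print(ancestors("dairy-farming", BROADER))
```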

Concept Schemes

Collections of concepts

Similar to a controlled vocabulary or taxonomy

<http://example.org/vocabulary/themes>
    a skos:ConceptScheme ;
    skos:hasTopConcept <http://example.org/concepts/farming> .

SKOS in Piveau#

In piveau, SKOS is used to:
1. Organize dataset themes and categories
2. Provide multilingual labels for concepts
3. Support semantic search functionality

Example of a dataset using SKOS concepts:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<https://piveau.io/set/data/farming-dataset>
    a dcat:Dataset ;
    dcat:theme <http://publications.europa.eu/resource/authority/data-theme/AGRI> .

When this dataset is displayed in the portal:
- The SKOS prefLabel is shown instead of the URI
- Users can find the dataset using any altLabel
