piveau RDF Vocabularies#

Introduction#

Vocabularies play a crucial role in structuring and standardizing data within the piveau open data portal. They help ensure interoperability, reusability, and meaningful indexing of datasets. This document explains what vocabularies are and how they are used in piveau, and provides detailed technical instructions for configuring and managing them.


What Are Vocabularies?#

A vocabulary is a structured set of terms and concepts used to define and categorize data in a machine-readable format. In piveau, vocabularies help to:

  • Standardize dataset properties (e.g., dcat:theme linking to predefined categories).
  • Enable efficient data retrieval and enrichment.
  • Improve search capabilities through structured indexing.

For more information on vocabularies in the semantic web, refer to the W3C ontology standards.


Vocabularies Used in piveau#

Linking Data to Vocabularies#

The RDF schema in piveau recommends using specific vocabularies for certain RDF properties. For example, dcat:theme links datasets to categories defined in the EU Data-Theme Vocabulary.

Example: Assigning a Theme to a Dataset

@prefix dcat: <http://www.w3.org/ns/dcat#> .

<http://data.europa.eu/88u/dataset/simple-dataset> dcat:theme <http://publications.europa.eu/resource/authority/data-theme/AGRI> .
Explanation: Here, the dataset is categorized under "Agriculture" (AGRI) without needing additional metadata, as the vocabulary already defines the term.

Querying Vocabulary-Linked Data#

Using SPARQL, users can retrieve structured information about a dataset's assigned vocabulary terms:

PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?theme ?prefLabel
WHERE {
  <http://data.europa.eu/88u/dataset/simple-dataset> dcat:theme ?theme .
  ?theme skos:prefLabel ?prefLabel .
}
Explanation: This query retrieves the dataset's theme along with its human-readable label.
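
The query above can be generated for any dataset URI. The following Python sketch only assembles the query string. The function name `build_theme_query` is illustrative and not part of piveau, and sending the query is left out because the SPARQL endpoint URL depends on the deployment.

```python
def build_theme_query(dataset_uri: str) -> str:
    """Return a SPARQL query listing a dataset's themes with their labels."""
    # Reject characters that are not allowed inside an IRI reference,
    # so the URI cannot break out of the <...> delimiters.
    if any(ch in dataset_uri for ch in '<>" {}|\\^`'):
        raise ValueError(f"not a valid IRI: {dataset_uri!r}")
    return (
        "PREFIX dcat: <http://www.w3.org/ns/dcat#>\n"
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "\n"
        "SELECT DISTINCT ?theme ?prefLabel\n"
        "WHERE {\n"
        f"  <{dataset_uri}> dcat:theme ?theme .\n"
        "  ?theme skos:prefLabel ?prefLabel .\n"
        "}\n"
    )


query = build_theme_query("http://data.europa.eu/88u/dataset/simple-dataset")
```

The query only returns labels if the vocabulary graph is available in the same triplestore as the dataset.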

Warning

  • The piveau triplestore does not resolve external URIs automatically.
  • Vocabulary graphs must be stored locally in the triplestore to enable querying and indexing.

Vocabulary storage structure#

Piveau organizes vocabularies using a three-layer graph structure in the triplestore, ensuring clear separation and efficient management of different types of vocabulary-related information.

graph LR
    subgraph "Catalogue Graph"
        Cat["Vocabularies Catalogue<br/>(dcat:Catalog)<br/>graph: https://piveau.io/id/catalogue/vocabularies"]
    end

    subgraph "Vocabulary Dataset Graphs"
        DS1["Dataset: theme-vocabulary<br/>(dcat:Dataset)<br/>graph: https://piveau.io/set/data/theme-vocabulary"]
        DS2["Dataset: language-vocabulary<br/>(dcat:Dataset)<br/>graph: https://piveau.io/set/data/language-vocabulary"]
        DS3["Dataset: other-vocabulary<br/>(dcat:Dataset)<br/>graph: https://piveau.io/set/data/other-vocabulary"]
    end

    subgraph "Vocabulary Content Graphs"
        V1["Theme Vocabulary Content<br/>(skos:ConceptScheme)<br/>graph: http://publications.europa.eu/resource/authority/data-theme"]
        V2["Language Vocabulary Content<br/>(skos:ConceptScheme)<br/>graph: http://publications.europa.eu/resource/authority/language"]
        V3["Other Vocabulary Content<br/>(skos:ConceptScheme)<br/>graph: http://example.org/vocabulary/other"]
    end

    Cat -->|dcat:dataset| DS1
    Cat -->|dcat:dataset| DS2
    Cat -->|dcat:dataset| DS3

    DS1 -->|dcat:distribution<br/>accessURL| V1
    DS2 -->|dcat:distribution<br/>accessURL| V2
    DS3 -->|dcat:distribution<br/>accessURL| V3

    classDef catalogStyle fill:#f9f,stroke:#333,stroke-width:2px
    classDef datasetStyle fill:#bbf,stroke:#333,stroke-width:2px
    classDef vocabStyle fill:#bfb,stroke:#333,stroke-width:2px

    class Cat catalogStyle
    class DS1,DS2,DS3 datasetStyle
    class V1,V2,V3 vocabStyle

Layer 1: Vocabularies Catalogue

  • Graph: https://piveau.io/id/catalogue/vocabularies
  • Type: dcat:Catalog
  • Purpose: Serves as the main entry point for vocabulary management
  • Content: Contains metadata about all vocabulary datasets
  • Visibility: Marked as hidden to exclude from general dataset indexing

Layer 2: Vocabulary Dataset Graphs

Each vocabulary has its own dataset representation stored in a dedicated graph:

  • Graph Pattern: https://piveau.io/set/data/{vocabulary-id} (if following the standard piveau URI schema)
  • Type: dcat:Dataset
  • Purpose: Stores metadata about specific vocabularies
  • Key Information:
    • Vocabulary hash (stored as dct:identifier)
    • Access URL to vocabulary content
    • Version information
    • Update timestamps
    • Description and documentation

Layer 3: Vocabulary Content Graphs

The actual vocabulary definitions are stored in separate graphs, typically using their original URIs:

  • Graph Pattern: Original vocabulary URI (e.g., http://publications.europa.eu/resource/authority/data-theme)
  • Type: skos:ConceptScheme
  • Purpose: Contains the actual vocabulary terms and relationships
  • Content:
    • SKOS concepts and their relationships
    • Labels in multiple languages
    • Hierarchical structures
    • Mappings to other vocabularies

Relationships Between Layers

  1. Catalogue to Datasets:

    # In default graph
    <https://piveau.io/id/catalogue/vocabularies>
        a dcat:Catalog ;
        dcat:dataset <https://piveau.io/set/data/theme-vocabulary> .
    

  2. Datasets to Content:

    # In dataset graph
    <https://piveau.io/set/data/theme-vocabulary>
        a dcat:Dataset ;
        dcat:distribution [
            a dcat:Distribution ;
            dcat:accessURL <http://publications.europa.eu/resource/authority/data-theme>
        ] .
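
As a rough illustration of the three layers, the Python sketch below derives the graph URI of each layer for one vocabulary. It assumes the standard piveau URI schema described above. The helper name `layer_graphs` is hypothetical and not part of piveau.

```python
CATALOGUE_GRAPH = "https://piveau.io/id/catalogue/vocabularies"
# Pattern from "Layer 2"; only valid when the standard URI schema is used.
DATASET_GRAPH_PATTERN = "https://piveau.io/set/data/{vocabulary_id}"


def layer_graphs(vocabulary_id: str, content_uri: str) -> dict:
    """Return the named graph used by each of the three layers."""
    return {
        "catalogue": CATALOGUE_GRAPH,
        "dataset": DATASET_GRAPH_PATTERN.format(vocabulary_id=vocabulary_id),
        # Layer 3 typically reuses the vocabulary's original URI as graph name.
        "content": content_uri,
    }


graphs = layer_graphs(
    "theme-vocabulary",
    "http://publications.europa.eu/resource/authority/data-theme",
)
```

Here `graphs["dataset"]` is the graph that holds the dcat:Dataset metadata, and `graphs["content"]` is the graph referenced by the distribution's dcat:accessURL.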
    

Vocabularies in different piveau services#

Vocabulary storage and management with Repo:

  • Vocabularies are stored in a catalogue structure.
  • The catalogue is hidden so that vocabularies are not indexed as normal datasets.
  • Each dataset contains metadata linking to the vocabulary.
  • A hash comparison prevents unnecessary updates during import.
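
The hash comparison can be pictured with the following Python sketch. It is a simplified illustration, not the Repo implementation: the source does not name the hash algorithm, so SHA-256 is an assumption, and the function names are hypothetical. The stored hash corresponds to the value kept as dct:identifier on the vocabulary dataset.

```python
import hashlib
from typing import Optional


def content_hash(content: bytes) -> str:
    """Hash the serialized vocabulary (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def needs_update(new_content: bytes, stored_hash: Optional[str]) -> bool:
    """Import only if the vocabulary is new or its content has changed."""
    return stored_hash is None or content_hash(new_content) != stored_hash


stored = content_hash(b"<rdf:RDF>v1</rdf:RDF>")
unchanged = needs_update(b"<rdf:RDF>v1</rdf:RDF>", stored)  # False: skip import
changed = needs_update(b"<rdf:RDF>v2</rdf:RDF>", stored)    # True: re-import
```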

Vocabulary usage in the Data Provider Interface:

  • Users can assign themes to datasets without handling raw URIs.
  • Labels are resolved for improved usability.

Vocabulary usage in search & indexing:

  • Before indexing, vocabulary properties are resolved to enhance search results.
  • Each vocabulary term is stored as a searchable instance within an index.
  • The search service indexes each vocabulary separately (vocabulary_* naming convention).

Adding and managing vocabularies#

On first start#

piveau offers a manual command for loading the vocabularies used in DCAT-AP and DCAT-AP.de. The command is called installVocabularies, and its -h flag prints further usage information. It is available through the hub repo shell and is planned to become available via the hub repo action API.

Note

The hub repo shell can be enabled via the PIVEAU_HUB_SHELL_CONFIG environment variable, as shown in the sample config. It is then accessible at repo-url/shell.html. See the CLI reference for details.

Non-SKOS vocabularies#

Some vocabularies are not available as SKOS vocabularies but are still needed by the frontend. These vocabularies can be imported directly via the hub search CLI and are not stored as RDF in hub repo.

indexXmlVocabularies

Vocabulary Management API#

The piveau API provides endpoints for managing vocabularies through standard HTTP operations. Each vocabulary is identified by a unique vocabularyId.

Please refer to the OpenAPI description for the most up-to-date reference.

Authentication#

For write operations (PUT, DELETE), authentication is required using either:

  • API Key: Provided in the X-API-Key header
  • Bearer Token: Provided in the Authorization header with Bearer prefix
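
A minimal Python sketch of the two authentication options follows. The helper name `auth_headers` is hypothetical. The source does not say what happens when both credentials are sent, so the sketch sends only one and prefers the API key.

```python
from typing import Optional


def auth_headers(api_key: Optional[str] = None,
                 bearer_token: Optional[str] = None) -> dict:
    """Build the HTTP headers for an authenticated write operation."""
    if api_key:
        return {"X-API-Key": api_key}
    if bearer_token:
        return {"Authorization": f"Bearer {bearer_token}"}
    raise ValueError("write operations need an API key or a bearer token")


key_headers = auth_headers(api_key="your_api_key")
token_headers = auth_headers(bearer_token="your_token")
```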

Endpoints#

1. Retrieve a Vocabulary#
GET /vocabularies/{vocabularyId}

Retrieves a vocabulary in RDF format.

Example request:

curl -X GET \
  'https://piveau.io/api/hub/repo/vocabularies/data-theme' \
  -H 'Accept: application/rdf+xml'

Response (200 OK):

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
    <skos:ConceptScheme rdf:about="http://publications.europa.eu/resource/authority/data-theme">
        <!-- Vocabulary content -->
    </skos:ConceptScheme>
</rdf:RDF>

2. Check Vocabulary Existence#
HEAD /vocabularies/{vocabularyId}

Checks if a vocabulary exists without retrieving its content.

Example request:

curl -I \
  'https://piveau.io/api/hub/repo/vocabularies/data-theme'

Response (200 OK):

HTTP/1.1 200 OK
Content-Type: application/rdf+xml
Last-Modified: Mon, 10 Feb 2025 12:00:00 GMT

3. Create or Update a Vocabulary#
PUT /vocabularies/{vocabularyId}

Creates a new vocabulary or updates an existing one. Requires authentication. If the vocabulary catalogue and dataset do not exist yet, they are created automatically.

Example request for creating/updating:

curl -X PUT \
  'https://piveau.io/api/hub/repo/vocabularies/custom-theme' \
  -H 'X-API-Key: your_api_key' \
  -H 'Content-Type: application/rdf+xml' \
  --data-binary @vocabulary.rdf

Possible responses:

  • 201 Created (new vocabulary)
    HTTP/1.1 201 Created
    Location: https://piveau.io/api/hub/repo/vocabularies/custom-theme
    
  • 204 No Content (updated existing vocabulary)

4. Delete a Vocabulary#
DELETE /vocabularies/{vocabularyId}

Permanently removes a vocabulary. Requires authentication.

Example request:

curl -X DELETE \
  'https://piveau.io/api/hub/repo/vocabularies/custom-theme' \
  -H 'X-API-Key: your_api_key'

Response (204 No Content) if successful.
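
The four operations can also be prepared from Python with the standard library. The sketch below only builds the request objects and does not send them. The base URL is taken from the curl examples, while `vocabulary_request` is a hypothetical helper and API-key authentication is just one of the two supported options.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://piveau.io/api/hub/repo"  # host used in the curl examples


def vocabulary_request(method, vocabulary_id, api_key=None, rdf_xml=None):
    """Build (but do not send) a request for one vocabulary endpoint."""
    headers = {}
    if method == "GET":
        headers["Accept"] = "application/rdf+xml"
    if method in ("PUT", "DELETE"):
        if api_key is None:
            raise ValueError(f"{method} requires authentication")
        headers["X-API-Key"] = api_key
    if method == "PUT":
        if rdf_xml is None:
            raise ValueError("PUT requires the vocabulary as RDF/XML bytes")
        headers["Content-Type"] = "application/rdf+xml"
    url = f"{BASE_URL}/vocabularies/{urllib.parse.quote(vocabulary_id, safe='')}"
    return urllib.request.Request(
        url,
        data=rdf_xml if method == "PUT" else None,
        headers=headers,
        method=method,
    )


put_request = vocabulary_request(
    "PUT", "custom-theme", api_key="your_api_key", rdf_xml=b"<rdf:RDF/>"
)
# To send it: urllib.request.urlopen(put_request)
```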

Error Handling#

Common error responses:

  1. Authentication Errors

    • 401 Unauthorized: Missing or invalid credentials
    • 403 Forbidden: Valid credentials but insufficient permissions
  2. Resource Errors

    • 404 Not Found: Vocabulary doesn't exist
    • 400 Bad Request: Invalid RDF data in PUT request
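
A client might translate these status codes as in the following Python sketch. The function names are hypothetical, and the messages simply repeat the list above.

```python
ERROR_MEANINGS = {
    400: "Invalid RDF data in PUT request",
    401: "Missing or invalid credentials",
    403: "Valid credentials but insufficient permissions",
    404: "Vocabulary doesn't exist",
}


def explain_status(status: int) -> str:
    """Translate an HTTP status code into the meaning listed above."""
    if 200 <= status < 300:
        return "success"
    return ERROR_MEANINGS.get(status, f"unexpected status {status}")


def vocabulary_exists(head_status: int) -> bool:
    """Interpret a HEAD response: 200 means present, 404 means absent."""
    if head_status == 200:
        return True
    if head_status == 404:
        return False
    raise RuntimeError(explain_status(head_status))
```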

Values that reference a vocabulary are enriched during indexing in hub-repo or hub-search. The fields to be indexed can be configured with the piveau profile. The enrichment process itself can be configured via the Elasticsearch configuration in piveau hub search.

Example: Vocabulary Enrichment#

Dataset with a Vocabulary-Defined Type#

@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix dct:    <http://purl.org/dc/terms/> .

<https://piveau.io/set/data/simple-dataset>
    a                              dcat:Dataset ;  # (1)!
    dct:type                       <http://publications.europa.eu/resource/authority/dataset-type/TEST_DATA> .  # (2)!
  1. Defines the resource as a dcat:Dataset.
  2. Assigns a dataset type from a controlled vocabulary.

Enriched Representation in Search Index#

{
    "id": "simple-dataset",
    "type": {
       "id": "TEST_DATA",  // (1)!
       "label": "Test data",  // (2)!
       "resource": "http://publications.europa.eu/resource/authority/dataset-type/TEST_DATA"  // (3)!
    }
}
  1. Retains the dataset type identifier.
  2. Adds a human-readable label for improved usability.
  3. Stores the original resource URI.

To enable enrichment, the vocabulary must be configured in the Elasticsearch configuration of hub-search:

{
  "PIVEAU_HUB_SEARCH_ES_CONFIG": {
    "vocabulary": {
      "dataset-type": {
        "fields": ["type"],  // (1)!
        "excludes": ["distributions"],  // (2)!
        "replacements": [
          "id:id",  // (3)!
          "label:pref_label.en",  // (4)!
          "resource:resource"  // (5)!
        ]
      }
    }
  }
}

  1. Specifies which fields to enrich (in this case, type).
  2. Excludes distributions.type from enrichment to prevent conflicts.
  3. Maps the type.id field to the vocabulary ID.
  4. Replaces type.label with the preferred English label from the vocabulary.
  5. Ensures type.resource retains the original URI.

Configuration Details#

  1. Vocabulary Identifier (dataset-type):

    • Must match the vocabulary's identifier in the system
    • Used to locate the correct vocabulary data
  2. Fields Array (fields):

    • Lists all dataset properties that should be enriched using this vocabulary
    • Example: "type" enriches any property named "type"
  3. Excludes Array (excludes):

    • Specifies JSON paths where enrichment should not occur
    • Example: "distributions" prevents enriching "type" fields within distribution objects
    • Optional: Use "includes" instead to specify only where enrichment should occur
  4. Replacements Array (replacements):

    • Format: "target:source"
    • Target: Field name in the enriched output
    • Source: Field name in the vocabulary data
    • Common mappings:
      • "id:id" - Short identifier
      • "label:pref_label.en" - English display label
      • "resource:resource" - Full URI
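
The following Python sketch shows how such a configuration could be applied to a document. It is not the hub-search implementation. It assumes that the un-enriched field holds an object with a `resource` URI, that vocabulary entries are keyed by that URI, and that `excludes` entries are plain key names rather than full JSON paths. All function names are hypothetical.

```python
def parse_replacements(replacements):
    """Split "target:source" strings into (target, source) pairs."""
    return [tuple(item.split(":", 1)) for item in replacements]


def lookup(entry, dotted_path):
    """Resolve a dotted path such as "pref_label.en" in a nested dict."""
    value = entry
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def enrich(document, config, vocabulary):
    """Return a copy of document with configured fields enriched."""
    pairs = parse_replacements(config["replacements"])
    result = {}
    for key, value in document.items():
        if key in config.get("excludes", []):
            result[key] = value  # excluded subtree is kept unchanged
        elif (key in config["fields"] and isinstance(value, dict)
                and value.get("resource") in vocabulary):
            entry = vocabulary[value["resource"]]
            result[key] = {target: lookup(entry, source) for target, source in pairs}
        elif isinstance(value, dict):
            result[key] = enrich(value, config, vocabulary)
        elif isinstance(value, list):
            result[key] = [enrich(v, config, vocabulary) if isinstance(v, dict) else v
                           for v in value]
        else:
            result[key] = value
    return result


URI = "http://publications.europa.eu/resource/authority/dataset-type/TEST_DATA"
config = {
    "fields": ["type"],
    "excludes": ["distributions"],
    "replacements": ["id:id", "label:pref_label.en", "resource:resource"],
}
# Assumed shape of an indexed vocabulary entry, inferred from "pref_label.en".
vocabulary = {URI: {"id": "TEST_DATA", "pref_label": {"en": "Test data"}, "resource": URI}}
document = {
    "id": "simple-dataset",
    "type": {"resource": URI},
    "distributions": [{"type": {"resource": URI}}],
}
enriched = enrich(document, config, vocabulary)
```

With this input, `enriched["type"]` carries the `id`, `label`, and `resource` fields from the enriched representation shown earlier, while the `type` inside `distributions` is left untouched.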

Future Enhancements#

Future improvements will need to:

  • Index additional metadata for specific vocabularies (e.g., corporate bodies).
  • Extend the existing schema for more flexible enrichment strategies.
  • Reference additional dataset attributes beyond standard vocabularies.

Relevant Code References:

  • VocabularyHelper.kt
  • Enrichment Implementation
  • Specific Enrichment Logic (Line 191)

Understanding SKOS#

SKOS (Simple Knowledge Organization System) is a W3C standard used to represent controlled vocabularies, taxonomies, and thesauri in RDF. Think of it as a way to organize concepts and their relationships, similar to how you might organize items in a library catalog or product categories in an online store.

Key SKOS Concepts#

  1. Concepts

    • The basic building blocks of SKOS
    • Represent ideas, meanings, or categories
    • Example: "Agriculture" as a dataset theme
      <http://publications.europa.eu/resource/authority/data-theme/AGRI>
          a skos:Concept ;
          skos:prefLabel "Agriculture"@en ;
          skos:prefLabel "Landwirtschaft"@de .
      
  2. Labels

    • Ways to name concepts in different languages:
      • prefLabel: The main label (only one per language)
      • altLabel: Alternative labels or synonyms
      • hiddenLabel: Labels for search matching
        <http://example.org/concepts/farming>
            skos:prefLabel "Agriculture"@en ;
            skos:altLabel "Farming"@en ;
            skos:altLabel "Agronomy"@en .
        
  3. Hierarchical Relationships

    • broader: Links to more general concepts
    • narrower: Links to more specific concepts
      <http://example.org/concepts/dairy-farming>
          skos:broader <http://example.org/concepts/farming> .
      
  4. Concept Schemes

    • Collections of concepts
    • Similar to a controlled vocabulary or taxonomy
      <http://example.org/vocabulary/themes>
          a skos:ConceptScheme ;
          skos:hasTopConcept <http://example.org/concepts/farming> .
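
To make the hierarchical relationships concrete, the Python sketch below walks skos:broader links using plain dictionaries instead of an RDF library. The two concepts come from the examples above. The function names are illustrative only.

```python
# Concept URI -> URI of its broader concept (None for a top concept).
BROADER = {
    "http://example.org/concepts/farming": None,
    "http://example.org/concepts/dairy-farming": "http://example.org/concepts/farming",
}


def broader_chain(uri, broader=BROADER):
    """Follow skos:broader links upward, guarding against cycles."""
    chain = []
    seen = {uri}
    current = broader.get(uri)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = broader.get(current)
    return chain


def narrower_of(uri, broader=BROADER):
    """skos:narrower is the inverse of skos:broader."""
    return sorted(child for child, parent in broader.items() if parent == uri)


ancestors = broader_chain("http://example.org/concepts/dairy-farming")
children = narrower_of("http://example.org/concepts/farming")
```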
      

SKOS in Piveau#

In piveau, SKOS is used to:

  1. Organize dataset themes and categories
  2. Provide multilingual labels for concepts
  3. Support semantic search functionality

Example of a dataset using SKOS concepts:

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

<https://piveau.io/set/data/farming-dataset>
    a dcat:Dataset ;
    dcat:theme <http://publications.europa.eu/resource/authority/data-theme/AGRI> .

When this dataset is displayed in the portal:

  • The SKOS prefLabel is shown instead of the URI
  • Users can find the dataset using any altLabel
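
Both portal behaviours can be sketched in Python as follows. This is an illustration, not piveau's implementation. The prefLabel values come from the AGRI example above, whereas the altLabel values are borrowed from the example.org concept and are not actual content of the EU data-theme vocabulary. The function names are hypothetical.

```python
AGRI = "http://publications.europa.eu/resource/authority/data-theme/AGRI"

LABELS = {
    AGRI: {
        "prefLabel": {"en": "Agriculture", "de": "Landwirtschaft"},
        # Illustrative altLabels, not taken from the real vocabulary.
        "altLabel": {"en": ["Farming", "Agronomy"]},
    }
}


def display_label(uri, labels=LABELS, language="en", fallback="en"):
    """Pick the prefLabel for a language, else the fallback, else the URI."""
    pref = labels.get(uri, {}).get("prefLabel", {})
    return pref.get(language) or pref.get(fallback) or uri


def matches(term, uri, labels=LABELS):
    """Check whether a search term equals any prefLabel or altLabel."""
    entry = labels.get(uri, {})
    candidates = list(entry.get("prefLabel", {}).values())
    for alternatives in entry.get("altLabel", {}).values():
        candidates.extend(alternatives)
    return term.casefold() in {c.casefold() for c in candidates}


german = display_label(AGRI, language="de")
found = matches("farming", AGRI)
```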