Harvesting a piveau instance
The piveau hub-repo offers two endpoints that return the metadata of datasets as RDF. Either can be used to harvest the datasets of a portal:
- List datasets (`/datasets`): a list of all datasets in the portal
- List datasets of catalogue (`/catalogues/{catalogueId}/datasets`): a list of all datasets in a specific catalogue
Both endpoints offer mostly the same set of query parameters to navigate through the list:
- `valueType`: must be `metadata` to return the complete metadata of the datasets
- `offset`: the offset of the first returned dataset; used together with `limit` to get the next set of datasets in the list
- `limit`: the number of returned datasets (maximum 5000, default 100); used together with `offset` to get the next set of datasets in the list
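The `offset`/`limit` paging described above can be sketched as a simple loop. This is a minimal sketch, not piveau code: the base URL passed in is a placeholder for your hub-repo address, `fetch` is a hypothetical caller-supplied function that downloads one page and returns the datasets it contains (for example after parsing the RDF), and the loop assumes that a page shorter than `limit` means the end of the list has been reached.

```python
from typing import Callable, Iterator, List, Optional
from urllib.parse import quote, urlencode

MAX_LIMIT = 5000  # documented maximum page size


def dataset_list_url(base_url: str, offset: int, limit: int,
                     catalogue_id: Optional[str] = None) -> str:
    """Build a list-datasets URL, optionally restricted to one catalogue."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if catalogue_id is None:
        path = "/datasets"
    else:
        # Percent-encode the catalogue ID so it stays a single path segment.
        path = f"/catalogues/{quote(catalogue_id, safe='')}/datasets"
    query = urlencode({"valueType": "metadata", "offset": offset, "limit": limit})
    return f"{base_url.rstrip('/')}{path}?{query}"


def harvest(base_url: str, fetch: Callable[[str], List[object]],
            limit: int = 100, catalogue_id: Optional[str] = None) -> Iterator[object]:
    """Yield datasets page by page until a page comes back short or empty.

    `fetch` is hypothetical: it must download one page and return its datasets.
    """
    offset = 0
    while True:
        batch = fetch(dataset_list_url(base_url, offset, limit, catalogue_id))
        yield from batch
        if len(batch) < limit:  # short or empty page: assume end of the list
            return
        offset += limit
```

Larger pages (up to the 5000 maximum) mean fewer requests; if the total is an exact multiple of `limit`, the loop makes one extra request that returns an empty page and then stops.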
Only available on the list datasets endpoint:
- `hydra`: set to `true` to include a HYDRA pagination object in the RDF response. This object includes a link to the next page of datasets.
Returned format and Accept header: if no Accept header is set, the response content type is `application/ld+json`. Other supported formats are listed in the API description and include, e.g., `application/rdf+xml` and `text/turtle`.
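Choosing the serialisation is plain HTTP content negotiation via the `Accept` header. The following is a minimal sketch using only the Python standard library; the URL you pass in is a placeholder, and which media types the server actually accepts should be checked against the API description.

```python
from typing import Tuple
from urllib.request import Request, urlopen

DEFAULT_FORMAT = "application/ld+json"  # what hub-repo returns without an Accept header


def rdf_request(url: str, media_type: str = DEFAULT_FORMAT) -> Request:
    """Prepare a GET request asking for a specific RDF serialisation."""
    return Request(url, headers={"Accept": media_type})


def fetch_rdf(url: str, media_type: str = DEFAULT_FORMAT) -> Tuple[str, str]:
    """Return the content type the server actually sent and the decoded body."""
    with urlopen(rdf_request(url, media_type)) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.headers.get_content_type(), response.read().decode(charset)
```

Returning the received content type lets the caller verify that the server honoured the requested format before handing the body to an RDF parser.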
Warning
Please always refer to the API documentation for an up-to-date reference.
Hydra paging in hub-repo
The `listDataset` endpoint has an optional parameter `hydra`, which adds HYDRA paging to the dataset list. The HYDRA graph looks like this:
```json
{
  "@id": "http://piveau.io/datasets/?valueType=metadata&hydra=true&limit=3&offset=0",
  "@graph": [
    {
      "@id": "http://piveau.io/datasets/?valueType=metadata&hydra=true&limit=3&offset=0",
      "@type": "http://www.w3.org/ns/hydra/core#PartialCollectionView",
      "http://www.w3.org/ns/hydra/core#totalItems": {
        "@value": "1234",
        "@type": "http://www.w3.org/2001/XMLSchema#int"
      },
      "http://www.w3.org/ns/hydra/core#next": "http://piveau.io/datasets/?valueType=metadata&hydra=true&limit=3&offset=3",
      "http://www.w3.org/ns/hydra/core#first": "http://piveau.io/datasets/?valueType=metadata&hydra=true&limit=3&offset=0",
      "http://www.w3.org/ns/hydra/core#last": "http://piveau.io/datasets/?valueType=metadata&hydra=true&limit=3&offset=0"
    }
  ]
}
```
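A harvester can read this graph and keep following `hydra:next`. The sketch below is illustrative, not part of piveau: it matches only the expanded HYDRA IRIs exactly as they appear in the JSON-LD example above, `fetch_json` is a hypothetical caller-supplied function that downloads one page and returns the parsed JSON-LD, and it assumes the last page simply carries no `hydra:next` link.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

HYDRA = "http://www.w3.org/ns/hydra/core#"


def _as_list(value: Any) -> List[Any]:
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def _iri(value: Any) -> Optional[str]:
    # A link may be serialised as a plain string (as above) or as {"@id": "..."}.
    return value.get("@id") if isinstance(value, dict) else value


def hydra_view(doc: Any) -> Optional[Dict[str, Any]]:
    """Extract paging info from the PartialCollectionView node, if present."""
    nodes = doc if isinstance(doc, list) else _as_list(doc.get("@graph", doc))
    for node in nodes:
        if HYDRA + "PartialCollectionView" in _as_list(node.get("@type")):
            total = node.get(HYDRA + "totalItems")
            if isinstance(total, dict):  # typed literal: {"@value": ..., "@type": ...}
                total = total.get("@value")
            return {
                "totalItems": int(total) if total is not None else None,
                "next": _iri(node.get(HYDRA + "next")),
                "last": _iri(node.get(HYDRA + "last")),
            }
    return None


def follow_hydra(first_url: str, fetch_json: Callable[[str], Any]) -> Iterator[Any]:
    """Yield each page's JSON-LD document, following hydra:next until it is absent."""
    url: Optional[str] = first_url
    seen = set()
    while url and url not in seen:  # `seen` guards against a link cycle
        seen.add(url)
        doc = fetch_json(url)
        yield doc
        view = hydra_view(doc)
        url = view["next"] if view else None
```

Following the server-provided link instead of computing offsets on the client means the harvester does not need to know the paging scheme, which is also why correct absolute URIs in these links matter.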
Warning
As of this writing, this parameter is only available on the list datasets endpoint, not on the list datasets of catalogue endpoint.
If the service runs behind a proxy that rewrites the URI, the paging links may contain wrong URIs. To fix this, configure the proxy to send the correct absolute URI of the current request in the `X-Original-URI` header; the service will then use it.
For a Kubernetes ingress, you might want to set it like this: