How to use the APIs of piveau-hub#
piveau-hub offers two methods of interaction:
- A comprehensive RESTful API that offers all functionalities of piveau-hub.
- A Single-Page-Application (piveau-hub-ui) that is mainly designed for the end user.
Using the repo API#
The main API and interaction point is offered by the piveau-hub-repo service. You can find the OpenAPI documentation on the base route of the service (e.g. http://localhost:8081, if you followed the quick start guide). The documentation offers details about all available endpoints and interaction methods. To get you started, we will create a catalogue and a dataset as an example. This requires that you have piveau-hub running and the API key at hand. You can find the latter in the configuration of piveau-hub-repo.
Create a catalogue#
A catalogue is a container for organising and managing your datasets. Every dataset must belong to a catalogue; a dataset cannot exist outside of one.
PUT http://localhost:8081/catalogues/example-catalog
Content-Type: text/turtle
X-API-Key: {{api-key}}

@prefix dct: <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .

<https://example.io/id/catalogue/example-catalog>
    a dcat:Catalog ;
    dct:type "dcat-ap" ;
    dct:title "Example Catalog"@en ;
    dct:description "This is an example Catalog"@en ;
    dct:language <http://publications.europa.eu/resource/authority/language/ENG> .
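If you prefer to send the request from a script, the following is a minimal sketch using only the Python standard library. It assumes the same base URL, path and headers as the request above; the function name build_catalogue_request and the key value "your-api-key" are placeholders for illustration, not part of piveau-hub.

```python
import urllib.request

# Turtle description of the example catalogue from this guide.
CATALOGUE_TURTLE = """\
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .

<https://example.io/id/catalogue/example-catalog>
    a dcat:Catalog ;
    dct:type "dcat-ap" ;
    dct:title "Example Catalog"@en ;
    dct:description "This is an example Catalog"@en ;
    dct:language <http://publications.europa.eu/resource/authority/language/ENG> .
"""


def build_catalogue_request(base_url: str, catalogue_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that creates a catalogue."""
    return urllib.request.Request(
        url=f"{base_url}/catalogues/{catalogue_id}",
        data=CATALOGUE_TURTLE.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle", "X-API-Key": api_key},
    )


# To actually send it against a running piveau-hub-repo (placeholder key):
#   request = build_catalogue_request("http://localhost:8081", "example-catalog", "your-api-key")
#   with urllib.request.urlopen(request) as response:
#       print(response.status)
```

Building the request separately from sending it makes it easy to inspect the URL, headers and body before anything reaches the service.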
Create a dataset#
After creating a catalogue, we can add a dataset to it.
PUT http://localhost:8081/catalogues/example-catalog/datasets/origin?originalId=example-dataset
Content-Type: text/turtle
X-API-Key: {{api-key}}

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://example.io/set/data/example-dataset>
    a dcat:Dataset ;
    dct:title "Example Dataset 2"@en ;
    dct:description "This is an example Dataset" ;
    dct:issued "2015-08-28T00:00:00"^^xsd:dateTime ;
    dcat:distribution <https://example.io/set/distribution/1> .

<https://example.io/set/distribution/1>
    a dcat:Distribution ;
    dcat:accessURL <http://a-csv-file.com> ;
    dct:format <http://publications.europa.eu/resource/authority/file-type/CSV> ;
    dct:title "Example Distribution 1" .
The dataset will also be indexed by the piveau-hub-search service:
Using the search API#
Searching#
To do a simple full-text search, you can use the q parameter.
In the following example, we search for "health":
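A minimal sketch of how such a query URL could be built in Python. It assumes the search service is reachable at http://localhost:8083/search, as in the other requests of this guide; build_search_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: same host, port and path as the other search examples in this guide.
SEARCH_URL = "http://localhost:8083/search"


def build_search_url(**params: str) -> str:
    """Return the search URL with the given query parameters percent-encoded."""
    return f"{SEARCH_URL}?{urlencode(params)}" if params else SEARCH_URL


print(build_search_url(q="health"))  # http://localhost:8083/search?q=health
```

Using urlencode keeps search terms with spaces or special characters valid in the URL.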
If you would like to do an autocomplete search, you can enable the autocomplete parameter.
The administrator can configure which field is used for autocomplete; one example is title.
In the following example, we do an autocomplete search in the catalogues index:
Note: The filter parameter must be set for this parameter. The index must have a configured autocomplete field.
The filter parameter is explained in the facets section.
Pagination#
By default, a search query returns 10 results per page.
To increase the number of results per page, you can use the limit parameter.
It allows up to 1000 results per page.
To go through the pages, you can use the page parameter.
It starts counting at 0, so to access the first page with 15 results per page you call:
To access the second page you would increase page by one:
By default, the search service returns the first page, i.e. page is set to 0.
Using page and limit is limited by the maximum result window.
It can be configured by the administrator and increased on demand.
If page*limit is higher than the maximum result window, the search service won't return any results.
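The pagination rules can be checked on the client side before a request is sent. The sketch below assumes that page is counted from 0 (matching the default mentioned above) and uses a maximum result window of 10,000, which is only an illustrative value since the administrator configures the real one. The helper name page_url is hypothetical.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083/search"
MAX_LIMIT = 1000            # upper bound for limit stated in this guide
MAX_RESULT_WINDOW = 10_000  # assumption: the real value is set by the administrator


def page_url(page: int, limit: int = 10) -> str:
    """Build a paginated search URL; page is zero-based."""
    if page < 0:
        raise ValueError("page must not be negative")
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if page * limit > MAX_RESULT_WINDOW:
        raise ValueError("page * limit exceeds the maximum result window")
    return f"{SEARCH_URL}?{urlencode({'page': page, 'limit': limit})}"


print(page_url(0, 15))  # first page with 15 results
print(page_url(1, 15))  # second page with 15 results
```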
If you want to iterate through all the search results for a particular query,
you should use the scroll API.
The scroll API takes a snapshot of the current search result list.
When you paginate with page and limit, pages can change between requests because of indexing or deletion.
To use the scroll API, you must set the scroll parameter:
Afterwards you can pass the returned scrollId to the scroll API:
Every time you call the scroll API with this id, you get another page of results.
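The scrolling loop can be sketched independently of the HTTP details. In the example below, the function that performs the request is passed in as fetch_page, and the response shape (a dictionary with the keys scrollId and results) is an assumption made for illustration; check the OpenAPI documentation of your search service for the actual endpoint and fields.

```python
from typing import Callable, Iterator, Optional

# Hypothetical response shape: {"scrollId": <str or None>, "results": [<record>, ...]}
FetchPage = Callable[[Optional[str]], dict]


def iterate_all(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every record by following scroll ids until a page comes back empty."""
    page = fetch_page(None)  # first call: the search request with the scroll parameter set
    while page["results"]:
        yield from page["results"]
        scroll_id = page.get("scrollId")
        if scroll_id is None:
            break
        page = fetch_page(scroll_id)  # follow-up calls: the scroll API with the scrollId


def make_fake_fetcher(pages: list) -> FetchPage:
    """In-memory stand-in for the search service, used only for this demo."""
    remaining = iter(pages)

    def fetch(scroll_id: Optional[str]) -> dict:
        return {"scrollId": "abc", "results": next(remaining, [])}

    return fetch


records = list(iterate_all(make_fake_fetcher([[{"id": 1}, {"id": 2}], [{"id": 3}]])))
print(records)  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

Injecting the fetch function keeps the loop testable without a running service.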
Sorting#
If you want to sort the search results according to a sort order, you can specify this using the sort parameter.
By default, the search service sorts by relevance. Relevance is measured according to a score.
To return the score, you can enable the showScore parameter:
You can sort either by relevance or by a field.
For both you must define whether the sorting is ascending (asc) or descending (desc).
In the following example, sorting is primarily by relevance in descending order. So most relevant search results come first. "But hey, there is a second and a third sort?" Yes, you can define secondary sorting. In this example, it means that whenever two search results have the same score, they will be sorted in descending order according to the modified field. And if they also have the same modified field, they will be sorted by their English title in ascending order.
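To make the effect of secondary sorting concrete, the following sketch reproduces the described ordering locally on a handful of made-up records. It only illustrates the semantics; it is not how the search service implements sorting, and the record fields score, modified and title are simplified assumptions.

```python
from operator import itemgetter

# Made-up records for illustration only.
results = [
    {"id": "a", "score": 2.0, "modified": "2024-01-10", "title": "Water quality"},
    {"id": "b", "score": 3.5, "modified": "2023-05-01", "title": "Air quality"},
    {"id": "c", "score": 2.0, "modified": "2024-01-10", "title": "Health statistics"},
    {"id": "d", "score": 2.0, "modified": "2024-03-02", "title": "Road network"},
]


def sort_results(records: list) -> list:
    """Order by score (descending), then modified (descending), then title (ascending)."""
    # Python's sort is stable, so apply the keys from least to most significant.
    ordered = sorted(records, key=itemgetter("title"))
    ordered = sorted(ordered, key=itemgetter("modified"), reverse=True)
    return sorted(ordered, key=itemgetter("score"), reverse=True)


print([record["id"] for record in sort_results(results)])  # ['b', 'd', 'c', 'a']
```

Record b comes first because of its higher score; d precedes c and a because it was modified more recently; c precedes a because "Health statistics" sorts before "Water quality".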
Facets#
Facets are an important concept in searching.
Facets are used for aggregations and filtering.
To enable facets in the search result, you must specify the filter parameter.
The filter parameter specifies which search index is searched.
As each search index has its own structure, the indices provide different facets.
In the following example, you would search in the datasets index.
{
  "result": {
    "index": "dataset",
    "count": 1616281,
    "facets": [
      ...
      {
        "id": "country",
        "title": "Provenance",
        "items": [
          {
            "count": 601663,
            "id": "de",
            "title": "Germany"
          },
          ...
        ]
      }
    ],
    "results": [...]
  }
}
In the example above you can see a part of a facet.
A facet always contains an id, a title and a list of items, sorted by count with the highest first.
Each item contains an id, a count and a title.
The counting of facet items is called aggregation.
Aggregation allows you to see how the data is distributed and get an overview of the entire search index.
It can also guide you if you don't know exactly what you are looking for.
You can see the aggregation and get an idea of how you would like to filter.
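As a small illustration of working with the aggregation, the sketch below parses a shortened version of the response shown above (the elided parts are left out so that it is valid JSON) and collects the counts per facet. The helper name facet_counts is hypothetical.

```python
import json

# Shortened response from the example above, without the elided parts.
response_text = """
{
  "result": {
    "index": "dataset",
    "count": 1616281,
    "facets": [
      {
        "id": "country",
        "title": "Provenance",
        "items": [
          {"count": 601663, "id": "de", "title": "Germany"}
        ]
      }
    ],
    "results": []
  }
}
"""


def facet_counts(response: dict) -> dict:
    """Map each facet id to a dictionary of item id -> count."""
    return {
        facet["id"]: {item["id"]: item["count"] for item in facet["items"]}
        for facet in response["result"]["facets"]
    }


counts = facet_counts(json.loads(response_text))
print(counts)  # {'country': {'de': 601663}}
```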
If you don't need the aggregation, you can disable it. Disabling the aggregation removes the facets from the search result but saves processing time.
If you only want some of the facets returned, you can use the aggregationAllFields and aggregationFields parameters:
GET http://localhost:8083/search?filter=dataset&aggregationAllFields=false&aggregationFields=country,format
By default, the aggregation values decrease the more you narrow down the search results by searching and filtering.
If you do not want the aggregation to be influenced by this, you can enable the globalAggregation parameter:
To filter the search results by facets, you can use the facets parameter.
In the following example, we filter for all records that are located in Germany and have a distribution in CSV or PDF format.
The values for filtering are selected from the respective facet objects.
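The facets parameter carries a JSON object that maps facet ids to lists of item ids. A minimal sketch of building such a URL in Python follows; it assumes the search endpoint used elsewhere in this guide, and facet_filter_url is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083/search"


def facet_filter_url(index: str, facets: dict) -> str:
    """Build a search URL whose facets parameter is a JSON object of facet id -> item ids."""
    query = urlencode({
        "filter": index,
        # Compact separators keep the encoded URL short.
        "facets": json.dumps(facets, separators=(",", ":")),
    })
    return f"{SEARCH_URL}?{query}"


url = facet_filter_url("dataset", {"country": ["de"], "format": ["CSV", "PDF"]})
print(url)
```

The JSON is percent-encoded by urlencode, so braces, quotes and brackets arrive intact at the service.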
By default, values within a facet are combined with OR, and different facets are combined with AND.
If you would like to adjust this, you can use facetOperator and facetGroupOperator.
The default corresponds to facetOperator=OR and facetGroupOperator=AND.
In the following example, we set facetOperator=AND.
Then, we filter for all records that are located in Germany and have a distribution in CSV and PDF format.
GET http://localhost:8083/search?filter=dataset&facets={"format":["CSV", "PDF"],"country":["de"]}&facetOperator=AND
In the following example, we set facetOperator=AND and facetGroupOperator=OR.
Then, we filter for all records that are located in Germany or have a distribution in CSV and PDF format.
GET http://localhost:8083/search?filter=dataset&facets={"format":["CSV", "PDF"],"country":["de"]}&facetOperator=AND&facetGroupOperator=OR
The following table lists all possible filter combinations for the given example:
| | facetOperator=OR | facetOperator=AND |
|---|---|---|
| facetGroupOperator=OR | Filter for all records that are located in Germany or have a distribution in CSV or PDF format. | Filter for all records that are located in Germany or have a distribution in CSV and PDF format. |
| facetGroupOperator=AND | Filter for all records that are located in Germany and have a distribution in CSV or PDF format. (Default) | Filter for all records that are located in Germany and have a distribution in CSV and PDF format. |
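The four combinations can also be expressed as a small predicate. The sketch below evaluates the operators locally on simplified, made-up records; it illustrates the logic only and is not the search service's implementation. Any operator value other than "OR" is treated as AND here.

```python
def matches(record: dict, facets: dict, facet_operator: str = "OR",
            facet_group_operator: str = "AND") -> bool:
    """Check a record against facet filters using the two operators."""
    within = any if facet_operator == "OR" else all        # values inside one facet
    across = any if facet_group_operator == "OR" else all  # different facets
    return across(
        within(value in record.get(facet_id, []) for value in wanted)
        for facet_id, wanted in facets.items()
    )


facets = {"format": ["CSV", "PDF"], "country": ["de"]}
german_csv = {"country": ["de"], "format": ["CSV"]}          # made-up record
french_both = {"country": ["fr"], "format": ["CSV", "PDF"]}  # made-up record

print(matches(german_csv, facets))                        # True: Germany and (CSV or PDF)
print(matches(german_csv, facets, facet_operator="AND"))  # False: PDF is missing
print(matches(french_both, facets, "AND", "OR"))          # True: CSV and PDF is enough
```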
More filtering#
Filter by date#
If you would like to filter by date, you can use the minDate and maxDate parameters.
Your input must comply with ISO 8601.
The administrator can configure which field is used for filtering; common ones are issued, modified or temporal.
In the following example, we filter by a minimum and maximum date:
GET http://localhost:8083/search?filter=dataset&minDate=2023-04-01T22:00:00.000Z&maxDate=2024-03-31T00:00:00.000Z
Note: The filter parameter must be set for these parameters. The index must have a configured date field.
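If your dates come from a program, it helps to produce the ISO 8601 strings in one place. The following sketch formats timezone-aware Python datetimes in the same style as the request above (UTC with millisecond precision and a trailing Z). The helper names to_iso_utc and date_filter_url are hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083/search"


def to_iso_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as ISO 8601 in UTC with millisecond precision."""
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


def date_filter_url(min_date: datetime, max_date: datetime) -> str:
    """Build a dataset search URL restricted to the given date range."""
    query = urlencode({
        "filter": "dataset",
        "minDate": to_iso_utc(min_date),
        "maxDate": to_iso_utc(max_date),
    })
    return f"{SEARCH_URL}?{query}"


start = datetime(2023, 4, 1, 22, 0, tzinfo=timezone.utc)
end = datetime(2024, 3, 31, 0, 0, tzinfo=timezone.utc)
print(to_iso_utc(start))  # 2023-04-01T22:00:00.000Z
print(date_filter_url(start, end))
```

Note that urlencode percent-encodes the colons in the timestamps; the service receives the same values after decoding.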
Filter by DQV#
If you would like to filter by a data quality value, you can use the minScoring and maxScoring parameters.
The administrator can configure which field is used for filtering; one example is quality_meas.scoring.
In the following example, we filter by a minimum and maximum scoring:
GET http://localhost:8083/search?filter=dataset&minScoring=100&maxScoring=200
Note: The filter parameter must be set for these parameters. The index must have a configured dqv field.
Filter by country data#
If you would like to filter by country data, you can enable the countryData parameter.
The administrator can configure which language codes do not belong to a country, e.g. eu and io.
In the following example, we filter by country data:
Note: The filter parameter must be set for this parameter. The index must have a configured country field.
Filter by data services#
If you would like to filter by data services, you can enable the dataServices parameter.
The administrator can configure which field is used for filtering; one example is distributions.access_service.
In the following example, we filter by data services:
Note: The filter parameter must be set for this parameter. The index must have a configured data service field.
Filter by bounding box#
If you would like to filter by bounding box, you can use the bounding box parameters.
The administrator can configure which field is used for filtering; one example is spatial.
In the following example, we filter by a bounding box:
GET http://localhost:8083/search?filter=dataset&bboxMinLon=50&bboxMaxLon=60&bboxMinLat=45&bboxMaxLat=50
For the longitude, bboxMinLon must be smaller than bboxMaxLon and both must be between -180 and 180.
For the latitude, bboxMinLat must be smaller than bboxMaxLat and both must be between -90 and 90.
Note: The filter parameter must be set for these parameters. The index must have a configured spatial field.
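The constraints on the bounding box can be validated before the request is sent. The sketch below applies the rules stated above and then builds the same URL as in the example; bbox_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083/search"


def bbox_url(min_lon: float, max_lon: float, min_lat: float, max_lat: float) -> str:
    """Validate the bounding box and build the dataset search URL."""
    if not -180 <= min_lon < max_lon <= 180:
        raise ValueError("longitudes must satisfy -180 <= bboxMinLon < bboxMaxLon <= 180")
    if not -90 <= min_lat < max_lat <= 90:
        raise ValueError("latitudes must satisfy -90 <= bboxMinLat < bboxMaxLat <= 90")
    query = urlencode({
        "filter": "dataset",
        "bboxMinLon": min_lon,
        "bboxMaxLon": max_lon,
        "bboxMinLat": min_lat,
        "bboxMaxLat": max_lat,
    })
    return f"{SEARCH_URL}?{query}"


print(bbox_url(50, 60, 45, 50))
```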
Filter by vocabulary#
To search all vocabulary indices, you must set filter=vocabulary.
In the following example, we search in all vocabulary indices:
If you would like to search in a specific vocabulary, you can set the vocabulary parameter.
Note: All indexed vocabularies can be listed via GET http://localhost:8083/vocabularies
Reduce#
If you would like to reduce the search result list to a set of fields, you can use the includes parameter.
In the following example, we only return id and title.
Fields#
Select fields#
By default, all fields are searched.
If you want to select certain fields for searching, you can use the fields parameter.
In the following example, we only search in title and keywords.
Boost fields#
By default, all fields are weighted equally during a search.
The administrator can change this default behaviour by giving certain fields a boost.
If you want to specify a different weighting, you can use the boost parameter.
In the following example, we set a custom boost for title and keywords.