How to use the APIs of piveau-hub#

piveau-hub offers two methods of interaction:

A comprehensive RESTful API that offers all functionalities of piveau-hub.
A Single-Page-Application (piveau-hub-ui) that is mainly designed for the end user.

Using the repo API#

The main API and interaction point is offered by the piveau-hub-repo service. You can find the OpenAPI documentation on the base route of the service (e.g. http://localhost:8081, if you followed the quick start guide). The documentation offers details about all available endpoints and interaction methods. To get you started, we will create a catalogue and a dataset by example. This requires that piveau-hub is running and that you have the API key at hand. You can find the latter in the configuration of piveau-hub-repo.

Create a catalogue#

A catalogue is a container for organising and managing your datasets. Every dataset must belong to a catalogue; a dataset cannot exist outside of one.

PUT http://localhost:8081/catalogues/example-catalog
Content-Type: text/turtle
X-API-Key: {{api-key}}

@prefix dct: <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .

<https://example.io/id/catalogue/example-catalog>
    a dcat:Catalog ;
    dct:type "dcat-ap";
    dct:title "Example Catalog"@en ;
    dct:description "This is an example Catalog"@en ;
    dct:language  <http://publications.europa.eu/resource/authority/language/ENG> .

After a successful request, you find your catalogue here:

GET http://localhost:8081/catalogues/example-catalog

The catalogue will also be indexed by the piveau-hub-search service:

GET http://localhost:8083/catalogues/example-catalog
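
The catalogue request above can also be prepared from a script. The following is a minimal Python sketch using only the standard library. The helper name build_catalogue_request and the placeholder API key are illustrative assumptions, and the base URL assumes the quick start setup. The request is only built here; the commented lines show how it could be sent to a running instance.

```python
import urllib.request

REPO_URL = "http://localhost:8081"  # assumption: quick start default

# Catalogue description in Turtle, as in the request above.
CATALOGUE_TURTLE = """\
@prefix dct: <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .

<https://example.io/id/catalogue/example-catalog>
    a dcat:Catalog ;
    dct:type "dcat-ap" ;
    dct:title "Example Catalog"@en ;
    dct:description "This is an example Catalog"@en ;
    dct:language <http://publications.europa.eu/resource/authority/language/ENG> .
"""


def build_catalogue_request(catalogue_id, turtle, api_key, base_url=REPO_URL):
    """Build (but do not send) the PUT request that creates a catalogue."""
    return urllib.request.Request(
        url=f"{base_url}/catalogues/{catalogue_id}",
        data=turtle.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/turtle", "X-API-Key": api_key},
    )


# "your-api-key" is a placeholder for the key from the piveau-hub-repo configuration.
request = build_catalogue_request("example-catalog", CATALOGUE_TURTLE, "your-api-key")
print(request.get_method(), request.full_url)

# To send it to a running piveau-hub-repo instance:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```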

Create a dataset#

Now that the catalogue exists, we can add a dataset to it.

PUT http://localhost:8081/catalogues/example-catalog/datasets/origin?originalId=example-dataset
Content-Type: text/turtle
X-API-Key: {{api-key}}

@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<https://example.io/set/data/example-dataset>
    a dcat:Dataset ;
    dct:title "Example Dataset 2"@en ;
    dct:description "This is an example Dataset" ;
    dct:issued "2015-08-28T00:00:00"^^xsd:dateTime ;
    dcat:distribution <https://example.io/set/distribution/1> .

<https://example.io/set/distribution/1>
    a dcat:Distribution ;
    dcat:accessURL <http://a-csv-file.com> ;
    dct:format <http://publications.europa.eu/resource/authority/file-type/CSV>  ;
    dct:title "Example Distribution 1" .

After a successful request, you find your dataset here:

GET http://localhost:8081/datasets/example-dataset

The dataset will also be indexed by the piveau-hub-search service:

GET http://localhost:8083/datasets/example-dataset

Using the search API#

Searching#

Basic Search#

To simply do a full-text search, you can use the q parameter. In the following example, we search for "health":

GET http://localhost:8083/search?q=health

And the following shows how to search for the term "health" in the dataset index:

GET http://localhost:8083/search?q=health&filters=dataset

If you do not define a filters query parameter, the search API searches in all available indices.

Autocomplete#

If you want autocompletion in your search box UI, you can enable the autocomplete parameter. The administrator can configure which field is used for autocomplete, for example title. In the following example, we do an autocomplete search in the catalogue index:

GET http://localhost:8083/search?q=hea&filters=catalogue&autocomplete=true

Note: The filters query parameter must be set for this parameter. The index must have a configured autocomplete field. The filters query parameter is explained in the facets section.

Advanced Search#

Wildcard#

Wildcard search can optimize search results when the complete search term is unknown. Use ? (question mark) to replace a single character and * (asterisk) to replace zero or more characters. Example:

GET http://localhost:8083/search?q=ois?au dat*&filters=dataset

The example above returns datasets like "oisiau data", "oisoau dataset", or "oiseau datasets". Unlike basic search, wildcard search does not reduce terms to their root form when some letters are unknown. Note that a larger number of search terms to check for potential matches (e.g., a*, b*, c*) can reduce search efficiency. Wildcards at the beginning of a keyword (e.g., *ing or ?iseau) are ignored and treated literally to maintain performance. Invalid wildcard terms (e.g., "ois?au OR (data AND organization", which has unbalanced parentheses) are also treated literally.
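
The wildcard example contains a space, a question mark and an asterisk, which many HTTP clients expect to be percent-encoded in a query string. The following Python sketch shows one way to do that with the standard library. The helper name build_wildcard_url is an illustrative assumption, and the base URL assumes the quick start setup.

```python
from urllib.parse import parse_qs, urlencode

SEARCH_URL = "http://localhost:8083"  # assumption: quick start default


def build_wildcard_url(term, index, base_url=SEARCH_URL):
    """Percent-encode a wildcard query so '?', '*' and spaces survive the URL."""
    query = urlencode({"q": term, "filters": index})
    return f"{base_url}/search?{query}"


url = build_wildcard_url("ois?au dat*", "dataset")
print(url)

# Decoding the query string gives back the original search term.
decoded = parse_qs(url.split("?", 1)[1])
print(decoded["q"][0])
```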

Exact Phrase#

Placing your search terms in double quotes treats them as a case-insensitive phrase. The search will return datasets with the keywords in the exact order specified. Example:

GET http://localhost:8083/search?q="manual public space"&filters=dataset

The example above returns datasets containing exactly "manual public space" and will not include variations like "manual space public" or "space public manual." If you search for "public spa", the results will only include datasets that contain "public spa."

Combined Search#

You can search across multiple indices at once by including the indices you want to search in the filters query parameter. The following shows how to search for the term "health" in the dataset and catalogue indices at once:

GET http://localhost:8083/search?q=health&filters=dataset,catalogue

If you want to include the vocabulary index as well, add vocabulary to the filters parameter, as explained in the Filter by vocabulary section.

You can refine your search results further with additional query parameters, as explained in the following sections.

Pagination#

By default, a search query returns 10 results per page. To change the number of results per page, use the limit parameter, which accepts values of up to 1000 results per page. To go through the pages, use the page parameter.

Page counting starts at 0, so to access the first page with 15 results per page, you call:

GET http://localhost:8083/search?page=0&limit=15

To access the second page you would increase page by one:

GET http://localhost:8083/search?page=1&limit=15

By default, the search service returns the first page, i.e. page is set to 0.

Pagination with page and limit is restricted by the maximum result window, which the administrator can configure and increase on demand. If page*limit is higher than the maximum result window, the search service won't return any results.
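
The pagination rules can be checked on the client before a request is sent. The following Python sketch builds page URLs and rejects combinations that the service would not answer. The helper name page_url is illustrative, and the value of MAX_RESULT_WINDOW is purely an assumption, because the real limit is set by the administrator.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083"  # assumption: quick start default
MAX_RESULT_WINDOW = 10000  # assumption: the real value is set by the administrator


def page_url(page, limit=10, base_url=SEARCH_URL):
    """Return the URL for a zero-based page, or raise if it cannot be served."""
    if page < 0:
        raise ValueError("page starts at 0")
    if not 1 <= limit <= 1000:
        raise ValueError("limit must be between 1 and 1000")
    if page * limit > MAX_RESULT_WINDOW:
        raise ValueError("page*limit exceeds the maximum result window")
    return f"{base_url}/search?{urlencode({'page': page, 'limit': limit})}"


print(page_url(0, 15))  # first page
print(page_url(1, 15))  # second page
```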

If you want to iterate through all the search results for a particular query, you should use the scroll API. The scroll API takes a snapshot of the current search result list. When you paginate with page and limit instead, the pages could change between requests because of indexing or deletion.

To use the scroll API, you must set the scroll parameter:

GET http://localhost:8083/search?scroll=true

{
  "result": {
    "count": 1634095,
    "scrollId": "FGluY2x1ZGVfY29udGV4dF91...",
    "results": [...]
  }
}

Afterwards, pass the scrollId to the scroll API:

GET http://localhost:8083/scroll?scrollId=FGluY2x1ZGVfY29udGV4dF91...

Every time you call the scroll API with this ID, you get another page of results.
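
The scroll workflow can be wrapped in a small loop. The following Python sketch is one possible client-side approach, not a definitive implementation. It assumes that the scroll response has the same shape as the search response and that an empty results list marks the end of the scroll; neither is confirmed here, so check the OpenAPI documentation. The names fetch_json and scroll_all are illustrative.

```python
import json
import urllib.request
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083"  # assumption: quick start default


def fetch_json(url):
    """Fetch a URL and decode the JSON body (needs a running search service)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def scroll_all(query_params, fetch=fetch_json, base_url=SEARCH_URL):
    """Yield every hit of a search by following the scrollId."""
    params = dict(query_params, scroll="true")
    result = fetch(f"{base_url}/search?{urlencode(params)}")["result"]
    scroll_id = result.get("scrollId")
    # Assumption: an empty "results" list means there is nothing left to scroll.
    while result.get("results"):
        yield from result["results"]
        if not scroll_id:
            break
        scroll_url = f"{base_url}/scroll?{urlencode({'scrollId': scroll_id})}"
        result = fetch(scroll_url)["result"]
        # Keep the previous id if the scroll response does not repeat it.
        scroll_id = result.get("scrollId", scroll_id)


# Usage against a running instance:
# for hit in scroll_all({"q": "health", "filters": "dataset"}):
#     print(hit.get("id"))
```

The fetch argument makes the loop testable without a running service, because a stub function can return prepared responses.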

Sorting#

If you want to sort the search results according to a sort order, you can specify this using the sort parameter. By default, the search service sorts by relevance. Relevance is measured according to a score. To return the score, you can enable the showScore parameter:

GET http://localhost:8083/search?q=health&showScore=true

You can sort either by relevance or by a field. For both, you must define whether the sorting is ascending (asc) or descending (desc).

In the following example, sorting is primarily by relevance in descending order. So most relevant search results come first. "But hey, there is a second and a third sort?" Yes, you can define secondary sorting. In this example, it means that whenever two search results have the same score, they will be sorted in descending order according to the modified field. And if they also have the same modified field, they will be sorted by their English title in ascending order.

GET http://localhost:8083/search?sort=relevance+desc,modified+desc,title.en+asc
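
The sort value is a comma-separated list of field and direction pairs. In the URL above, each + is the encoded form of a space between the field and its direction. The following Python sketch builds the same URL from pairs; the helper name build_sort is an illustrative assumption.

```python
from urllib.parse import urlencode


def build_sort(*criteria):
    """Turn (field, direction) pairs into the value of the sort parameter."""
    parts = []
    for field, direction in criteria:
        if direction not in ("asc", "desc"):
            raise ValueError(f"direction must be 'asc' or 'desc', got {direction!r}")
        parts.append(f"{field} {direction}")
    return ",".join(parts)


sort = build_sort(("relevance", "desc"), ("modified", "desc"), ("title.en", "asc"))
# safe="," keeps the commas readable; spaces are encoded as "+".
url = "http://localhost:8083/search?" + urlencode({"sort": sort}, safe=",")
print(url)
```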

Sorting in combined search#

When you search across multiple indices at once, sorting is limited to fields present in all indices; fields not common to all will be ignored, and results will be sorted only by the common fields. For example, if a dataset has the title.en field but a vocabulary does not, the result of the following combined search request will be sorted solely by relevance:

GET http://localhost:8083/search?filters=dataset,vocabulary&sort=relevance+desc,title.en+asc

Facets#

Facets are an important concept in searching. They are used for aggregations and filtering. To enable facets in the search result, you must specify the filters parameter, which defines the search indices to search in. As each search index has its own structure, the indices provide different facets. As an administrator, you can configure the facets by following the corresponding technical guide.

In the following example, we search in the dataset index:

GET http://localhost:8083/search?filters=dataset

{
  "result": {
    "index": "dataset",
    "count": 1616281,
    "facets": [
      ...
      {
        "id": "country",
        "title": "Provenance",
        "items": [
          {
            "count": 601663,
            "id": "de",
            "title": "Germany"
          },
          ...
        ]
      }
    ],
    "results": [...]
  }
}

In the example above, you can see part of a facet. A facet always contains an id, a title and a list of items, sorted by count in descending order. Each item contains an id, a count and a title. Counting the items of a facet is called aggregation. Aggregation allows you to see how the data is distributed and to get an overview of the entire search index. It can also guide you if you don't know exactly what you are looking for: you can look at the aggregation and get an idea of how you would like to filter.
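
The facet structure is easy to process in a client. The following Python sketch reads the counts of one facet from a response. The sample is an abridged copy of the response shown above, and the helper name facet_counts is an illustrative assumption.

```python
import json

# Abridged response in the shape shown above, reduced to one facet with one item.
response_text = """
{
  "result": {
    "index": "dataset",
    "count": 1616281,
    "facets": [
      {
        "id": "country",
        "title": "Provenance",
        "items": [
          {"count": 601663, "id": "de", "title": "Germany"}
        ]
      }
    ],
    "results": []
  }
}
"""


def facet_counts(response, facet_id):
    """Map item id to count for one facet; empty if the facet is missing."""
    for facet in response["result"].get("facets", []):
        if facet["id"] == facet_id:
            return {item["id"]: item["count"] for item in facet["items"]}
    return {}


response = json.loads(response_text)
print(facet_counts(response, "country"))
```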

If you don't need the aggregation, you can disable it. Disabling the aggregation removes the facets from the search result but saves processing time.

GET http://localhost:8083/search?filters=dataset&aggregation=false

If you only want some of the facets returned, you can use the aggregationAllFields and aggregationFields parameters:

GET http://localhost:8083/search?filters=dataset&aggregationAllFields=false&aggregationFields=country,format

By default, the aggregation values decrease the more you narrow down the search results by searching and filtering. If you want the aggregation to be unaffected by this, you can enable the globalAggregation parameter:

GET http://localhost:8083/search?filters=dataset&q=health&globalAggregation=true

To filter the search results by facets, you can use the facets query parameter. As an administrator, you can configure the facets to suit your specific requirements.

In the following example, we filter for all records that are located in Germany and have a distribution in CSV or PDF format. The values for filtering are selected from the respective facet objects.

GET http://localhost:8083/search?filters=dataset&facets={"format":["CSV", "PDF"],"country":["de"]}
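
The facets parameter carries a JSON object, so its braces, quotes and brackets usually need percent-encoding when the URL is built by a program. The following Python sketch shows one way to do this; the helper name build_facet_url is an illustrative assumption, and the base URL assumes the quick start setup.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

SEARCH_URL = "http://localhost:8083"  # assumption: quick start default


def build_facet_url(index, facets, base_url=SEARCH_URL):
    """Serialise the facet selection as compact JSON and encode it into the URL."""
    params = {
        "filters": index,
        "facets": json.dumps(facets, separators=(",", ":")),
    }
    return f"{base_url}/search?{urlencode(params)}"


selection = {"format": ["CSV", "PDF"], "country": ["de"]}
url = build_facet_url("dataset", selection)
print(url)

# Decoding the URL again yields the original facet selection.
decoded = json.loads(parse_qs(urlsplit(url).query)["facets"][0])
print(decoded)
```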

By default, the values within a facet are combined with OR, and the facets are combined with each other using AND. To adjust this, you can use the facetOperator and facetGroupOperator parameters. The defaults are facetOperator=OR and facetGroupOperator=AND.

In the following example, we set facetOperator=AND. Then, we filter for all records that are located in Germany and have a distribution in CSV and PDF format.

GET http://localhost:8083/search?filters=dataset&facets={"format":["CSV", "PDF"],"country":["de"]}&facetOperator=AND

In the following example, we set facetOperator=AND and facetGroupOperator=OR. Then, we filter for all records that are located in Germany or have a distribution in CSV and PDF format.

GET http://localhost:8083/search?filters=dataset&facets={"format":["CSV", "PDF"],"country":["de"]}&facetOperator=AND&facetGroupOperator=OR

The following table lists all possible filter combinations for the given example:

	`facetOperator=OR`	`facetOperator=AND`
`facetGroupOperator=OR`	Filter for all records that are located in Germany or have a distribution in CSV or PDF format.	Filter for all records that are located in Germany or have a distribution in CSV and PDF format.
`facetGroupOperator=AND`	Filter for all records that are located in Germany and have a distribution in CSV or PDF format. (Default)	Filter for all records that are located in Germany and have a distribution in CSV and PDF format.
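
The four combinations in the table can be reproduced with a small local model. The following Python sketch is only an illustration of the semantics described above, not the search service's implementation. The function name matches and the record layout (a dictionary of value lists per facet) are assumptions made for this example.

```python
def matches(record, facets, facet_operator="OR", facet_group_operator="AND"):
    """Check whether a record passes a facet selection under the given operators."""
    for operator in (facet_operator, facet_group_operator):
        if operator not in ("AND", "OR"):
            raise ValueError(f"operator must be 'AND' or 'OR', got {operator!r}")
    # facetOperator combines the values inside one facet.
    combine_values = any if facet_operator == "OR" else all
    # facetGroupOperator combines the facets with each other.
    combine_facets = any if facet_group_operator == "OR" else all
    return combine_facets(
        combine_values(value in record.get(facet, []) for value in values)
        for facet, values in facets.items()
    )


selection = {"format": ["CSV", "PDF"], "country": ["de"]}
german_csv = {"country": ["de"], "format": ["CSV"]}

print(matches(german_csv, selection))  # default operators
print(matches(german_csv, selection, facet_operator="AND"))
print(matches(german_csv, selection, facet_operator="AND", facet_group_operator="OR"))
```

With the default operators, the German CSV record passes. With facetOperator=AND alone it fails because it has no PDF distribution, and with facetGroupOperator=OR in addition it passes again because the country matches.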

More filtering#

Filter by date#

To filter by date, you can use the minDate and maxDate parameters. Your input must comply with ISO 8601. The administrator can configure which field is used for filtering; common ones are issued, modified or temporal. In the following example, we filter by a minimum and maximum date:

GET http://localhost:8083/search?filters=dataset&minDate=2023-04-01T22:00:00.000Z&maxDate=2024-03-31T00:00:00.000Z

Note: The filters parameter must be set for these parameters. The index must have a configured date field.
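
ISO 8601 timestamps like the ones above can be produced from datetime objects. The following Python sketch formats two dates in UTC with millisecond precision and builds the request URL. The helper name iso_utc is an illustrative assumption, and the base URL assumes the quick start setup.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083"  # assumption: quick start default


def iso_utc(moment):
    """Format a timezone-aware datetime as ISO 8601 in UTC, e.g. 2023-04-01T22:00:00.000Z."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


params = {
    "filters": "dataset",
    "minDate": iso_utc(datetime(2023, 4, 1, 22, 0, tzinfo=timezone.utc)),
    "maxDate": iso_utc(datetime(2024, 3, 31, 0, 0, tzinfo=timezone.utc)),
}
# safe=":" keeps the colons of the timestamps readable in the URL.
url = f"{SEARCH_URL}/search?{urlencode(params, safe=':')}"
print(url)
```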

Filter by data quality value#

To filter by a data quality value, you can use the minScoring and maxScoring parameters. The administrator can configure which field is used for filtering; one example is quality_meas.scoring. In the following example, we filter by a minimum and maximum scoring:

GET http://localhost:8083/search?filters=dataset&minScoring=100&maxScoring=300

Note: The filters parameter must be set for these parameters. The index must have a configured dqv field.

Filter by country data#

To filter by country data, you can enable the countryData parameter. The administrator can configure which language codes do not belong to a country, e.g. eu and io. In the following example, we filter by country data:

GET http://localhost:8083/search?filters=dataset&countryData=true

Note: The filters parameter must be set for these parameters. The index must have a configured country field.

Filter by data services#

To filter by data services, you can enable the dataServices parameter. The administrator can configure which field is used for filtering; one example is distributions.access_service. In the following example, we filter by data services:

GET http://localhost:8083/search?filters=dataset&dataServices=true

Note: The filters parameter must be set for these parameters. The index must have a configured data service field.

Filter by geographic location (bounding box)#

To filter by bounding box, you can use the bounding box parameters. The administrator can configure which field is used for filtering; one example is spatial. In the following example, we filter by a bounding box:

GET http://localhost:8083/search?filters=dataset&bboxMinLon=50&bboxMaxLon=60&bboxMinLat=45&bboxMaxLat=50

For the longitude, bboxMinLon must be smaller than bboxMaxLon, and both must be between -180 and 180. For the latitude, bboxMinLat must be smaller than bboxMaxLat, and both must be between -90 and 90.

Note: The filters parameter must be set for these parameters. The index must have a configured spatial field.
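
The bounding box constraints can be validated before the request is sent. The following Python sketch checks the ranges stated above and builds the URL of the example. The helper name build_bbox_params is an illustrative assumption, and the base URL assumes the quick start setup.

```python
from urllib.parse import urlencode

SEARCH_URL = "http://localhost:8083"  # assumption: quick start default


def build_bbox_params(min_lon, max_lon, min_lat, max_lat):
    """Validate a bounding box and return it as query parameters."""
    if not -180 <= min_lon < max_lon <= 180:
        raise ValueError("need -180 <= bboxMinLon < bboxMaxLon <= 180")
    if not -90 <= min_lat < max_lat <= 90:
        raise ValueError("need -90 <= bboxMinLat < bboxMaxLat <= 90")
    return {
        "bboxMinLon": min_lon,
        "bboxMaxLon": max_lon,
        "bboxMinLat": min_lat,
        "bboxMaxLat": max_lat,
    }


params = {"filters": "dataset", **build_bbox_params(50, 60, 45, 50)}
url = f"{SEARCH_URL}/search?{urlencode(params)}"
print(url)
```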

Filter by vocabulary#

To search all vocabulary indices, you must set filters=vocabulary. In the following example, we search in all vocabulary indices:

GET http://localhost:8083/search?filters=vocabulary

To search in a specific vocabulary, you can set the vocabulary parameter:

GET http://localhost:8083/search?filters=vocabulary&vocabulary=access-right

Note: All indexed vocabularies can be listed with GET http://localhost:8083/vocabularies

Reduce#

To reduce the search result list to a set of fields, you can use the includes parameter. In the following example, we only return id and title.

GET http://localhost:8083/search?includes=id,title

Fields#

Select fields#

By default, all fields are searched. If you want to select certain fields for searching, you can use the fields parameter. In the following example, we only search in title and keywords.

GET http://localhost:8083/search?fields=title,keywords

Boost fields#

By default, all fields are weighted equally during a search. The administrator can change this default behaviour by giving certain fields a boost. If you want to specify a different weighting, you can use the boost parameter. In the following example, we set a custom boost for title and keywords.

GET http://localhost:8083/search?boost.title=10&boost.keywords=3