Harvesting - piveau consus
Consus is an extract, transform, and load (ETL)-like framework.
When you need to fetch data or metadata from a source, Consus provides a high-performance, highly scalable solution based on microservices and container technology.
Concept
The basic concept of Consus is that of a Pipe.
Technically speaking, a pipe is an orchestration of several modules, each of which represents one data processing step. An example is a harvester, in which the modules form a chain: usually an importer, a transformer, and an exporter.
- Pipe: A chain of data processing Pipe Segments. A pipe can be in one of two states: definition or instance. Before a pipe can be executed, its definition must be instantiated. Starting the pipe means passing an instance to the first segment.
- Pipe Descriptor: A JSON or YAML description of a Pipe.
- Pipe Definition: The semantic content of a Pipe Descriptor. It usually contains some metadata about the Pipe and the chaining information for one or more Pipe Segments, plus their configuration. It does not contain actual connection details or run-specific execution information, and usually carries no payload.
- Pipe Instance: To execute a Pipe, its Pipe Definition must be instantiated as a Pipe Instance. Technically speaking, an instance is the Pipe Definition enriched with the real addresses of the segment implementations (Pipe Modules), execution information such as run ID and start time, an optional Pipe Payload, and optional run-specific segment configurations. You can then start the pipe by passing the instance to the first segment.
- Pipe Segment: A description of a single module, program, or entity that can be part of a Pipe.
- Pipe Payload: Data embedded in a Pipe Instance.
- Pipe Module: An entity that implements a Pipe Segment.
- Pipe Run: The execution of a Pipe. To start a run, pass a Pipe Instance to the first Pipe Module.
In other words, a pipe must first be defined, then instantiated, and finally the instance can be started.
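The define, instantiate, start sequence can be sketched in a few lines of Python. This is only an in-process illustration of the concept, not Consus code: the class names, the `registry` lookup, and the `instantiate` and `start` functions are assumptions made for this example, and real Pipe Modules are separate services that forward the instance to each other rather than functions called in a loop.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a Pipe Module: a function that receives the data
# produced so far and the configuration of its segment, and returns new data.
Module = Callable[[Any, Dict[str, Any]], Any]


@dataclass
class SegmentDefinition:
    name: str
    config: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PipeDefinition:
    name: str
    segments: List[SegmentDefinition]


@dataclass
class PipeInstance:
    definition: PipeDefinition
    modules: List[Module]      # resolved implementation for each segment
    run_id: str
    start_time: datetime
    payload: Any = None


def instantiate(definition: PipeDefinition, registry: Dict[str, Module],
                payload: Any = None) -> PipeInstance:
    """Turn a definition into an instance by resolving modules and adding run info."""
    modules = [registry[segment.name] for segment in definition.segments]
    return PipeInstance(
        definition=definition,
        modules=modules,
        run_id=str(uuid.uuid4()),
        start_time=datetime.now(timezone.utc),
        payload=payload,
    )


def start(instance: PipeInstance) -> Any:
    """Hand the instance to the first module and pass each result along the chain."""
    data = instance.payload
    for segment, module in zip(instance.definition.segments, instance.modules):
        data = module(data, segment.config)
    return data


# Three toy modules forming a harvester-like chain.
def importer(_data, config):
    return list(config["records"])


def transformer(records, _config):
    return [record.upper() for record in records]


def exporter(records, config):
    config["target"].extend(records)
    return len(records)


target: List[str] = []
definition = PipeDefinition("demo-harvester", [
    SegmentDefinition("importer", {"records": ["a", "b"]}),
    SegmentDefinition("transformer"),
    SegmentDefinition("exporter", {"target": target}),
])
registry = {"importer": importer, "transformer": transformer, "exporter": exporter}

instance = instantiate(definition, registry)
exported = start(instance)
```

The same definition can be instantiated any number of times; each call to `instantiate` yields a new run ID and start time, which mirrors the distinction between a reusable definition and a single run.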
Let's look at a minimal pipe definition. To define a pipe, we use a pipe descriptor, either in JSON or, more user-friendly, in YAML.
Minimum Pipe Definition
A corresponding pipe instance.
An example Pipe Instance
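As a rough illustration of how a definition and its instance might differ, the following Python sketch builds both as plain dictionaries and prints the instance as JSON. All field names (`name`, `segments`, `config`, `address`, `endpoint`, `runId`, `startTime`, `payload`), the endpoint URLs, and the `to_instance` helper are hypothetical and are not taken from the piveau pipe schema; only the module names come from the module list in this document.

```python
import copy
import json

# Hypothetical minimal definition: some metadata plus an ordered list of
# segments with their configuration. No addresses, no run information.
pipe_definition = {
    "name": "example-harvester",
    "segments": [
        {
            "name": "piveau-consus-importing-rdf",
            "config": {"address": "https://example.org/catalogue.rdf"},
        },
        {
            "name": "piveau-consus-exporting-hub",
            "config": {},
        },
    ],
}


def to_instance(definition, endpoints, run_id, start_time, payload=None):
    """Copy the definition and add what only an instance carries."""
    instance = copy.deepcopy(definition)
    instance["runId"] = run_id
    instance["startTime"] = start_time
    for segment in instance["segments"]:
        # Real address of the module that implements this segment.
        segment["endpoint"] = endpoints[segment["name"]]
    if payload is not None:
        instance["payload"] = payload
    return instance


pipe_instance = to_instance(
    pipe_definition,
    endpoints={
        "piveau-consus-importing-rdf": "http://importing-rdf.example:8080/pipe",
        "piveau-consus-exporting-hub": "http://exporting-hub.example:8080/pipe",
    },
    run_id="run-0001",
    start_time="2024-01-01T00:00:00Z",
)

print(json.dumps(pipe_instance, indent=2))
```

The definition is deep-copied so that it stays free of run-specific data and can be instantiated again for the next run.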
Installation
A minimal Consus installation consists of the following parts:
- At least one Pipe Module
- The Scheduler
- At least one Pipe Descriptor
Optionally, you can connect the modules to an Elastic Stack instance for monitoring purposes and add the piveau-consus-monitoring-ui component as a convenient frontend.
Pipe Modules
Importer | Description |
---|---|
piveau-consus-importing-rdf | Import metadata from an RDF source |
piveau-consus-importing-ckan | Import metadata from CKAN |
piveau-consus-importing-oaipmh | Import metadata via the OAI-PMH protocol |
piveau-consus-importing-sparql | Import metadata from a SPARQL endpoint |
piveau-consus-importing-socrata | Import metadata from Socrata |
piveau-consus-importing-udata | Import metadata from uData |

Transformer | Description |
---|---|
piveau-consus-transforming-js | Transform data or metadata with JavaScript |
piveau-consus-transforming-xslt | Transform data or metadata with XSLT |

Exporter | Description |
---|---|
piveau-consus-exporting-hub | Export metadata to the piveau hub |
The Scheduler
Providing a Pipe
Pipe definitions can be provided in two ways: from a Git repository or from a file system.
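To make the file system variant concrete, here is a minimal Python sketch that reads every JSON pipe descriptor from a directory. It does not show how the scheduler itself loads pipes; the function name `load_pipe_descriptors`, the one-descriptor-per-file layout, and the descriptor fields are assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path


def load_pipe_descriptors(directory):
    """Return {file stem: parsed descriptor} for every *.json file in directory."""
    descriptors = {}
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            descriptors[path.stem] = json.load(handle)
    return descriptors


# Demonstration with a temporary directory holding one descriptor and one
# unrelated file, which is skipped because it does not end in .json.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "example-harvester.json").write_text(
        json.dumps({"name": "example-harvester", "segments": []}),
        encoding="utf-8",
    )
    Path(tmp, "notes.txt").write_text("not a descriptor", encoding="utf-8")
    loaded = load_pipe_descriptors(tmp)
```

For the Git variant, the same function could be pointed at a local clone of the repository; YAML descriptors would need a YAML parser in place of `json.load`.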