When the Data Lake sink is used, the incoming events are stored in an InfluxDB instance.

Implementation

The concrete implementation comprises a DataLake class, a DataLake Controller class, a DataLake InfluxDBClient class and a DataLake Parameters class. The code is largely identical to that of the InfluxDB sink (org.apache.streampipes.sinks.databases.jvm.influxdb).

DataLake Parameters Class

The parameters class defines the parameters required to configure the sink (see the sketch after the list below):

  • influxDbHost: hostname/URL of the InfluxDB instance (including http(s)://)
  • influxDbPort: port of the InfluxDB instance
  • databaseName: name of the database where events will be stored
  • measureName: name of the measurement where events will be stored (created if it does not exist)
  • user: username for the InfluxDB server
  • password: password for the InfluxDB server
  • timestampField: field that contains the required timestamp (field type = http://schema.org/DateTime)
  • batchSize: number of events that are collected in a buffer before they are written to the database
  • flushDuration: maximum time in ms to wait for the buffer to reach the batch size before it is written to the database
  • dimensionProperties: list containing the tag fields (scope = dimension property)
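
A minimal sketch of such a parameters class is shown below. It is a plain value holder whose field names mirror the list above; the actual StreamPipes class may declare them differently.

```java
// Hypothetical sketch of a parameters holder for the Data Lake sink.
// Field names mirror the parameter list above; the real StreamPipes class may differ.
import java.util.List;

public class DataLakeParameters {

  private final String influxDbHost;          // including http(s)://
  private final Integer influxDbPort;
  private final String databaseName;
  private final String measureName;
  private final String user;
  private final String password;
  private final String timestampField;
  private final Integer batchSize;
  private final Integer flushDuration;        // in ms
  private final List<String> dimensionProperties;

  public DataLakeParameters(String influxDbHost, Integer influxDbPort, String databaseName,
                            String measureName, String user, String password,
                            String timestampField, Integer batchSize, Integer flushDuration,
                            List<String> dimensionProperties) {
    this.influxDbHost = influxDbHost;
    this.influxDbPort = influxDbPort;
    this.databaseName = databaseName;
    this.measureName = measureName;
    this.user = user;
    this.password = password;
    this.timestampField = timestampField;
    this.batchSize = batchSize;
    this.flushDuration = flushDuration;
    this.dimensionProperties = dimensionProperties;
  }

  public String getInfluxDbHost() { return influxDbHost; }
  public Integer getInfluxDbPort() { return influxDbPort; }
  public String getDatabaseName() { return databaseName; }
  public String getMeasureName() { return measureName; }
  public String getUser() { return user; }
  public String getPassword() { return password; }
  public String getTimestampField() { return timestampField; }
  public Integer getBatchSize() { return batchSize; }
  public Integer getFlushDuration() { return flushDuration; }
  public List<String> getDimensionProperties() { return dimensionProperties; }
}
```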

DataLake Controller Class

The controller class declares the model that is used for display and configuration in the Pipeline Editor and initializes the sink when a pipeline is invoked.

The measurement name and the timestamp field are taken from user input; the remaining parameters (except batch size and flush duration) come from org.apache.streampipes.sinks.internal.jvm.config.SinksInternalJvmConfig. The batch size is fixed at 2000 events and the flush duration is set to 500 ms.
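
The following sketch illustrates how these values could be assembled into the parameters class from the previous sketch. UserInput and JvmConfig are hypothetical stand-ins for the StreamPipes parameter extractor and SinksInternalJvmConfig; their method names are illustrative only, not the real API.

```java
// Hypothetical sketch of parameter assembly in the controller.
// "UserInput" and "JvmConfig" are illustrative stand-ins, not StreamPipes classes.
import java.util.List;

public class DataLakeControllerSketch {

  // Fixed values, as described above
  private static final int BATCH_SIZE = 2000;
  private static final int FLUSH_DURATION_MS = 500;

  interface UserInput {            // values entered in the Pipeline Editor
    String measurementName();
    String timestampField();
    List<String> dimensionProperties();
  }

  interface JvmConfig {            // values provided by the sink's JVM configuration
    String influxHost();
    Integer influxPort();
    String databaseName();
    String user();
    String password();
  }

  public DataLakeParameters buildParameters(UserInput input, JvmConfig config) {
    return new DataLakeParameters(
        config.influxHost(),
        config.influxPort(),
        config.databaseName(),
        input.measurementName(),     // from user input
        config.user(),
        config.password(),
        input.timestampField(),      // from user input
        BATCH_SIZE,
        FLUSH_DURATION_MS,
        input.dimensionProperties());
  }
}
```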

DataLake Class


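Assuming the class mirrors the InfluxDB sink, a minimal sketch of its logic could look as follows: connect to InfluxDB with the parameters above, enable batching, and write every incoming event as a point. The sketch uses the influxdb-java client; event access is simplified to a Map instead of the StreamPipes event model, and all names are illustrative.

```java
// Hypothetical sketch of the Data Lake sink logic, assuming it mirrors the InfluxDB sink.
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;
import org.influxdb.dto.Query;

public class DataLakeSketch {

  private final DataLakeParameters params;
  private InfluxDB influxDb;

  public DataLakeSketch(DataLakeParameters params) {
    this.params = params;
  }

  public void onInvocation() {
    // Connect to the InfluxDB instance configured in the parameters
    String url = params.getInfluxDbHost() + ":" + params.getInfluxDbPort();
    influxDb = InfluxDBFactory.connect(url, params.getUser(), params.getPassword());

    // Create the database if it does not exist yet and select it
    influxDb.query(new Query("CREATE DATABASE " + params.getDatabaseName(), ""));
    influxDb.setDatabase(params.getDatabaseName());

    // Buffer writes: flush after batchSize points or after flushDuration ms
    influxDb.enableBatch(params.getBatchSize(), params.getFlushDuration(), TimeUnit.MILLISECONDS);
  }

  public void onEvent(Map<String, Object> event) {
    // The timestamp field is assumed to carry a Unix timestamp in ms
    long timestamp = (Long) event.get(params.getTimestampField());

    Point.Builder point = Point.measurement(params.getMeasureName())
        .time(timestamp, TimeUnit.MILLISECONDS);

    for (Map.Entry<String, Object> entry : event.entrySet()) {
      if (entry.getKey().equals(params.getTimestampField())) {
        continue; // timestamp is already set above
      }
      if (params.getDimensionProperties().contains(entry.getKey())) {
        point.tag(entry.getKey(), entry.getValue().toString());    // dimension -> tag
      } else if (entry.getValue() instanceof Number) {
        point.addField(entry.getKey(), (Number) entry.getValue()); // numeric field
      } else {
        point.addField(entry.getKey(), entry.getValue().toString());
      }
    }
    influxDb.write(point.build());
  }

  public void onDetach() {
    influxDb.close();
  }
}
```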
