When the Data Lake sink is used, incoming events are stored in an InfluxDB database.

Implementation

org.apache.streampipes.sinks.internal.jvm.datalake

The concrete implementation comprises a Data Lake class, a Data Lake Controller class, a Data Lake InfluxDB Client class and a Data Lake Parameters class. The code is largely the same as that of the InfluxDB sink (org.apache.streampipes.sinks.databases.jvm.influxdb).

Data Lake Parameters Class

The parameter class defines the necessary parameters for the configuration of the sink.

parameter            description
influxDbHost         hostname/URL of the InfluxDB instance (including http(s)://)
influxDbPort         port of the InfluxDB instance
databaseName         name of the database where events will be stored
measureName          name of the measurement where events will be stored (will be created if it does not exist)
user                 username for the InfluxDB server
password             password for the InfluxDB server
timestampField       field which contains the required timestamp (field type = http://schema.org/DateTime)
batchSize            number of events collected in a buffer before they are written to the database
flushDuration        maximum time in ms to wait for the buffer to reach the batch size before its content is written to the database
dimensionProperties  list containing the tag fields (scope = dimension property)
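
To make the parameter list concrete, the following is a minimal sketch of how such a parameter holder could look. The class name DataLakeParamsSketch and the helper connectionUrl() are assumptions made for illustration; they are not taken from the StreamPipes code base. Only the field names follow the table above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for the Data Lake Parameters class.
// Field names follow the parameter table; everything else is an assumption.
final class DataLakeParamsSketch {
    final String influxDbHost;               // including http(s)://
    final int influxDbPort;
    final String databaseName;
    final String measureName;
    final String user;
    final String password;
    final String timestampField;
    final int batchSize;                     // number of buffered events
    final int flushDuration;                 // in milliseconds
    final List<String> dimensionProperties;  // fields stored as tags

    DataLakeParamsSketch(String influxDbHost, int influxDbPort, String databaseName,
                         String measureName, String user, String password,
                         String timestampField, int batchSize, int flushDuration,
                         List<String> dimensionProperties) {
        this.influxDbHost = influxDbHost;
        this.influxDbPort = influxDbPort;
        this.databaseName = databaseName;
        this.measureName = measureName;
        this.user = user;
        this.password = password;
        this.timestampField = timestampField;
        this.batchSize = batchSize;
        this.flushDuration = flushDuration;
        // Defensive copy so the tag list cannot change after construction.
        this.dimensionProperties =
            Collections.unmodifiableList(new ArrayList<>(dimensionProperties));
    }

    // Hypothetical helper: joins host and port into one connection URL.
    String connectionUrl() {
        return influxDbHost + ":" + influxDbPort;
    }
}
```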

Data Lake Controller Class

The controller class declares the sink model for display and configuration in the Pipeline Editor and initializes the sink when a pipeline is invoked.

The measurement name and the timestamp field are derived from user input. The remaining parameters, except batch size and flush duration, are read from org.apache.streampipes.sinks.internal.jvm.config.SinksInternalJvmConfig. Batch size is fixed to 2000 events and flush duration is set to 500 ms.
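
The interplay of batch size and flush duration can be illustrated with a small buffer that flushes when either limit is reached. This is only a sketch under simplifying assumptions: the class BatchBufferSketch is hypothetical, time is passed in explicitly to keep the example deterministic, and the flush check only runs when an event arrives. A real client would typically also flush from a background timer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical buffer that hands events to a writer in batches.
final class BatchBufferSketch<E> {
    private final int batchSize;
    private final long flushDurationMs;
    private final Consumer<List<E>> writer;
    private final List<E> buffer = new ArrayList<>();
    private long lastFlushMs;

    BatchBufferSketch(int batchSize, long flushDurationMs, long nowMs,
                      Consumer<List<E>> writer) {
        this.batchSize = batchSize;
        this.flushDurationMs = flushDurationMs;
        this.lastFlushMs = nowMs;
        this.writer = writer;
    }

    // Buffers the event and flushes if the batch is full or the
    // flush duration has elapsed since the last flush.
    void add(E event, long nowMs) {
        buffer.add(event);
        boolean full = buffer.size() >= batchSize;
        boolean due = nowMs - lastFlushMs >= flushDurationMs;
        if (full || due) {
            flush(nowMs);
        }
    }

    // Writes all buffered events as one batch and resets the timer.
    void flush(long nowMs) {
        if (!buffer.isEmpty()) {
            writer.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
        lastFlushMs = nowMs;
    }
}
```

With the values named above (2000 events, 500 ms), a burst of 2000 events is written as a single batch, while a slow trickle of events is written at the latest once 500 ms have passed and a further event arrives.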

Data Lake Class

The data lake class itself essentially controls the saving of events to the database. For this purpose, it uses the Data Lake InfluxDB Client.

method name     description
onInvocation    starting the DataLakeInfluxDbClient, registering and initializing new measurement series in InfluxDB
onEvent         adding empty label field to incoming event and storing event in database
onDetach        stopping the DataLakeInfluxDbClient

Image data, unlike events, is not stored directly in the database but as image files in a corresponding directory (writeToImageFile).
In addition, the class contains two utility methods (registerAtDataLake and prepareString).
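
The lifecycle described in the method table can be sketched as follows. The interface EventStoreSketch, the class DataLakeSinkSketch and the field key "label" are hypothetical names chosen for this example; the actual sink works with StreamPipes event types and also registers the measurement, which is omitted here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical abstraction of the client used by the sink.
interface EventStoreSketch {
    void connect();
    void save(Map<String, Object> event);
    void stop();
}

// Hypothetical, simplified sink showing the three lifecycle methods.
final class DataLakeSinkSketch {
    // Assumed name of the label field; the real key may differ.
    static final String LABEL_FIELD = "label";

    private final EventStoreSketch client;

    DataLakeSinkSketch(EventStoreSketch client) {
        this.client = client;
    }

    // Called when the pipeline starts: open the database connection.
    void onInvocation() {
        client.connect();
    }

    // Called per event: add an empty label field, then store the event.
    void onEvent(Map<String, Object> event) {
        Map<String, Object> enriched = new HashMap<>(event);
        enriched.put(LABEL_FIELD, "");
        client.save(enriched);
    }

    // Called when the pipeline stops: close the database connection.
    void onDetach() {
        client.stop();
    }
}
```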

Data Lake InfluxDB Client Class

The client class connects to InfluxDB and writes events directly to the database. It uses the Data Lake Parameters described above.

method name     description
validate        checks whether the influxDbHost is valid
connect         connects to the InfluxDB server, sets the database and initializes the batch behaviour
databaseExists  checks whether the given database exists
createDatabase  creates a new database with the given name
save            saves an event to the connected InfluxDB database
stop            shuts down the connection to the InfluxDB server
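
As an illustration of the validate step, the sketch below checks that a host string carries an http or https scheme and a host part, matching the requirement that influxDbHost includes http(s)://. The class HostValidationSketch and the exact rules are assumptions; the actual validation in the client may be stricter or use a different approach.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper illustrating one way to validate influxDbHost.
final class HostValidationSketch {

    // Returns true if the value parses as a URI with an http or https
    // scheme and a non-empty host part.
    static boolean isValidHost(String influxDbHost) {
        if (influxDbHost == null) {
            return false;
        }
        try {
            URI uri = new URI(influxDbHost);
            String scheme = uri.getScheme();
            boolean httpScheme = "http".equalsIgnoreCase(scheme)
                || "https".equalsIgnoreCase(scheme);
            return httpScheme && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Not parseable as a URI at all.
            return false;
        }
    }
}
```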
