When the Data Lake sink is used, the incoming events are stored in an InfluxDB instance.

Implementation

The concrete implementation comprises a DataLake class, a DataLake Controller class, a DataLake InfluxDBClient class and a DataLake Parameters class. The code is largely identical to that of the InfluxDB sink (org.apache.streampipes.sinks.databases.jvm.influxdb).

DataLake Parameters Class

The parameters class defines the parameters required to configure the sink (see the sketch after the list below):

  • influxDbHost: hostname/URL of the InfluxDB instance (including http(s)://)
  • influxDbPort: port of the InfluxDB instance
  • databaseName: name of the database where events will be stored
  • measureName: name of the measurement where events will be stored (created if it does not exist)
  • user: username for the InfluxDB server
  • password: password for the InfluxDB server
  • timestampField: field that contains the required timestamp (field type = http://schema.org/DateTime)
  • batchSize: number of events that are collected in a buffer before they are written to the database
  • flushDuration: maximum time in ms to wait for the buffer to reach the batch size before it is written to the database
  • dimensionProperties: list containing the tag fields (scope = dimension property)
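
A minimal sketch of such a parameters class is shown below. It is a plain value holder whose field names mirror the list above; the actual StreamPipes class may declare them differently.

```java
// Hypothetical sketch of a parameters holder for the Data Lake sink.
// Field names mirror the parameter list above; the real StreamPipes class may differ.
import java.util.List;

public class DataLakeParameters {

  private final String influxDbHost;          // including http(s)://
  private final Integer influxDbPort;
  private final String databaseName;
  private final String measureName;
  private final String user;
  private final String password;
  private final String timestampField;
  private final Integer batchSize;
  private final Integer flushDuration;        // in ms
  private final List<String> dimensionProperties;

  public DataLakeParameters(String influxDbHost, Integer influxDbPort, String databaseName,
                            String measureName, String user, String password,
                            String timestampField, Integer batchSize, Integer flushDuration,
                            List<String> dimensionProperties) {
    this.influxDbHost = influxDbHost;
    this.influxDbPort = influxDbPort;
    this.databaseName = databaseName;
    this.measureName = measureName;
    this.user = user;
    this.password = password;
    this.timestampField = timestampField;
    this.batchSize = batchSize;
    this.flushDuration = flushDuration;
    this.dimensionProperties = dimensionProperties;
  }

  public String getInfluxDbHost() { return influxDbHost; }
  public Integer getInfluxDbPort() { return influxDbPort; }
  public String getDatabaseName() { return databaseName; }
  public String getMeasureName() { return measureName; }
  public String getUser() { return user; }
  public String getPassword() { return password; }
  public String getTimestampField() { return timestampField; }
  public Integer getBatchSize() { return batchSize; }
  public Integer getFlushDuration() { return flushDuration; }
  public List<String> getDimensionProperties() { return dimensionProperties; }
}
```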

DataLake Controller Class

The controller class declares the model that is used for display and configuration in the Pipeline Editor and initializes the sink when a pipeline is invoked.

The measurement name and the timestamp field are taken from user input; the remaining parameters (except batch size and flush duration) come from org.apache.streampipes.sinks.internal.jvm.config.SinksInternalJvmConfig. The batch size is fixed at 2000 events and the flush duration is set to 500 ms.
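
The following sketch illustrates how these values could be assembled into the parameters class from the previous sketch. UserInput and JvmConfig are hypothetical stand-ins for the StreamPipes parameter extractor and SinksInternalJvmConfig; their method names are illustrative only, not the real API.

```java
// Hypothetical sketch of parameter assembly in the controller.
// "UserInput" and "JvmConfig" are illustrative stand-ins, not StreamPipes classes.
import java.util.List;

public class DataLakeControllerSketch {

  // Fixed values, as described above
  private static final int BATCH_SIZE = 2000;
  private static final int FLUSH_DURATION_MS = 500;

  interface UserInput {            // values entered in the Pipeline Editor
    String measurementName();
    String timestampField();
    List<String> dimensionProperties();
  }

  interface JvmConfig {            // values provided by the sink's JVM configuration
    String influxHost();
    Integer influxPort();
    String databaseName();
    String user();
    String password();
  }

  public DataLakeParameters buildParameters(UserInput input, JvmConfig config) {
    return new DataLakeParameters(
        config.influxHost(),
        config.influxPort(),
        config.databaseName(),
        input.measurementName(),     // from user input
        config.user(),
        config.password(),
        input.timestampField(),      // from user input
        BATCH_SIZE,
        FLUSH_DURATION_MS,
        input.dimensionProperties());
  }
}
```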

DataLake Class


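Assuming the class mirrors the InfluxDB sink, a minimal sketch of its logic could look as follows: connect to InfluxDB with the parameters above, enable batching, and write every incoming event as a point. The sketch uses the influxdb-java client; event access is simplified to a Map instead of the StreamPipes event model, and all names are illustrative.

```java
// Hypothetical sketch of the Data Lake sink logic, assuming it mirrors the InfluxDB sink.
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.influxdb.InfluxDB;
import org.influxdb.InfluxDBFactory;
import org.influxdb.dto.Point;
import org.influxdb.dto.Query;

public class DataLakeSketch {

  private final DataLakeParameters params;
  private InfluxDB influxDb;

  public DataLakeSketch(DataLakeParameters params) {
    this.params = params;
  }

  public void onInvocation() {
    // Connect to the InfluxDB instance configured in the parameters
    String url = params.getInfluxDbHost() + ":" + params.getInfluxDbPort();
    influxDb = InfluxDBFactory.connect(url, params.getUser(), params.getPassword());

    // Create the database if it does not exist yet and select it
    influxDb.query(new Query("CREATE DATABASE " + params.getDatabaseName(), ""));
    influxDb.setDatabase(params.getDatabaseName());

    // Buffer writes: flush after batchSize points or after flushDuration ms
    influxDb.enableBatch(params.getBatchSize(), params.getFlushDuration(), TimeUnit.MILLISECONDS);
  }

  public void onEvent(Map<String, Object> event) {
    // The timestamp field is assumed to carry a Unix timestamp in ms
    long timestamp = (Long) event.get(params.getTimestampField());

    Point.Builder point = Point.measurement(params.getMeasureName())
        .time(timestamp, TimeUnit.MILLISECONDS);

    for (Map.Entry<String, Object> entry : event.entrySet()) {
      if (entry.getKey().equals(params.getTimestampField())) {
        continue; // timestamp is already set above
      }
      if (params.getDimensionProperties().contains(entry.getKey())) {
        point.tag(entry.getKey(), entry.getValue().toString());    // dimension -> tag
      } else if (entry.getValue() instanceof Number) {
        point.addField(entry.getKey(), (Number) entry.getValue()); // numeric field
      } else {
        point.addField(entry.getKey(), entry.getValue().toString());
      }
    }
    influxDb.write(point.build());
  }

  public void onDetach() {
    influxDb.close();
  }
}
```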
