The data types used across StreamPipes are currently inconsistent. The question is: how can we ensure a clean data type concept in StreamPipes?

Which data types do we want to support?

Data sources often support different data types, e.g. PLCs, OPC UA, CSV files, JSON, ...

The challenge is to combine them all so that the data is processed as expected.

Since most of the computation is performed within Java, the goal is to support the basic data types of Java:

Supported data types:

- Boolean
- Integer
- Double
- Float
- Long
- String

Common Problems

The following problems currently occur and must be considered when designing a solution:

1.  Different numerical data types

  • E.g.:
    • MQTT data stream sends {"timestamp": 1660791892, "temperature": 10} … {"timestamp": 1, "temperature": 11.2}
    • CSV file:
      timestamp   temperature
      1660791892  10
      1660792892  10
      1660793892  10.1
  • What is the data type of temperature in the event schema?
  • If the adapter defines integer, all float values that arrive later cause a problem
  • How can this be resolved?
    • Always select float as the default data type
    • Cast all float values to int
  • The current problem is that schema guessing derives the data type from the values of the first event only
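
The effect of guessing from the first event can be illustrated with a minimal Java sketch. The class `GuessSketch` and its methods `guessType` and `fits` are hypothetical names chosen for illustration; they are not the StreamPipes schema-guessing code, only an assumption about how a first-value guess behaves.

```java
import java.util.List;

// Hypothetical sketch; not the StreamPipes schema-guessing implementation.
class GuessSketch {

    // Guess a data type name from a single parsed value.
    static String guessType(Object value) {
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Integer || value instanceof Long) return "integer";
        if (value instanceof Float || value instanceof Double) return "float";
        return "string";
    }

    // Check whether a later value fits the type that was guessed earlier.
    static boolean fits(String guessedType, Object value) {
        if (guessedType.equals("float")) {
            // A float property accepts any number, e.g. 10 as well as 10.1.
            return value instanceof Number;
        }
        return guessType(value).equals(guessedType);
    }

    public static void main(String[] args) {
        // Temperature values as they appear in the examples above.
        List<Object> temperatures = List.of(10, 10, 10.1);
        String guessed = guessType(temperatures.get(0));
        for (Object t : temperatures) {
            System.out.println(t + " fits " + guessed + ": " + fits(guessed, t));
        }
    }
}
```

In this sketch, a guess based on the first value 10 yields integer, so the later value 10.1 no longer fits, whereas a float default accepts both.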


2.  Timestamps

  • Timestamps usually have the type long

3.  A user sets the data type of a property to integer

  • Processing elements with a keep strategy might change the numerical data type
    • E.g. Math Operator
  • During runtime the value is transformed to float
  • This results in an error in the data lake
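
A minimal Java sketch of how such a type change can arise. `MathOperatorSketch.divide` is a hypothetical stand-in for a math processing element, assuming it computes in floating point; it is not the actual Math Operator code.

```java
// Hypothetical sketch; not the StreamPipes Math Operator implementation.
class MathOperatorSketch {

    // Assumption: the operator computes in floating point, so the result
    // is a Double even when both inputs are integers.
    static Number divide(Number left, Number right) {
        return left.doubleValue() / right.doubleValue();
    }

    public static void main(String[] args) {
        Number result = divide(5, 2);
        // The schema still declares integer, but the runtime value is a Double.
        System.out.println(result.getClass().getSimpleName() + ": " + result);
    }
}
```

If the output property keeps the integer type of its input, the runtime value and the declared type no longer match, which is the mismatch described above.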

Selected Solution

Schema guessing always uses float as the default data type:

  • If a user wants a different data type, it must be changed manually in the UI
  • When float or double is selected
    • Both forms of a value are possible at runtime (value=10 or value=10.0)
  • When integer is selected
    • The adapter must ensure that an integer is sent (value=10.2 must be transformed into value=10)
  • This approach solves Problem 1 and Problem 2
  • Solution of Problem 3:
    • This is currently handled only in the DataLake sink.
    • If a numerical value cannot be cast to integer or long, it is changed to a float before it is stored
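
The two rules of the selected solution can be sketched in Java as follows. The class `NumericCoercionSketch`, its methods, and the type names passed as strings are hypothetical and only illustrate the idea; the sketch also assumes that the adapter truncates (10.2 becomes 10) and that "cannot be cast" means the value has a fractional part. Range checks are omitted.

```java
// Hypothetical sketch; not the StreamPipes adapter or DataLake sink code.
class NumericCoercionSketch {

    // Adapter side: enforce the data type selected in the UI.
    static Number coerceInAdapter(Number value, String selectedType) {
        switch (selectedType) {
            case "integer":
                return (int) value.doubleValue();  // assumption: truncate, 10.2 -> 10
            case "long":
                return (long) value.doubleValue();
            default:
                return value;  // float/double: 10 and 10.0 are both passed through
        }
    }

    // Sink side: fall back to float when an integer/long property
    // carries a value with a fractional part.
    static Number prepareForStorage(Number value, String schemaType) {
        boolean integral = schemaType.equals("integer") || schemaType.equals("long");
        if (!integral) {
            return value;
        }
        double d = value.doubleValue();
        if (!Double.isInfinite(d) && d == Math.rint(d)) {
            if (schemaType.equals("integer")) {
                return (int) d;
            }
            return (long) d;
        }
        return (float) d;  // not a whole number: store as float instead
    }

    public static void main(String[] args) {
        System.out.println(coerceInAdapter(10.2, "integer"));
        System.out.println(prepareForStorage(2.5, "integer"));
    }
}
```

Truncation is only one possible choice for the adapter; rounding would also turn 10.2 into 10, and the text above does not say which is intended.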

Alternative Solution

Only floats are supported as the data type for numerical measurements.

A user is not able to manually change the data type. That means the adapter is responsible for transforming the value on ingestion.

As a result, processing elements can rely on every numeric value being a float, and no special cases need to be considered.

With this approach, all three listed problems would be solved.
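
A minimal Java sketch of such an ingestion step, under stated assumptions: the class `FloatOnlyAdapterSketch` and the `timestampField` parameter are hypothetical, and the timestamp is left untouched here because it is not a measurement and usually has the type long.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; not an actual StreamPipes adapter.
class FloatOnlyAdapterSketch {

    // Convert every numeric measurement to Float on ingestion.
    // Assumption: the field named by timestampField is kept as-is.
    static Map<String, Object> normalize(Map<String, Object> event, String timestampField) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : event.entrySet()) {
            Object value = entry.getValue();
            boolean isMeasurement =
                value instanceof Number && !entry.getKey().equals(timestampField);
            if (isMeasurement) {
                result.put(entry.getKey(), ((Number) value).floatValue());
            } else {
                result.put(entry.getKey(), value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> event = new LinkedHashMap<>();
        event.put("timestamp", 1660791892L);
        event.put("temperature", 10);
        System.out.println(normalize(event, "timestamp"));
    }
}
```

With a step like this, downstream processing elements would only ever see Float values for measurements, regardless of whether the source sent 10 or 10.1.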
