We are currently inconsistent in the data types we use. The question is: how can we ensure a clean data type concept in StreamPipes?

Which data types do we want to support?

Data sources often support different data types, e.g. PLCs, OPC UA, CSV files, JSON, ...

The challenge is to combine them all to ensure that the data is processed as expected.

Since most of the computation is performed within Java, the goal is to support the basic data types of Java:

Supported data types:

- Boolean
- Integer
- Double
- Float
- Long
- String

Common Problems

Here is a list of currently occurring problems that must be considered when designing a solution:

1.  Different numerical data types

  • E.g.:
    • MQTT data stream sends {"timestamp": 1660791892, "temperature": 10} … {"timestamp": 1660792892, "temperature": 11.2}
    • CSV File set:
      timestamp     temperature
      1660791892    10
      1660792892    10
      1660793892    10.1
  • Which data type should temperature have in the event schema?
  • If the adapter defines integer, all float values that arrive later will cause a problem
  • How can this be resolved?
    • Option 1: Always select float as the default data type
    • Option 2: Cast all float values to int
  • The schema guessing uses the values of the first event to derive the data type. If a value has only zeros after the decimal point, the decimal part is sometimes omitted, so the system cannot know that it is a float value.
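The first-event problem can be illustrated with a minimal Java sketch. The class `SchemaGuessSketch`, the enum `GuessedType`, and both methods are hypothetical names chosen for illustration and are not taken from the StreamPipes code base; values are handled as plain strings, roughly as they would appear in a CSV file.

```java
// Hypothetical sketch, not StreamPipes code: guess a type from one raw value.
final class SchemaGuessSketch {

    enum GuessedType { INTEGER, FLOAT, STRING }

    // Derive a type from the textual form of a single value,
    // the way a guess based on the first event would.
    static GuessedType guessFromFirstValue(String raw) {
        String trimmed = raw.trim();
        try {
            Long.parseLong(trimmed);
            return GuessedType.INTEGER;
        } catch (NumberFormatException notAnInteger) {
            // Not an integer literal; try a floating-point literal next.
        }
        try {
            Double.parseDouble(trimmed);
            return GuessedType.FLOAT;
        } catch (NumberFormatException notANumber) {
            return GuessedType.STRING;
        }
    }

    // Check whether a later value still fits the type guessed earlier.
    static boolean fits(GuessedType guessed, String raw) {
        GuessedType actual = guessFromFirstValue(raw);
        switch (guessed) {
            case INTEGER:
                return actual == GuessedType.INTEGER;
            case FLOAT:
                // Integers are also valid floats.
                return actual != GuessedType.STRING;
            default:
                return true;
        }
    }
}
```

With the CSV values above, `guessFromFirstValue("10")` yields `INTEGER`, and the later value `"10.1"` no longer fits that guess. If the default is `FLOAT` instead, both `"10"` and `"10.1"` fit.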

2.  Timestamps

  • Timestamps usually have type long
  • They should not be floats

3.  A user sets the data type of a property to integer

  • Processing elements with a keep strategy might change the numerical data type
    • E.g. Math Operator
  • During runtime the value is transformed to float
  • This results in an error in the data lake
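A small Java sketch of how this can happen. `KeepStrategySketch`, `divide`, and `matchesDeclaredInteger` are hypothetical names for illustration; the actual Math Operator implementation in StreamPipes may work differently.

```java
// Hypothetical sketch, not StreamPipes code: a math operation that keeps
// the declared schema but changes the runtime type of the value.
final class KeepStrategySketch {

    // A division computed in floating point always returns a Double,
    // even when both inputs are integers.
    static Number divide(Number left, Number right) {
        return left.doubleValue() / right.doubleValue();
    }

    // What a strict consumer might check when the schema still says "integer".
    static boolean matchesDeclaredInteger(Number value) {
        return value instanceof Integer || value instanceof Long;
    }
}
```

Here `divide(10, 4)` returns the `Double` 2.5. The schema still declares the property as integer, so `matchesDeclaredInteger` returns `false` for the result, which corresponds to the mismatch that surfaces in the data lake.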


Please feel free to extend the problem list if you find further issues.

Currently Selected Solution Strategy

Schema guessing always uses Float as the default data type:

  • If a user wants a different data type, it must be changed manually in the UI
  • When the data type is set to float or double, a value may arrive at runtime with or without a decimal part (value=10 or value=10.0).
    • Both should work without any problems.
  • When selecting integer for the data type:
    • The adapter must ensure that an integer is sent (value=10.2 must be rounded to value=10)
  • This approach solves Problem 1 & Problem 2
  • Solution of Problem 3:
    • This is currently only handled in the DataLake Sink.
    • If a numerical value cannot be cast to integer or long, it is converted to a float before it is stored
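The two runtime rules of this strategy can be sketched in Java as follows. This is a minimal sketch under assumptions: `SelectedStrategySketch`, `coerceToInteger`, and `prepareForStorage` are hypothetical names, and "cannot be cast to integer or long" is interpreted here as "has a fractional part or is out of the long range". The actual adapter and DataLake Sink code may differ.

```java
// Hypothetical sketch, not StreamPipes code.
final class SelectedStrategySketch {

    // Adapter side: the property is declared as integer,
    // so every incoming value is rounded (10.2 becomes 10).
    static long coerceToInteger(Number raw) {
        return Math.round(raw.doubleValue());
    }

    // Sink side: keep whole numbers as long, store everything else as float.
    static Number prepareForStorage(Number value) {
        double d = value.doubleValue();
        boolean isWhole = d == Math.rint(d);
        boolean inLongRange = d >= Long.MIN_VALUE && d < Long.MAX_VALUE;
        if (isWhole && inLongRange) {
            return (long) d;
        }
        // Assumption: a fractional value is stored as float instead of failing.
        return (float) d;
    }
}
```

For example, `coerceToInteger(10.2)` returns 10, `prepareForStorage(10.0)` returns the `Long` 10, and `prepareForStorage(2.5)` returns the `Float` 2.5.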

Alternative Solution

Only floats are supported as data types for numerical measurements. This means we would lose the other numerical data types.

Also, the user would not be able to change the data type manually. The adapter is responsible for transforming the values on ingestion and ensuring that all values are floats.

Therefore, the processing elements can rely on every numeric value being a float, and no special cases need to be considered.

With this approach, all three listed problems would be solved.
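A minimal Java sketch of such an adapter-side normalization. `FloatOnlyAdapterSketch`, `normalize`, and the `timestampField` parameter are hypothetical names; the sketch assumes that the timestamp is excluded from the conversion, since timestamps should remain long values and a 32-bit float cannot represent a value such as 1660791892 exactly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not StreamPipes code.
final class FloatOnlyAdapterSketch {

    // Convert every numeric measurement to Float on ingestion.
    // Assumption: the timestamp field is left untouched.
    static Map<String, Object> normalize(Map<String, Object> event, String timestampField) {
        Map<String, Object> normalized = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : event.entrySet()) {
            Object value = entry.getValue();
            boolean isMeasurement =
                value instanceof Number && !entry.getKey().equals(timestampField);
            if (isMeasurement) {
                normalized.put(entry.getKey(), ((Number) value).floatValue());
            } else {
                normalized.put(entry.getKey(), value);
            }
        }
        return normalized;
    }
}
```

An event such as {"timestamp": 1660791892, "temperature": 10} would leave the adapter with `temperature` as the `Float` 10.0, so downstream processing elements only ever see one numeric type for measurements.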
