
(I) Experiment of the necessity of TimeseriesMetadata


Now that TimeseriesMetadata is stored together with ChunkMetadata, its necessity needs to be reconsidered. We ran some experiments to support the decision.

We compare aggregation queries and raw data queries on one timeseries in one TsFile, with and without TimeseriesMetadata, under different circumstances.


Each chunk has 100 points. Each query reads 500 TsFiles.

(1) with TimeseriesMetadata: the original TimeseriesMetadata (carrying statistics)

(2) without TimeseriesMetadata: the TimeseriesMetadata carries no statistics

In configuration (1) an aggregation such as count(*) can be answered directly from the statistics in TimeseriesMetadata; in configuration (2) it has to read all ChunkMetadata and sum their statistics (see the sketch after the aggregation query code below).

We test queries for 1 timeseries in TsFiles that contain 1 timeseries and 1000 timeseries respectively.


Writing:

        String path =
            "/home/fit/szs/data/data/sequence/root.sg/0/"
                + chunkNum
                + "/test"
                + fileIndex
                + ".tsfile";
        File f = FSFactoryProducer.getFSFactory().getFile(path);
        if (f.exists()) {
          f.delete();
        }

        try (TsFileWriter tsFileWriter = new TsFileWriter(f)) {
          // only one timeseries
          tsFileWriter.registerTimeseries(
              new Path(Constant.DEVICE_PREFIX, Constant.SENSOR_1),
              new UnaryMeasurementSchema(Constant.SENSOR_1, TSDataType.INT64, TSEncoding.RLE));

          // construct TSRecord
          for (int i = 1; i <= chunkNum * 100; i++) {
            TSRecord tsRecord = new TSRecord(i, Constant.DEVICE_PREFIX);
            DataPoint dPoint1 = new LongDataPoint(Constant.SENSOR_1, i);
            tsRecord.addTuple(dPoint1);
            // write TSRecord
            tsFileWriter.write(tsRecord);
            if (i % 100 == 0) {
              tsFileWriter.flushAllChunkGroups();
            }
          }
        }
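The driver around this snippet is not shown; below is a minimal sketch of how chunkNum and fileIndex could be supplied, assuming 500 files per configuration (as stated above) and the chunk counts used in the result tables below. It is illustrative, not the exact benchmark driver.

        // Hypothetical outer loops feeding the writing snippet above.
        // The chunk counts and the 500-file count follow the experiment
        // description; adjust as needed.
        int[] chunkNums = {1, 2, 3, 5, 8, 10, 15, 20, 25};
        int fileNum = 500;
        for (int chunkNum : chunkNums) {
          for (int fileIndex = 0; fileIndex < fileNum; fileIndex++) {
            // ... build the path and write one TsFile as shown above ...
          }
        }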


Raw data query:

    for (int fileIndex = 0; fileIndex < fileNum; fileIndex++) {
      // file path
      String path =
          "/home/fit/szs/data/data/sequence/root.sg/0/"
              + chunkNum
              + "/test"
              + fileIndex
              + ".tsfile";

      // raw data query
      try (TsFileSequenceReader reader = new TsFileSequenceReader(path);
          ReadOnlyTsFile readTsFile = new ReadOnlyTsFile(reader)) {

        ArrayList<Path> paths = new ArrayList<>();
        paths.add(new Path(DEVICE1, "sensor_1"));

        QueryExpression queryExpression = QueryExpression.create(paths, null);

        long startTime = System.nanoTime();
        QueryDataSet queryDataSet = readTsFile.query(queryExpression);
        while (queryDataSet.hasNext()) {
          queryDataSet.next();
        }

        costTime += (System.nanoTime() - startTime);
      }
    }


Aggregation query:

    long totalStartTime = System.nanoTime();
    for (int fileIndex = 0; fileIndex < fileNum; fileIndex++) {
      // file path
      String path =
          "/home/fit/szs/data/data/sequence/root.sg/0/"
              + chunkNum
              + "/test"
              + fileIndex
              + ".tsfile";

      // aggregation query
      try (TsFileSequenceReader reader = new TsFileSequenceReader(path)) {
        Path seriesPath = new Path(DEVICE1, "sensor_1");
        long startTime = System.nanoTime();
        TimeseriesMetadata timeseriesMetadata = reader.readTimeseriesMetadata(seriesPath, false);
        long count = timeseriesMetadata.getStatistics().getCount();
        costTime += (System.nanoTime() - startTime);
      }
    }
    System.out.println(
        "Total aggregation cost time: " + (System.nanoTime() - totalStartTime) / 1_000_000 + "ms");
    System.out.println("Index area cost time: " + costTime / 1_000_000 + "ms");


1 timeseries in one tsfile:

Raw data query:

chunk number                                       |   1 |   2 |   3 |   5 |   8 |  10 |  15 |  20 |  25
with TimeseriesMetadata, overall cost time (ms)    | 210 | 230 | 237 | 250 | 276 | 297 | 309 | 344 | 374
with TimeseriesMetadata, index area time (ms)      | 116 | 131 | 142 | 156 | 185 | 197 | 220 | 255 | 282
without TimeseriesMetadata, overall cost time (ms) |   - | 219 | 223 | 242 | 267 | 287 | 302 | 334 | 357
without TimeseriesMetadata, index area time (ms)   |   - | 131 | 136 | 155 | 182 | 200 | 219 | 251 | 274

count(*) aggregation query:

chunk number                                       |   1 |   2 |   3 |   5 |   8 |  10 |  15 |  20 |  25
with TimeseriesMetadata, overall cost time (ms)    |  89 |  90 |  91 |  93 |  93 |  93 |  94 |  97 |  97
with TimeseriesMetadata, index area time (ms)      |  15 |  16 |  16 |  16 |  16 |  16 |  16 |  17 |  17
without TimeseriesMetadata, overall cost time (ms) |   - | 122 | 123 | 127 | 127 | 127 | 127 | 128 | 130
without TimeseriesMetadata, index area time (ms)   |   - |  50 |  50 |  50 |  50 |  51 |  52 |  52 |  53

("-": no value recorded for chunk number 1 in the original results.)

1000 timeseries in one tsfile (query for 1 timeseries as well):

Raw data query:

chunk number                                       |    1 |    2 |    3 |    5 |    8 |   10 |   15 |   20 |   25
with TimeseriesMetadata, overall cost time (ms)    |  421 |  478 |  550 |  673 |  910 |  998 | 1394 | 1637 | 1966
with TimeseriesMetadata, index area time (ms)      |  274 |  332 |  403 |  528 |  763 |  853 | 1249 | 1496 | 1795
without TimeseriesMetadata, overall cost time (ms) |    - |  489 |  537 |  672 |  903 | 1010 | 1371 | 1650 | 1938
without TimeseriesMetadata, index area time (ms)   |    - |  340 |  393 |  528 |  758 |  864 | 1232 | 1511 | 1789

count(*) aggregation query:

chunk number                                       |    1 |    2 |    3 |    5 |    8 |   10 |   15 |   20 |   25
with TimeseriesMetadata, overall cost time (ms)    |  260 |  271 |  290 |  331 |  399 |  397 |  562 |  609 |  647
with TimeseriesMetadata, index area time (ms)      |  133 |  142 |  158 |  197 |  265 |  267 |  427 |  472 |  513
without TimeseriesMetadata, overall cost time (ms) |    - |  307 |  326 |  359 |  428 |  447 |  583 |  620 |  713
without TimeseriesMetadata, index area time (ms)   |    - |  177 |  195 |  227 |  296 |  315 |  447 |  486 |  553

("-": no value recorded for chunk number 1 in the original results.)

Conclusion:

  1. Although the index area structure without TimeseriesMetadata statistics speeds up raw data queries a little,
     it slows down aggregation queries a lot. => We should keep TimeseriesMetadata.
  2. The time cost in the data area of the TsFile does not change.

(II) Experiment about combining Chunk and Page


Do we need both Chunk and Page, or is keeping only one of them enough?


How many points can a chunk hold when the chunk size is 64K, 1M, 2M, 3M, and 4M?

(1) Write one timeseries into one TsFile, with long data type and random data.

(2) Adjust the number of points to reach the target chunk size.

      try (TsFileWriter tsFileWriter = new TsFileWriter(f)) {
        // only one timeseries
        tsFileWriter.registerTimeseries(
            new Path(Constant.DEVICE_PREFIX, Constant.SENSOR_1),
            new UnaryMeasurementSchema(Constant.SENSOR_1, TSDataType.INT64, TSEncoding.RLE));

        // construct TSRecord
        for (int i = 1; i <= 7977; i++) { // change here
          TSRecord tsRecord = new TSRecord(i, Constant.DEVICE_PREFIX);
          DataPoint dPoint1 = new LongDataPoint(Constant.SENSOR_1, random.nextLong());
          tsRecord.addTuple(dPoint1);
          // write TSRecord
          tsFileWriter.write(tsRecord);
        }
      }
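
The point counts in the table below were found by adjusting the loop bound above until the chunk reached the target size. One hedged way to automate that calibration is to use the size of the written file as a proxy, since with a single series each file contains essentially one chunk plus a small metadata tail. The target size, step size, and path below are illustrative, not the exact procedure used here.

      // Hypothetical calibration loop: grow the point count until the written
      // TsFile (roughly one chunk + a small metadata tail) reaches the target size.
      long targetBytes = 2L * 1024 * 1024;  // e.g. ~2M chunk
      int pointNum = 10_000;                // starting guess; the step below is arbitrary
      while (true) {
        File f = FSFactoryProducer.getFSFactory().getFile("/tmp/calibrate.tsfile");
        if (f.exists()) {
          f.delete();
        }
        try (TsFileWriter tsFileWriter = new TsFileWriter(f)) {
          tsFileWriter.registerTimeseries(
              new Path(Constant.DEVICE_PREFIX, Constant.SENSOR_1),
              new UnaryMeasurementSchema(Constant.SENSOR_1, TSDataType.INT64, TSEncoding.RLE));
          for (int i = 1; i <= pointNum; i++) {
            TSRecord tsRecord = new TSRecord(i, Constant.DEVICE_PREFIX);
            tsRecord.addTuple(new LongDataPoint(Constant.SENSOR_1, random.nextLong()));
            tsFileWriter.write(tsRecord);
          }
        }
        if (f.length() >= targetBytes) {
          System.out.println(pointNum + " points -> " + f.length() + " bytes");
          break;
        }
        pointNum += 10_000;
      }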


Here are the results:

chunk size               | ~64K           | ~1M            | ~2M            | ~3M            | ~4M
points number            | 7,977          | 125,000        | 260,000        | 390,000        | 520,000
page number              | 1              | 16             | 32             | 49             | 66
page size (uncompressed) | 65398 (63.86K) | 65398 (63.86K) | 65398 (63.86K) | 65398 (63.86K) | 65398 (63.86K)
page size (compressed)   | 64275 (62.77K) | 64275 (62.77K) | 64275 (62.77K) | 64275 (62.77K) | 64275 (62.77K)


Consider the scenarios below (only one timeseries):

1. A scenario that generates 5 data points per second (one chunk per day, 5 Hz frequency).

One day generates 432,000 points (about 54 pages). Therefore, 1 chunk has 54 pages (about 3.4M).

2. A scenario that generates one data point per second (one chunk per day, 1 Hz frequency).

One day generates 86,400 points (about 11 pages). Therefore, 1 chunk has 11 pages (about 693K).

3. A scenario that generates 5 data points per minute (one chunk per day, 1/12 Hz frequency).

One day generates 7,200 points (about 1 page). Therefore, 1 chunk has 1 page (about 56.6K).

4. A scenario that generates one data point per minute (one chunk per week, 1/60 Hz frequency).

One week generates 10,080 points (about 1.3 pages). Therefore, 1 chunk has 1~2 pages (about 79.3K).
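
Below is a minimal sketch of the arithmetic behind these estimates, using the measured ~7,977 long points and ~62.77 KB of compressed data per page from the table above; the rate shown is scenario 1, and the other scenarios only change the rate and time span.

      // Estimate pages per chunk and chunk size from an ingest rate and time span,
      // based on ~7977 points / ~62.77 KB (compressed) per page measured above.
      double pointsPerPage = 7977;
      double pageSizeKB = 62.77;

      long pointsPerChunk = 5L * 86400;                       // scenario 1: 5 points/s for one day
      double pagesPerChunk = pointsPerChunk / pointsPerPage;  // ≈ 54 pages
      double chunkSizeKB = pagesPerChunk * pageSizeKB;        // ≈ 3400 KB ≈ 3.4 MB

      System.out.printf("pages ≈ %.1f, chunk ≈ %.0f KB%n", pagesPerChunk, chunkSizeKB);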


Keep both chunk and page (3 levels of index in the whole TsFile):

  • Chunk is the unit of I/O and page is the unit of query, which provides multiple levels of index
  • Suitable for all kinds of query scenarios, whether aggregation queries or raw data queries

Keep only page (2 levels of index in the whole TsFile):

  • Suitable for low-frequency scenarios, in which 1 chunk has only 1~2 pages
    (Note: since 0.12 the structure of chunk statistics and page statistics has already been adapted for this case)
  • Simpler structure, which removes one level of index


(III) Experiment about how to store PageHeader



