(I) Experiment on the necessity of TimeseriesMetadata

Now that TimeseriesMetadata is stored together with ChunkMetadata, whether TimeseriesMetadata is still necessary needs to be reconsidered. The following experiments are meant to inform that decision.

The experiments compare aggregation queries and raw data queries with and without TimeseriesMetadata, using one timeseries per TsFile and varying the number of chunks in each file.


Each chunk contains 100 points, and each query reads 500 TsFiles.

(1) with TimeseriesMetadata: the original TimeseriesMetadata

(2) without TimeseriesMetadata: TimeseriesMetadata carries no statistics
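To make the two variants concrete, here is a minimal model of how a count(*) could be answered for one series in one file. The classes `SeriesMeta` and `ChunkMeta` are hypothetical stand-ins, not the TsFile API, and the sketch assumes that when TimeseriesMetadata carries no statistics, the reader has to fall back to the ChunkMetadata list and merge per-chunk statistics (Java 16+ for records).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two index layouts; these are NOT the real TsFile classes.
public class CountSketch {

  // Per-chunk statistics; only the point count matters for count(*).
  record ChunkMeta(long startTime, long endTime, long count) {}

  // Series-level metadata; statisticsCount is null in the "without" variant.
  record SeriesMeta(Long statisticsCount, List<ChunkMeta> chunks) {}

  static long count(SeriesMeta meta) {
    // Variant (1): answer directly from series-level statistics.
    if (meta.statisticsCount() != null) {
      return meta.statisticsCount();
    }
    // Variant (2): no series statistics, so merge every chunk's count.
    long total = 0;
    for (ChunkMeta chunk : meta.chunks()) {
      total += chunk.count();
    }
    return total;
  }

  // Builds chunkNum chunks of 100 points each, matching the writing code.
  static List<ChunkMeta> chunks(int chunkNum) {
    List<ChunkMeta> list = new ArrayList<>();
    for (int k = 0; k < chunkNum; k++) {
      list.add(new ChunkMeta(k * 100L + 1, (k + 1) * 100L, 100));
    }
    return list;
  }

  public static void main(String[] args) {
    List<ChunkMeta> chunks = chunks(3);
    SeriesMeta with = new SeriesMeta(300L, chunks);
    SeriesMeta without = new SeriesMeta(null, chunks);
    System.out.println(count(with) + " " + count(without)); // prints "300 300"
  }
}
```

Both variants return the same answer; the difference is that variant (1) never needs to touch the chunk list, which is consistent with the much smaller index area time in the count(*) rows of the results.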


Writing:

        String path =
            "/home/fit/szs/data/data/sequence/root.sg/0/"
                + chunkNum
                + "/test"
                + fileIndex
                + ".tsfile";
        File f = FSFactoryProducer.getFSFactory().getFile(path);
        if (f.exists()) {
          f.delete();
        }

        try (TsFileWriter tsFileWriter = new TsFileWriter(f)) {
          // only one timeseries
          tsFileWriter.registerTimeseries(
              new Path(Constant.DEVICE_PREFIX, Constant.SENSOR_1),
              new UnaryMeasurementSchema(Constant.SENSOR_1, TSDataType.INT64, TSEncoding.RLE));

          // construct TSRecord
          for (int i = 1; i <= chunkNum * 100; i++) {
            TSRecord tsRecord = new TSRecord(i, Constant.DEVICE_PREFIX);
            DataPoint dPoint1 = new LongDataPoint(Constant.SENSOR_1, i);
            tsRecord.addTuple(dPoint1);
            // write TSRecord
            tsFileWriter.write(tsRecord);
            // seal the current chunk after every 100 points, so each chunk holds 100 points
            if (i % 100 == 0) {
              tsFileWriter.flushAllChunkGroups();
            }
          }
        }
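The writing loop produces `chunkNum * 100` points with timestamps 1, 2, 3, ... and flushes after every 100th point, so each file should end up with `chunkNum` chunks of 100 points. The sketch below only simulates that loop (no TsFile dependency) to show the resulting chunk boundaries; `ChunkLayout` and `layout` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates the write loop without the TsFile library:
// returns the [start, end] timestamp range of each flushed chunk.
public class ChunkLayout {

  static final int POINTS_PER_CHUNK = 100;

  static List<long[]> layout(int chunkNum) {
    List<long[]> ranges = new ArrayList<>();
    long chunkStart = 1;
    for (long i = 1; i <= (long) chunkNum * POINTS_PER_CHUNK; i++) {
      // same condition as the writer: flush after every 100th point
      if (i % POINTS_PER_CHUNK == 0) {
        ranges.add(new long[] {chunkStart, i});
        chunkStart = i + 1;
      }
    }
    return ranges;
  }

  public static void main(String[] args) {
    for (long[] r : layout(3)) {
      System.out.println("chunk [" + r[0] + ", " + r[1] + "]");
    }
  }
}
```

For `chunkNum = 3` this prints the ranges [1, 100], [101, 200] and [201, 300]. At 25 chunks per file, one query over 500 files therefore touches 500 * 25 = 12,500 chunks.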


Raw data query:

    for (int fileIndex = 0; fileIndex < fileNum; fileIndex++) {
      // file path
      String path =
          "/home/fit/szs/data/data/sequence/root.sg/0/"
              + chunkNum
              + "/test"
              + fileIndex
              + ".tsfile";

      // raw data query
      try (TsFileSequenceReader reader = new TsFileSequenceReader(path);
          ReadOnlyTsFile readTsFile = new ReadOnlyTsFile(reader)) {

        ArrayList<Path> paths = new ArrayList<>();
        paths.add(new Path(DEVICE1, "sensor_1"));

        QueryExpression queryExpression = QueryExpression.create(paths, null);

        long startTime = System.nanoTime();
        QueryDataSet queryDataSet = readTsFile.query(queryExpression);
        while (queryDataSet.hasNext()) {
          queryDataSet.next();
        }

        costTime += (System.nanoTime() - startTime);
      }
    }
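The writing and query snippets rebuild the same file path by string concatenation. A small helper could keep them consistent; `TsFilePaths.of` is a hypothetical name, and the base directory in `main` is simply the one used in the experiment. `java.nio.file.Paths` is used fully qualified-free but without importing `java.nio.file.Path`, so it does not clash with the TsFile `Path` class used in the queries.

```java
import java.nio.file.Paths;

// Hypothetical helper: builds <baseDir>/<chunkNum>/test<fileIndex>.tsfile
public final class TsFilePaths {

  private TsFilePaths() {}

  static String of(String baseDir, int chunkNum, int fileIndex) {
    // Paths.get inserts separators and drops the redundant trailing slash of baseDir
    return Paths.get(baseDir, String.valueOf(chunkNum), "test" + fileIndex + ".tsfile").toString();
  }

  public static void main(String[] args) {
    System.out.println(of("/home/fit/szs/data/data/sequence/root.sg/0/", 10, 7));
  }
}
```

On a Unix-like system this prints `/home/fit/szs/data/data/sequence/root.sg/0/10/test7.tsfile`, the same string the concatenation produces.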


Aggregation query:

    long totalStartTime = System.nanoTime();
    for (int fileIndex = 0; fileIndex < fileNum; fileIndex++) {
      // file path
      String path =
          "/home/fit/szs/data/data/sequence/root.sg/0/"
              + chunkNum
              + "/test"
              + fileIndex
              + ".tsfile";

      // aggregation query
      try (TsFileSequenceReader reader = new TsFileSequenceReader(path)) {
        Path seriesPath = new Path(DEVICE1, "sensor_1");
        long startTime = System.nanoTime();
        TimeseriesMetadata timeseriesMetadata = reader.readTimeseriesMetadata(seriesPath, false);
        long count = timeseriesMetadata.getStatistics().getCount();
        costTime += (System.nanoTime() - startTime);
      }
    }
    System.out.println(
        "Total aggregation cost time: " + (System.nanoTime() - totalStartTime) / 1_000_000 + "ms");
    System.out.println("Index area cost time: " + costTime / 1_000_000 + "ms");
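Both query loops accumulate `System.nanoTime()` deltas over all 500 files and convert the sum to milliseconds at the end. The sketch below factors that pattern into a helper; `PhaseTimer` is a hypothetical name, and `TimeUnit.NANOSECONDS.toMillis` replaces the manual division (it truncates in the same way).

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

// Hypothetical helper: accumulates elapsed nanoseconds of a measured phase
// across many iterations, mirroring `costTime += System.nanoTime() - startTime`.
public class PhaseTimer {

  private long totalNanos;

  // Runs the phase, adds its elapsed time to the total, and returns its result.
  long time(LongSupplier phase) {
    long start = System.nanoTime();
    long result = phase.getAsLong();
    totalNanos += System.nanoTime() - start;
    return result;
  }

  long totalMillis() {
    // same truncation as costTime / 1_000_000
    return TimeUnit.NANOSECONDS.toMillis(totalNanos);
  }

  public static void main(String[] args) {
    PhaseTimer indexArea = new PhaseTimer();
    long sum = 0;
    for (int fileIndex = 0; fileIndex < 500; fileIndex++) {
      // stand-in for reading TimeseriesMetadata and taking its count
      sum += indexArea.time(() -> 100L);
    }
    System.out.println("sum=" + sum + ", index area cost time: " + indexArea.totalMillis() + "ms");
  }
}
```

Because `LongSupplier` cannot throw checked exceptions, an `IOException` from the reader would have to be wrapped (for example in `UncheckedIOException`) inside the lambda.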



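Cost time in ms; the numeric columns are the chunk number per TsFile.

| query    | TimeseriesMetadata | time (ms)         | 1   | 2   | 3   | 5   | 8   | 10  | 15  | 20  | 25  |
|----------|--------------------|-------------------|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| raw      | with               | overall cost time | 210 | 230 | 237 | 250 | 276 | 297 | 309 | 344 | 374 |
| raw      | with               | index area time   | 116 | 131 | 142 | 156 | 185 | 197 | 220 | 255 | 282 |
| raw      | without            | overall cost time |     | 219 | 223 | 242 | 267 | 287 | 302 | 334 | 357 |
| raw      | without            | index area time   |     | 131 | 136 | 155 | 182 | 200 | 219 | 251 | 274 |
| count(*) | with               | overall cost time | 89  | 90  | 91  | 93  | 93  | 93  | 94  | 97  | 97  |
| count(*) | with               | index area time   | 15  | 16  | 16  | 16  | 16  | 16  | 16  | 17  | 17  |
| count(*) | without            | overall cost time |     | 122 | 123 | 127 | 127 | 127 | 127 | 128 | 130 |
| count(*) | without            | index area time   |     | 50  | 50  | 50  | 50  | 51  | 52  | 52  | 53  |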

Conclusion:

  1. Although removing TimeseriesMetadata from the index area speeds up raw data queries slightly (about 5%),
    it slows down aggregation queries considerably (about 30%). => We should keep TimeseriesMetadata.
  2. The time spent in the data area of the TsFile does not change.
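The percentages in the conclusion can be checked against the overall cost times in the results. The sketch below averages the relative difference over chunk numbers 2 to 25, the columns where both variants have a value; `ConclusionCheck` is a hypothetical name, and the exact figure depends on which variant is taken as the baseline and which columns are averaged.

```java
// Recomputes the relative differences behind the conclusion from the
// overall cost times (ms), chunk numbers 2, 3, 5, 8, 10, 15, 20, 25.
public class ConclusionCheck {

  static final int[] RAW_WITH = {230, 237, 250, 276, 297, 309, 344, 374};
  static final int[] RAW_WITHOUT = {219, 223, 242, 267, 287, 302, 334, 357};
  static final int[] COUNT_WITH = {90, 91, 93, 93, 93, 94, 97, 97};
  static final int[] COUNT_WITHOUT = {122, 123, 127, 127, 127, 127, 128, 130};

  // Mean of (b[i] - a[i]) / a[i]: how much slower b is than a, on average.
  static double meanRelativeIncrease(int[] a, int[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      sum += (double) (b[i] - a[i]) / a[i];
    }
    return sum / a.length;
  }

  public static void main(String[] args) {
    // raw query: how much slower the "with" layout is than "without"
    double raw = meanRelativeIncrease(RAW_WITHOUT, RAW_WITH);
    // count(*): how much slower the "without" layout is than "with"
    double count = meanRelativeIncrease(COUNT_WITH, COUNT_WITHOUT);
    System.out.printf("raw: with TimeseriesMetadata is %.1f%% slower%n", raw * 100);
    System.out.printf("count(*): without TimeseriesMetadata is %.1f%% slower%n", count * 100);
  }
}
```

With these numbers the raw-query gap comes out at a few percent, while the count(*) gap is roughly an order of magnitude larger, which is the asymmetry the conclusion relies on.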

(II) Experiment on combining Chunk and Page

Do we need both Chunk and Page, or is keeping only one of them sufficient?