...
(2) without TimeseriesMetadata: TimeseriesMetadata has no statistics
We then test querying a single timeseries in TsFiles that contain 1 timeseries and 1000 timeseries, respectively.
Writing:
```java
String path =
    "/home/fit/szs/data/data/sequence/root.sg/0/" + chunkNum + "/test" + fileIndex + ".tsfile";
File f = FSFactoryProducer.getFSFactory().getFile(path);
if (f.exists()) {
  f.delete();
}

try (TsFileWriter tsFileWriter = new TsFileWriter(f)) {
  // only one timeseries
  tsFileWriter.registerTimeseries(
      new Path(Constant.DEVICE_PREFIX, Constant.SENSOR_1),
      new UnaryMeasurementSchema(Constant.SENSOR_1, TSDataType.INT64, TSEncoding.RLE));
  // construct TSRecord
  for (int i = 1; i <= chunkNum * 100; i++) {
    TSRecord tsRecord = new TSRecord(i, Constant.DEVICE_PREFIX);
    DataPoint dPoint1 = new LongDataPoint(Constant.SENSOR_1, i);
    tsRecord.addTuple(dPoint1);
    // write TSRecord
    tsFileWriter.write(tsRecord);
    if (i % 100 == 0) {
      tsFileWriter.flushAllChunkGroups();
    }
  }
}
```
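The relation between `chunkNum` and the writing loop can be sketched without any TsFile dependency. This is a minimal illustration, not part of the original benchmark: the class `ChunkCountSketch` and the method `countFlushes` are hypothetical names, and the sketch assumes that each `flushAllChunkGroups()` call closes the current chunk of the single timeseries, so that the number of flushes equals the chunk number used in the result tables.

```java
public class ChunkCountSketch {
    // Counts how often the writing loop would flush for a given chunkNum.
    // The counter stands in for tsFileWriter.flushAllChunkGroups().
    public static int countFlushes(int chunkNum) {
        int flushes = 0;
        for (int i = 1; i <= chunkNum * 100; i++) {
            if (i % 100 == 0) {
                flushes++;
            }
        }
        return flushes;
    }

    public static void main(String[] args) {
        // Chunk numbers used in the benchmark tables.
        int[] chunkNums = {1, 2, 3, 5, 8, 10, 15, 20, 25};
        for (int chunkNum : chunkNums) {
            System.out.println(
                "chunkNum=" + chunkNum
                    + " points=" + (chunkNum * 100)
                    + " flushes=" + countFlushes(chunkNum));
        }
    }
}
```

Because the loop ends on a multiple of 100, no partially filled buffer remains when the writer is closed, so under the stated assumption every chunk holds exactly 100 points.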
...
```java
long totalStartTime = System.nanoTime();
for (int fileIndex = 0; fileIndex < fileNum; fileIndex++) {
  // file path
  String path =
      "/home/fit/szs/data/data/sequence/root.sg/0/" + chunkNum + "/test" + fileIndex + ".tsfile";
  // aggregation query
  try (TsFileSequenceReader reader = new TsFileSequenceReader(path)) {
    Path seriesPath = new Path(DEVICE1, "sensor_1");
    long startTime = System.nanoTime();
    TimeseriesMetadata timeseriesMetadata = reader.readTimeseriesMetadata(seriesPath, false);
    long count = timeseriesMetadata.getStatistics().getCount();
    costTime += (System.nanoTime() - startTime);
  }
}
System.out.println(
    "Total raw read cost time: " + (System.nanoTime() - totalStartTime) / 1000_000 + "ms");
System.out.println("Index area cost time: " + costTime / 1000_000 + "ms");
```
1 timeseries in one tsfile:
| query | index structure | chunk number | 1 | 2 | 3 | 5 | 8 | 10 | 15 | 20 | 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| raw | with TimeseriesMetadata | overall cost time (ms) | 210 | 230 | 237 | 250 | 276 | 297 | 309 | 344 | 374 |
| raw | with TimeseriesMetadata | index area time (ms) | 116 | 131 | 142 | 156 | 185 | 197 | 220 | 255 | 282 |
| raw | without TimeseriesMetadata | overall cost time (ms) | | 219 | 223 | 242 | 267 | 287 | 302 | 334 | 357 |
| raw | without TimeseriesMetadata | index area time (ms) | | 131 | 136 | 155 | 182 | 200 | 219 | 251 | 274 |
| count(*) | with TimeseriesMetadata | overall cost time (ms) | 89 | 90 | 91 | 93 | 93 | 93 | 94 | 97 | 97 |
| count(*) | with TimeseriesMetadata | index area time (ms) | 15 | 16 | 16 | 16 | 16 | 16 | 16 | 17 | 17 |
| count(*) | without TimeseriesMetadata | overall cost time (ms) | | 122 | 123 | 127 | 127 | 127 | 127 | 128 | 130 |
| count(*) | without TimeseriesMetadata | index area time (ms) | | 50 | 50 | 50 | 50 | 51 | 52 | 52 | 53 |
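The overall and index area rows can be combined to estimate how much time is spent outside the index area. The following is a minimal sketch, not part of the original benchmark: the class `NonIndexCost` and the method `outsideIndexArea` are hypothetical names, and it assumes that "overall cost time" minus "index area time" approximates everything else (opening the file and reading the data area). The inputs are the raw query rows for 1 timeseries per TsFile; each subtraction pairs two rows of the same configuration.

```java
import java.util.Arrays;

public class NonIndexCost {
    // Element-wise difference overall[i] - indexArea[i], in milliseconds.
    public static long[] outsideIndexArea(long[] overall, long[] indexArea) {
        long[] rest = new long[overall.length];
        for (int i = 0; i < overall.length; i++) {
            rest[i] = overall[i] - indexArea[i];
        }
        return rest;
    }

    public static void main(String[] args) {
        // Raw query, 1 timeseries per TsFile, with TimeseriesMetadata.
        long[] withOverall = {210, 230, 237, 250, 276, 297, 309, 344, 374};
        long[] withIndex = {116, 131, 142, 156, 185, 197, 220, 255, 282};
        // Raw query, 1 timeseries per TsFile, without TimeseriesMetadata.
        long[] withoutOverall = {219, 223, 242, 267, 287, 302, 334, 357};
        long[] withoutIndex = {131, 136, 155, 182, 200, 219, 251, 274};

        // prints [94, 99, 95, 94, 91, 100, 89, 89, 92]
        System.out.println(Arrays.toString(outsideIndexArea(withOverall, withIndex)));
        // prints [88, 87, 87, 85, 87, 83, 83, 83]
        System.out.println(Arrays.toString(outsideIndexArea(withoutOverall, withoutIndex)));
    }
}
```

Under that assumption, the remainder stays between 83 ms and 100 ms for every chunk number, while the index area time with TimeseriesMetadata grows from 116 ms to 282 ms, so the growth in overall cost comes from the index area.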
1000 timeseries in one tsfile: (query for 1 timeseries as well)
| query | index structure | chunk number | 1 | 2 | 3 | 5 | 8 | 10 | 15 | 20 | 25 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| raw | with TimeseriesMetadata | overall cost time (ms) | 421 | 478 | 550 | 673 | 910 | 998 | 1394 | 1637 | 1966 |
| raw | with TimeseriesMetadata | index area time (ms) | 274 | 332 | 403 | 528 | 763 | 853 | 1249 | 1496 | 1795 |
| raw | without TimeseriesMetadata | overall cost time (ms) | | 489 | 537 | 672 | 903 | 1010 | 1371 | 1650 | 1938 |
| raw | without TimeseriesMetadata | index area time (ms) | | 340 | 393 | 528 | 758 | 864 | 1232 | 1511 | 1789 |
| count(*) | with TimeseriesMetadata | overall cost time (ms) | 260 | 271 | 290 | 331 | 399 | 397 | 562 | 609 | 647 |
| count(*) | with TimeseriesMetadata | index area time (ms) | 133 | 142 | 158 | 197 | 265 | 267 | 427 | 472 | 513 |
| count(*) | without TimeseriesMetadata | overall cost time (ms) | | 307 | 326 | 359 | 428 | 447 | 583 | 620 | 713 |
| count(*) | without TimeseriesMetadata | index area time (ms) | | 177 | 195 | 227 | 296 | 315 | 447 | 486 | 553 |
Conclusion:
- Although the index area structure without TimeseriesMetadata is slightly faster (about 5%) for raw data queries, it is much slower (about 30%) for aggregation queries. => We should keep TimeseriesMetadata.
- The time cost in the data area of the TsFile does not change.
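The percentages in the conclusion can be checked against the measured overall cost times. The following is a minimal sketch, not part of the original benchmark: the class `RelativeCost` and its methods are hypothetical names, and it assumes that the eight "without TimeseriesMetadata" values correspond to chunk numbers 2 to 25, which is the alignment under which the raw query numbers of both structures nearly coincide. The inputs are the rows for 1 timeseries per TsFile.

```java
public class RelativeCost {
    // Relative change of candidate versus baseline, in percent.
    // Positive means the candidate is slower.
    public static double percentChange(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    // Mean relative change over paired measurements of equal length.
    public static double meanPercentChange(long[] baseline, long[] candidate) {
        double sum = 0.0;
        for (int i = 0; i < baseline.length; i++) {
            sum += percentChange(baseline[i], candidate[i]);
        }
        return sum / baseline.length;
    }

    public static void main(String[] args) {
        // Overall cost time (ms), chunk numbers 2..25 (assumed alignment).
        long[] rawWith = {230, 237, 250, 276, 297, 309, 344, 374};
        long[] rawWithout = {219, 223, 242, 267, 287, 302, 334, 357};
        long[] countWith = {90, 91, 93, 93, 93, 94, 97, 97};
        long[] countWithout = {122, 123, 127, 127, 127, 127, 128, 130};

        System.out.printf(
            "raw query, without vs. with: %.1f%%%n", meanPercentChange(rawWith, rawWithout));
        System.out.printf(
            "count(*), without vs. with: %.1f%%%n", meanPercentChange(countWith, countWithout));
    }
}
```

With these inputs the raw query comes out a few percent faster without TimeseriesMetadata and count(*) roughly 35% slower, which is in line with the "about 5%" and "about 30%" figures above.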
...