...

Code Block
// fileNum, chunkNum and DEVICE1 are defined by the enclosing test class.
// costTime is declared here so that the fragment is self-contained.
long costTime = 0; // accumulated time spent in the index area, in nanoseconds
long totalStartTime = System.nanoTime();
for (int fileIndex = 0; fileIndex < fileNum; fileIndex++) {
  // file path
  String path =
      "/home/fit/szs/data/data/sequence/root.sg/0/"
          + chunkNum
          + "/test"
          + fileIndex
          + ".tsfile";

  // aggregation query: the count is read from the TimeseriesMetadata statistics,
  // so no chunk data has to be loaded
  try (TsFileSequenceReader reader = new TsFileSequenceReader(path)) {
    Path seriesPath = new Path(DEVICE1, "sensor_1");
    long startTime = System.nanoTime();
    TimeseriesMetadata timeseriesMetadata = reader.readTimeseriesMetadata(seriesPath, false);
    long count = timeseriesMetadata.getStatistics().getCount();
    costTime += (System.nanoTime() - startTime);
  }
}
// convert nanoseconds to milliseconds before printing
System.out.println(
    "Total raw read cost time: " + (System.nanoTime() - totalStartTime) / 1_000_000 + "ms");
System.out.println("Index area cost time: " + costTime / 1_000_000 + "ms");



                                                      chunk number
query     TimeseriesMetadata  metric                      1    2    3    5    8   10   15   20   25
raw       with                overall cost time (ms)    210  230  237  250  276  297  309  344  374
raw       with                index area time (ms)      116  131  142  156  185  197  220  255  282
raw       without             overall cost time (ms)      -  219  223  242  267  287  302  334  357
raw       without             index area time (ms)        -  131  136  155  182  200  219  251  274
count(*)  with                overall cost time (ms)     89   90   91   93   93   93   94   97   97
count(*)  with                index area time (ms)       15   16   16   16   16   16   16   17   17
count(*)  without             overall cost time (ms)      -  122  123  127  127  127  127  128  130
count(*)  without             index area time (ms)        -   50   50   50   50   51   52   52   53


Conclusion:

  1. Although the index area structure without TimeseriesMetadata makes raw data queries slightly faster (about 5%),
    it makes aggregation queries much slower (about 30%). => We should keep TimeseriesMetadata.
  2. The time spent in the data area of the TsFile does not change.
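
The percentages in the conclusion can be checked against the overall cost times in the table above. The following is a minimal sketch, not part of the original experiment: the class and method names are hypothetical, and it takes the "with TimeseriesMetadata" time as the baseline, which is an assumption (using the "without" time as the baseline gives somewhat different percentages). Chunk number 1 is skipped because the "without" rows have no value there.

```java
public class MetadataCostComparison {
    // Overall cost times (ms) copied from the table above,
    // for chunk numbers 2, 3, 5, 8, 10, 15, 20, 25.
    public static final long[] RAW_WITH = {230, 237, 250, 276, 297, 309, 344, 374};
    public static final long[] RAW_WITHOUT = {219, 223, 242, 267, 287, 302, 334, 357};
    public static final long[] COUNT_WITH = {90, 91, 93, 93, 93, 94, 97, 97};
    public static final long[] COUNT_WITHOUT = {122, 123, 127, 127, 127, 127, 128, 130};

    /**
     * Mean of (without - with) / with over all chunk numbers, in percent.
     * A negative result means the structure without TimeseriesMetadata is faster,
     * a positive result means it is slower.
     */
    public static double meanRelativeChangePercent(long[] with, long[] without) {
        double sum = 0;
        for (int i = 0; i < with.length; i++) {
            sum += 100.0 * (without[i] - with[i]) / with[i];
        }
        return sum / with.length;
    }

    public static void main(String[] args) {
        System.out.printf(
            "raw query: %.1f%%%n", meanRelativeChangePercent(RAW_WITH, RAW_WITHOUT));
        System.out.printf(
            "count(*):  %.1f%%%n", meanRelativeChangePercent(COUNT_WITH, COUNT_WITHOUT));
    }
}
```

With this baseline, the raw query result comes out as a small negative percentage and the count(*) result as a large positive one, which matches the direction of conclusion 1.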

...

Do we need both Chunk and Page, or is keeping only one of them enough?


How many points can a chunk have when chunk size = 64K, 1M, 2M, 3M, and 4M?

chunk size                ~64K             ~1M              ~2M              ~3M              ~4M
points number             7,977            125,000          260,000          390,000          520,000
page number               1                16               32               49               66
page size (uncompressed)  65398 = 63.86K   65398 = 63.86K   65398 = 63.86K   65398 = 63.86K   65398 = 63.86K
page size (compressed)    64275 = 62.77K   64275 = 62.77K   64275 = 62.77K   64275 = 62.77K   64275 = 62.77K


(III) Experiment about how to store PageHeader

Jira (ASF JIRA): IOTDB-1833