Basic test parameters
Related JIRA: https://issues.apache.org/jira/browse/IOTDB-1628
...
2021-11-03 00:14:51,510 [pool-15-IoTDB-FlushTask-Submit-Pool-1] INFO o.a.i.d.e.s.TsFileProcessor:615 - Memtable PrimitiveMemTable{planIndex=[949947,950649]} has been added to flushing list
2021-11-03 00:14:51,506 [DataClientThread-131] ERROR o.a.i.c.l.m.RaftLogManager:615 - Trigger compact:numToReserveForNew:1000,sizeToReserveForNew:736,sizeToReserveForConfig:1000
2021-11-03 00:14:51,512 [pool-15-IoTDB-FlushTask-Submit-Pool-1] INFO o.a.i.d.e.s.StorageGroupProcessor:1257 - close a sequence tsfile processor root.test.g_7-0
2021-11-03 00:14:51,512 [DataClientThread-131] INFO o.a.i.c.l.m.RaftLogManager:873 - Node(internalIp:172.20.70.8, metaPort:9005, nodeIdentifier:2041880328, dataPort:40005, clientPort:6667, clientIp:172.20.70.8): Before compaction index 949958-950698, compactIndex 949959, removeSize 2, committedLogSize 738, maxAppliedLog 950693
2021-11-03 00:14:51,512 [DataClientThread-131] INFO o.a.i.c.l.m.RaftLogManager:887 - Node(internalIp:172.20.70.8, metaPort:9005, nodeIdentifier:2041880328, dataPort:40005, clientPort:6667, clientIp:172.20.70.8): After compaction index 949960-950698, committedLogSize 736
2021-11-03 00:14:51,512 [pool-3-IoTDB-Flush-3] INFO o.a.i.d.e.s.TsFileProcessor:995 - Ended file /data/cluster/first-rest-test/apache-iotdb/sbin/../data/data/sequence/root.test.g_15/0/0/1635869637646-459-0-0.tsfile
2021-11-03 00:14:51,519 [pool-3-IoTDB-Flush-3] INFO o.a.i.d.e.s.StorageGroupProcessor:1997 - signal closing storage group condition in root.test.g_15-0
2021-11-03 00:14:51,519 [pool-3-IoTDB-Flush-3] INFO o.a.i.d.e.s.TsFileProcessor:1009 - Storage group root.test.g_15/0 close the file /data/cluster/first-rest-test/apache-iotdb/sbin/../data/data/sequence/root.test.g_15/0/0/1635869637646-459-0-0.tsfile, TsFile size is 52079411, time consumption of flushing metadata is 708ms
2021-11-03 00:14:51,519 [pool-3-IoTDB-Flush-3] INFO o.a.i.d.e.f.MemTableFlushTask:88 - The memTable size of SG root.test.g_7/0 is 170240000, the avg series points num in chunk is 88, total timeseries number is 120000
2021-11-03 00:14:51,510 [ClusterClient-56] ERROR o.a.i.c.l.m.RaftLogManager:615 - Trigger compact:numToReserveForNew:1000,sizeToReserveForNew:735,sizeToReserveForConfig:1000
2021-11-03 00:14:51,530 [ClusterClient-56] INFO o.a.i.c.l.m.RaftLogManager:873 - Node(internalIp:172.20.70.7, metaPort:9003, nodeIdentifier:2041893354, dataPort:40003, clientPort:6667, clientIp:172.20.70.7): Before compaction index 505128-505869, compactIndex 505130, removeSize 3, committedLogSize 738, maxAppliedLog 505854
2021-11-03 00:14:51,530 [ClusterClient-56] INFO o.a.i.c.l.m.RaftLogManager:887 - Node(internalIp:172.20.70.7, metaPort:9003, nodeIdentifier:2041893354, dataPort:40003, clientPort:6667, clientIp:172.20.70.7): After compaction index 505131-505869, committedLogSize 735
2021-11-03 00:14:51,531 [DataClientThread-243] ERROR o.a.i.c.l.m.RaftLogManager:615 - Trigger compact:numToReserveForNew:1998,sizeToReserveForNew:736,sizeToReserveForConfig:1000
2021-11-03 00:14:51,531 [DataClientThread-243] INFO o.a.i.c.l.m.RaftLogManager:873 - Node(internalIp:172.20.70.9, metaPort:9007, nodeIdentifier:2041883899, dataPort:40007, clientPort:6667, clientIp:172.20.70.9): Before compaction index 1239623-1241878, compactIndex 1239622, removeSize 1476, committedLogSize 2212, maxAppliedLog 1239623
2021-11-03 00:14:51,532 [DataClientThread-243] INFO o.a.i.c.l.m.RaftLogManager:887 - Node(internalIp:172.20.70.9, metaPort:9007, nodeIdentifier:2041883899, dataPort:40007, clientPort:6667, clientIp:172.20.70.9): After compaction index 1239623-1241878, committedLogSize 2212
2021-11-03 00:14:51,538 [ClusterClient-58] ERROR o.a.i.c.l.m.RaftLogManager:615 - Trigger compact:numToReserveForNew:1000,sizeToReserveForNew:737,sizeToReserveForConfig:1000
2021-11-03 00:14:51,538 [ClusterClient-58] INFO o.a.i.c.l.m.RaftLogManager:873 - Node(internalIp:172.20.70.7, metaPort:9003, nodeIdentifier:2041893354, dataPort:40003, clientPort:6667, clientIp:172.20.70.7): Before compaction index 505131-505871, compactIndex 505131, removeSize 1, committedLogSize 738, maxAppliedLog 505854
2021-11-03 00:14:51,538 [ClusterClient-58] INFO o.a.i.c.l.m.RaftLogManager:887 - Node(internalIp:172.20.70.7, metaPort:9003, nodeIdentifier:2041893354, dataPort:40003, clientPort:6667, clientIp:172.20.70.7): After compaction index 505132-505871, committedLogSize 737
2021-11-03 00:14:51,541 [Data(172.20.70.7:9003)-root.test.g_16] INFO o.a.i.d.w.n.ExclusiveWriteLogNode:127 - WAL BufferOverflow !
2021-11-03 00:14:51,547 [Data(172.20.70.8:9005)-root.test.g_17] INFO o.a.i.d.w.n.ExclusiveWriteLogNode:127 - WAL BufferOverflow !
2021-11-03 00:14:51,559 [ClusterClient-59] ERROR o.a.i.c.l.m.RaftLogManager:615 - Trigger compact:numToReserveForNew:1000,sizeToReserveForNew:737,sizeToReserveForConfig:1000
2021-11-03 00:14:51,559 [ClusterClient-59] INFO o.a.i.c.l.m.RaftLogManager:873 - Node(internalIp:172.20.70.7, metaPort:9003, nodeIdentifier:2041893354, dataPort:40003, clientPort:6667, clientIp:172.20.70.7): Before compaction index 505132-505871, compactIndex 505132, removeSize 1, committedLogSize 738, maxAppliedLog 505855
2021-11-03 00:14:51,559 [ClusterClient-59] INFO o.a.i.c.l.m.RaftLogManager:887 - Node(internalIp:172.20.70.7, metaPort:9003, nodeIdentifier:2041893354, dataPort:40003, clientPort:6667, clientIp:172.20.70.7): After compaction index 505133-505871, committedLogSize 737
2021-11-03 00:14:51,562 [ClusterClient-38] ERROR o.a.i.c.l.m.RaftLogManager:615 - Trigger compact:numToReserveForNew:1000,sizeToReserveForNew:737,sizeToReserveForConfig:1000
2021-11-03 00:14:51,562 [ClusterClient-38] INFO o.a.i.c.l.m.RaftLogManager:873 - Node(internalIp:172.20.70.7, metaPort:9003, nodeIdentifier:2041893354, dataPort:40003, clientPort:6667, clientIp:172.20.70.7): Before compaction index 505133-505871, compactIndex 505133, removeSize 1, committedLogSize 738, maxAppliedLog 505855
2021-11-03 00:14:51,562 [ClusterClient-38] INFO o.a.i.c.l.m.RaftLogManager:887 - Node(internalIp:172.20.70.7, metaPort:9003, nodeIdentifier:2041893354, dataPort:40003, clientPort:6667, clientIp:172.20.70.7): After compaction index 505134-505871, committedLogSize 737
2021-11-03 00:14:51,588 [ClusterClient-47] ERROR o.a.i.c.l.m.RaftLogManager:615 - Trigger compact:numToReserveForNew:1000,sizeToReserveForNew:737,sizeToReserveForConfig:1000
2021-11-03 00:14:51,588 [ClusterClient-47] INFO o.a.i.c.l.m.RaftLogManager:873 - Node(internalIp:172.20.70.7, metaPort:9003, nodeIdentifier:2041893354, dataPort:40003, clientPort:6667, clientIp:172.20.70.7): Before compaction index 505134-505872, compactIndex 505134, removeSize 1, committedLogSize 738, maxAppliedLog 505856
2021-11-03 00:14:51,588 [ClusterClient-47] INFO o.a.i.c.l.m.RaftLogManager:887 - Node(internalIp:172.20.70.7, metaPort:9003, nodeIdentifier:2041893354, dataPort:40003, clientPort:6667, clientIp:172.20.70.7): After compaction index 505135-505872, committedLogSize 737
2021-11-03 00:14:51,596 [DataClientThread-131] ERROR o.a.i.c.l.m.RaftLogManager:615 - Trigger compact:numToReserveForNew:1000,sizeToReserveForNew:737,sizeToReserveForConfig:1000
2021-11-03 00:14:51,596 [DataClientThread-131] INFO o.a.i.c.l.m.RaftLogManager:873 - Node(internalIp:172.20.70.8, metaPort:9005, nodeIdentifier:2041880328, dataPort:40005, clientPort:6667, clientIp:172.20.70.8): Before compaction index 949960-950699, compactIndex 949960, removeSize 1, committedLogSize 738, maxAppliedLog 950697
2021-11-03 00:14:51,596 [DataClientThread-131] INFO o.a.i.c.l.m.RaftLogManager:887 - Node(internalIp:172.20.70.8, metaPort:9005, nodeIdentifier:2041880328, dataPort:40005, clientPort:6667, clientIp:172.20.70.8): After compaction index 949961-950699, committedLogSize 737
2021-11-03 00:14:51,682 [ClusterClient-25] WARN o.a.i.c.s.m.RaftMember:950 - Data(172.20.70.9:9007): Failed to synchronize with the leader after 20223ms
2021-11-03 00:14:51,690 [ClusterClient-25] INFO o.a.i.d.s.TSServiceImpl:812 - Cost: 43063 ms, sql is SELECT count(s_2292) FROM root.test.g_8.d_28 WHERE time >= 1633017602400 AND time <= 1633017852400 AND root.test.g_8.d_28.s_2292 > -5
2021-11-03 00:14:51,700 [ClusterClient-25] WARN o.a.i.d.s.TSServiceImpl:1926 - Error occurred in query process:
org.apache.iotdb.db.exception.StorageEngineException: org.apache.iotdb.cluster.exception.CheckConsistencyException: check consistency failed, error message=mid consistency, localAppliedId is smaller than the leaderCommitId
at org.apache.iotdb.cluster.query.reader.ClusterReaderFactory.getReaderByTimestamp(ClusterReaderFactory.java:1026)
at org.apache.iotdb.cluster.query.reader.ClusterReaderFactory.getSeriesReaderByTime(ClusterReaderFactory.java:161)
at org.apache.iotdb.cluster.query.reader.ClusterReaderFactory.getReaderByTimestamp(ClusterReaderFactory.java:130)
at org.apache.iotdb.cluster.query.aggregate.ClusterAggregateExecutor.getReaderByTime(ClusterAggregateExecutor.java:95)
at org.apache.iotdb.db.query.executor.AggregationExecutor.executeWithValueFilter(AggregationExecutor.java:357)
at org.apache.iotdb.db.query.executor.QueryRouter.aggregate(QueryRouter.java:137)
at org.apache.iotdb.db.qp.executor.PlanExecutor.processDataQuery(PlanExecutor.java:538)
at org.apache.iotdb.cluster.query.ClusterPlanExecutor.processQuery(ClusterPlanExecutor.java:101)
at org.apache.iotdb.db.service.TSServiceImpl.createQueryDataSet(TSServiceImpl.java:1098)
at org.apache.iotdb.db.service.TSServiceImpl.internalExecuteQueryStatement(TSServiceImpl.java:729)
at org.apache.iotdb.db.service.TSServiceImpl.executeQueryStatement(TSServiceImpl.java:627)
at org.apache.iotdb.service.rpc.thrift.TSIService$Processor$executeQueryStatement.getResult(TSIService.java:2423)
at org.apache.iotdb.service.rpc.thrift.TSIService$Processor$executeQueryStatement.getResult(TSIService.java:2403)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:38)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:38)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:248)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.iotdb.cluster.exception.CheckConsistencyException: check consistency failed, error message=mid consistency, localAppliedId is smaller than the leaderCommitId
at org.apache.iotdb.cluster.exception.CheckConsistencyException.<clinit>(CheckConsistencyException.java:34)
at org.apache.iotdb.cluster.server.member.RaftMember$MidCheckConsistency.postCheckConsistency(RaftMember.java:810)
at org.apache.iotdb.cluster.server.member.RaftMember.waitUntilCatchUp(RaftMember.java:909)
at org.apache.iotdb.cluster.server.member.RaftMember.syncLeader(RaftMember.java:858)
at org.apache.iotdb.cluster.server.member.RaftMember.syncLeaderWithConsistencyCheck(RaftMember.java:765)
at org.apache.iotdb.cluster.query.reader.ClusterReaderFactory.getReaderByTimestamp(ClusterReaderFactory.java:1024)
... 18 common frames omitted
2021-11-03 00:14:51,703 [ClusterClient-10] ERROR o.a.i.c.l.m.RaftLogManager:615 - Trigger compact:numToReserveForNew:1000,sizeToReserveForNew:737,sizeToReserveForConfig:1000
2021-11-03 00:14:51,703 [ClusterClient-10] INFO o.a.i.c.l.m.RaftLogManager:873 - Node(internalIp:172.20.70.7, metaPort:9003, nodeIdentifier:2041893354, dataPort:40003, clientPort:6667, clientIp:172.20.70.7): Before compaction index 505135-505873, compactIndex 505135, removeSize 1, committedLogSize 738, maxAppliedLog 505867
2021-11-03 00:14:51,703 [ClusterClient-10] INFO o.a.i.c.l.m.RaftLogManager:887 - Node(internalIp:172.20.70.7, metaPort:9003, nodeIdentifier:2041893354, dataPort:40003, clientPort:6667, clientIp:172.20.70.7): After compaction index 505136-505873, committedLogSize 737
2021-11-03 00:14:51,704 [DataClientThread-131] ERROR o.a.i.c.l.m.RaftLogManager:615 - Trigger compact:numToReserveForNew:1000,sizeToReserveForNew:736,sizeToReserveForConfig:1000
2021-11-03 00:14:51,704 [DataClientThread-131] INFO o.a.i.c.l.m.RaftLogManager:873 - Node(internalIp:172.20.70.8, metaPort:9005, nodeIdentifier:2041880328, dataPort:40005, clientPort:6667, clientIp:172.20.70.8): Before compaction index 949961-950701, compactIndex 949962, removeSize 2, committedLogSize 738, maxAppliedLog 950697
2021-11-03 00:14:51,704 [DataClientThread-131] INFO o.a.i.c.l.m.RaftLogManager:887 - Node(internalIp:172.20.70.8, metaPort:9005, nodeIdentifier:2041880328, dataPort:40005, clientPort:6667, clientIp:172.20.70.8): After compaction index 949963-950701, committedLogSize 736
2021-11-03 00:14:51,710 [Data(172.20.70.8:9005)-root.test.g_7] INFO o.a.i.d.e.s.TsFileProcessor:149 - create a new tsfile processor /data/cluster/first-rest-test/apache-iotdb/sbin/../data/data/sequence/root.test.g_7/0/0/1635869691710-578-0-0.tsfile
2021-11-03 00:14:51,713 [Data(172.20.70.9:9007)-root.test.g_8] WARN o.a.i.c.l.a.DataLogApplier:149 - [ApplyInsert] applyPhysicalPlan slowly for 21319ms
2021-11-03 00:14:51,713 [Data(172.20.70.9:9007)-root.test.g_8] WARN o.a.i.c.l.a.AsyncDataLogApplier:208 - [ApplyInternal] Data(172.20.70.9:9007):apply log InsertTabletPlan {deviceId:root.test.g_8.d_88, timesRange[1633037436200,1633037438000]},term:1,index:1239624 slowly for 21320ms
2021-11-03 00:14:51,713 [check-log-applier-Node(internalIp:172.20.70.9, metaPort:9007, nodeIdentifier:2041883899, dataPort:40007, clientPort:6667, clientIp:172.20.70.9)] WARN o.a.i.c.l.m.RaftLogManager:960 - Node(internalIp:172.20.70.9, metaPort:9007, nodeIdentifier:2041883899, dataPort:40007, clientPort:6667, clientIp:172.20.70.9):wait log InsertTabletPlan {deviceId:root.test.g_8.d_88, timesRange[1633037436200,1633037438000]},term:1,index:1239624 to apply slowly for 21319ms
2021-11-03 00:14:51,717 [Data(172.20.70.8:9005)-root.test.g_7] INFO o.a.i.d.w.n.ExclusiveWriteLogNode:90 - create the WAL folder /data/cluster/first-rest-test/apache-iotdb/sbin/../data/wal/root.test.g_7/0-1635869691710-578-0-0.tsfile.
2021-11-03 00:14:51,717 [Data(172.20.70.8:9005)-root.test.g_7] INFO o.a.i.d.c.IoTDBThreadPoolFactory:87 - new single thread pool: Flush-WAL-Thread-root.test.g_7/0-1635869691710-578-0-0.tsfile
2021-11-03 00:14:51,750 [DataClientThread-30] ERROR o.a.i.c.l.m.RaftLogManager:615 - Trigger compact:numToReserveForNew:1999,sizeToReserveForNew:737,sizeToReserveForConfig:1000
2021-11-03 00:14:51,750 [DataClientThread-30] INFO o.a.i.c.l.m.RaftLogManager:873 - Node(internalIp:172.20.70.9, metaPort:9007, nodeIdentifier:2041883899, dataPort:40007, clientPort:6667, clientIp:172.20.70.9): Before compaction index 1239623-1241879, compactIndex 1239625, removeSize 1477, committedLogSize 2214, maxAppliedLog 1239626
2021-11-03 00:14:51,750 [DataClientThread-30] INFO o.a.i.c.l.m.RaftLogManager:887 - Node(internalIp:172.20.70.9, metaPort:9007, nodeIdentifier:2041883899, dataPort:40007, clientPort:6667, clientIp:172.20.70.9): After compaction index 1239626-1241879, committedLogSize 2211
2021-11-03 00:14:51,840 [ClusterClient-67] ERROR o.a.i.c.l.m.RaftLogManager:615 - Trigger compact:numToReserveForNew:1000,sizeToReserveForNew:737,sizeToReserveForConfig:1000
2021-11-03 00:14:51,857 [ClusterClient-67] INFO o.a.i.c.l.m.RaftLogManager:873 - Node(internalIp:172.20.70.7, metaPort:9003, nodeIdentifier:2041893354, dataPort:40003, clientPort:6667, clientIp:172.20.70.7): Before compaction index 505136-505874, compactIndex 505136, removeSize 1, committedLogSize 738, maxAppliedLog 505873
2021-11-03 00:14:51,857 [ClusterClient-67] INFO o.a.i.c.l.m.RaftLogManager:887 - Node(internalIp:172.20.70.7, metaPort:9003, nodeIdentifier:2041893354, dataPort:40003, clientPort:6667, clientIp:172.20.70.7): After compaction index 505137-505874, committedLogSize 737
2021-11-03 00:14:51,860 [Data(172.20.70.9:9007)-root.test.g_8] INFO o.a.i.d.w.n.ExclusiveWriteLogNode:127 - WAL BufferOverflow !
2021-11-03 00:14:51,922 [DataClientThread-314] ERROR o.a.i.c.l.m.RaftLogManager:615 - Trigger compact:numToReserveForNew:1000,sizeToReserveForNew:737,sizeToReserveForConfig:1000
Logging code added in the third test
Fourth test (ruling out the last remaining possibility that had no log coverage)
- Branch under test: cluster_sync_leader_bug (in the stormbroken repository), commit ID 6c0425e
- Start time: 2021-11-04 10:49:42
- Node 7
  - First mid consistency error: 2021-11-04 11:22:16, continuing until 2021-11-04 11:52:17.
  - First OOM error: 2021-11-04 11:06:24, followed by OOMs in large numbers.
  - About 2 minutes after startup, intermittent "Failed to process no query InsertTabletPlan" errors began. The first mid consistency error appeared at 11:22:16 (usually preceded by a "Failed to synchronize with leader after 20000+ms"), the first "Current query is time out xxx" appeared at 11:29:02, after which mid consistency errors recurred intermittently; cluster nodes began losing contact at 16:37:26.
  - Searching the full log shows "Raft log buffer overflow" and "WAL BufferOverflow" near the "Failed to process no query plan" errors.
- Node 8
  - No mid consistency errors
  - No OOM errors
- Node 9
  - No mid consistency errors
  - First OOM error: 2021-11-04 16:45:51, followed by OOMs in large numbers
- Node 7
  - Test result: the newly added, finer-grained log lines never appeared; there were some cases of slow standalone writes, but the time cost does not look like it was spent in GC.
  - None of the newly added log statements appeared.
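The mid consistency failure reported above boils down to a simple read-time check: a replica serving a read compares the index it has applied locally against the leader's commit index, and rejects the read when it lags too far behind. A minimal sketch of that check, assuming illustrative names and an assumed lag threshold (this is not the actual IoTDB implementation):

```java
// Hypothetical sketch of a mid-consistency read check. Class and method
// names and the maxLag threshold are assumptions for illustration only.
public class MidConsistencySketch {

  /** Returns true when the local replica lags too far behind the leader. */
  static boolean lagExceeded(long leaderCommitId, long localAppliedId, long maxLag) {
    return leaderCommitId - localAppliedId > maxLag;
  }

  static void checkConsistency(long leaderCommitId, long localAppliedId, long maxLag) {
    if (lagExceeded(leaderCommitId, localAppliedId, maxLag)) {
      throw new IllegalStateException(
          "check consistency failed, error message=mid consistency, "
              + "localAppliedId is smaller than the leaderCommitId");
    }
  }

  public static void main(String[] args) {
    // Applied index close enough to the leader's commit index: read proceeds.
    checkConsistency(950_698, 950_693, 1_000);
    // Applied index far behind the commit index: the read is rejected.
    try {
      checkConsistency(1_241_878, 1_239_623, 1_000);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Under this reading, any follower whose apply pipeline stalls (e.g. the 21s-slow InsertTabletPlan applies in the logs above) will start failing this check as soon as the leader's commit index moves past the tolerated window.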
Selected log screenshots
Server monitoring for the corresponding time window
Detailed metrics can be inspected in Zabbix at http://172.20.70.11/zabbix.php?name=&ip=&dns=&port=&status=-1&evaltype=0&tags%5B0%5D%5Btag%5D=&tags%5B0%5D%5Boperator%5D=0&tags%5B0%5D%5Bvalue%5D=&maintenance_status=1&filter_name=&filter_show_counter=0&filter_custom_time=0&sort=name&sortorder=ASC&show_suppressed=0&action=host.view (user Admin, password iotdb2019)
...
After extensive trials and final confirmation, the preliminary conclusion is that the distributed crash is caused by an excessively long DataSnapshot.
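Slow snapshot steps were surfaced by adding timing logs around each operation (doSnapshot, SnapshotCatchUp, per-slot install). A minimal sketch of that kind of instrumentation, with an assumed helper name and an assumed 3s threshold (not IoTDB code):

```java
import java.util.function.Supplier;

// Wrap a snapshot step, measure wall-clock time, and log when it exceeds
// a threshold. The helper and the 3s cutoff are illustrative assumptions.
public class SlowOpTimer {
  static final long SLOW_THRESHOLD_MS = 3_000;

  static <T> T timed(String label, Supplier<T> step) {
    long start = System.currentTimeMillis();
    T result = step.get();
    long costMs = System.currentTimeMillis() - start;
    if (costMs > SLOW_THRESHOLD_MS) {
      // Mirrors the "... slowly for NNNms" style of the warnings above.
      System.out.printf("%s slowly for %dms%n", label, costMs);
    }
    return result;
  }

  public static void main(String[] args) {
    int installed = timed("install snapshot of slot 42", () -> 1);
    System.out.println("installed=" + installed);
  }
}
```

Wrapping each phase separately is what lets the sixth test below attribute the total cost (e.g. a 26.3s overall install) to individual slots and files.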
Sixth test (pinpointing test, with extensive logging added around snapshot)
- Branch under test: cluster_sync_leader_bug (in the stormbroken repository), commit ID ce9d0992
- After running for a while, mid consistency was reproduced; it is now fairly certain the problem lies in snapshotting.
- Basic configuration for this test: 3 nodes, 3 replicas
- Node 7
  - 2022-01-04 22:37:04: LogCatchUp timeouts began, 31 in total, peaking at 452.2s (2022-01-04 23:17:38); most took more than 10s
  - 2022-01-04 22:54:39: doSnapshot took 15.6s
  - 2022-01-04 22:55:47: sending the Snapshot took 19.0s, executing SnapShotCatchUp took 19.8s, Call Snapshot CatchUp took 19.8s
  - 2022-01-04 23:16:53: OOM: Java heap space
  - 2022-01-04 23:50:34: OOM: GC overhead limit exceeded
- Node 8
  - 2022-01-04 22:42:34: LogCatchUp timeout, taking 350.3s
  - 2022-01-04 22:54:41 to 23:02:16: repeated OOMs, alternating between Java heap space and GC overhead limit exceeded
  - 2022-01-04 23:02:29: LogSnapshotCatchUp timeout, taking 1068.4s
  - 2022-01-04 23:14:16-30: 3 slow installs of a PartitionedSnapshot into a single slot, taking 4.0s, 4.1s, and 7.4s; among them was a slow FileSnapshot install of a tsfile (4.9s), culminating at 2022-01-04 23:14:34 in a slow overall PartitionSnapshot install taking 26.3s
- Node 9
  - 2022-01-04 22:38:28 to 23:51:44: scattered LogCatchUp timeouts, mostly 3-10s
  - 2022-01-04 22:54:48 to 22:55:20: 6 slow installs of a PartitionSnapshot into a single slot, taking 4.3s, 3.4s, 3.7s, 9.4s, 10.0s, and 3.0s, culminating at 22:55:20 in a slow overall PartitionSnapshot install taking 39.4s
  - This was followed by further slow overall PartitionSnapshot installs at 22:55:30 (24.7s) and 22:55:47 (18.5s)
  - 2022-01-04 23:14:06: doSnapshot took 26.4s
- Common characteristics:
  - load New TsFile does not hold the lock for long
  - MetaSimpleSnapshot showed no timeouts
Testing confirmed that the leader node hits large numbers of LogSnapshotCatchUp timeouts while the follower nodes mainly see PartitionSnapshot installation timeouts; as a result, requests pile up in large numbers until the process eventually OOMs.
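The failure chain in this conclusion (slow snapshot install, requests piling up, then OOM) is characteristic of an unbounded work queue feeding a stalled consumer. A toy sketch of the bounded alternative, where submission fails fast instead of letting the backlog grow without limit (capacity and names are illustrative assumptions, not IoTDB code):

```java
import java.util.concurrent.ArrayBlockingQueue;

// When the consumer (log apply / snapshot install) is slow, an unbounded
// queue grows until OOM; a bounded queue rejects new work instead, so the
// caller can apply backpressure. Purely illustrative.
public class BoundedApplyQueue {
  private final ArrayBlockingQueue<Runnable> pending;

  public BoundedApplyQueue(int capacity) {
    pending = new ArrayBlockingQueue<>(capacity);
  }

  /** Non-blocking submit; returns false (backpressure) when the queue is full. */
  public boolean submit(Runnable task) {
    return pending.offer(task);
  }

  public int backlog() {
    return pending.size();
  }

  public static void main(String[] args) {
    BoundedApplyQueue q = new BoundedApplyQueue(2);
    System.out.println(q.submit(() -> {})); // true
    System.out.println(q.submit(() -> {})); // true
    System.out.println(q.submit(() -> {})); // false: full, caller must back off
    System.out.println("backlog=" + q.backlog()); // backlog=2
  }
}
```

The committedLogSize 2212 backlog on node 9 in the logs above is exactly the unbounded variant of this picture: apply stalls, the queue keeps growing, and memory is eventually exhausted.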