1.问题

目前每次查询内存中的数据时,需要将tvlist拷贝并排序,排序后的结果并不放回memtable,多个查询可能反复排序某一个tvlist,存在较大的性能损失,关键代码如下所示:


```

// 拿到tvlist 
IWritableMemChunk memChunk = memTableMap.get(deviceId).get(measurement);
// 拷贝
TVList chunkCopy = memChunk.getTVList().clone();
// 排序
chunkCopy.sort();

// 注意,排序后的tvlist没有放回memtable,意味着下次查询还需要重新排序
chunkCopy.setDeletionList(deletionList);
return new ReadOnlyMemChunk(measurement, dataType, encoding, chunkCopy, props, getVersion());
```


2.目标

(1)减少查询排序tvlist的开销

(2)减少查询拷贝已排序的tvlist的开销


3.方案

针对上述两个目标,制定如下解决方案:

(1)将某查询排序后的tvlist放回memtable中,这样下一次查询如果tvlist中新写入的数据是顺序的,则不需要排序,如果存在乱序,现有Timsort方式也可以在前半部分有序的情况下节省排序的CPU开销。

(2)使用写时复制的思想(在这里就是排序时复制,因为只有排序会改变tvlist中旧的数据)。对已排序的tvlist进行引用计数,当下次查询到来时,如果数据有序,则不需要拷贝tvlist,直接使用引用进行查询即可;当查询或flush需要排序时,再拷贝一份tvlist进行排序,原有进行中的查询引用的tvlist并不改变。


4.测试

(1)正确性测试:CI + Session 并发读写测试

(2)性能测试:benchmark读写并发测试


  • No labels

2 Comments

  1. Good thought. There are a couple of questions:

    1. What if concurrent querying? Each individual querying is not aware of the other is sorting.
    2. Is it possible to combine clone and sorting, i.e. sort without clone?
    3. Assuming the sorting is only on timestamp of the chunk, will it help for query that qualifies on value?
  2. Hi~ Thanks for your questions! Here are some explanations: 

    1. Clone and sort part are synchronized to avoid concurrent sorting which are redundant.
    2. If the reference count of a memtable is 0, we just sort without clone. Or we have to clone before modifying memtable.
    3. This improvement can't improve the performance of query that qualifies on value.