The main idea is to buffer data before writing it to disk, so that most of the I/O time is spent writing to a file rather than switching between files.

Each file write operation should send a large number of bytes. There are three options for this:

  1. Accumulate bytes from serialized dump entries into a large ByteBuffer and send it to disk once it becomes full. This requires in-memory copying of data from the small buffers into the large buffer.
  2. Accumulate ByteBuffers, each containing one serialized dump entry, and send all of them to disk using the write(ByteBuffer[]) operation.
  3. Serialize dump entries directly into a large ByteBuffer. This works well for dump executor threads, but doesn't seem very good for transaction threads.
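Options 1 and 2 can be compared side by side with a small sketch. The class and method names below (DumpWriteOptions, writeCopied, writeGathered) are hypothetical and only illustrate the two call shapes. The sketch relies on the standard GatheringByteChannel.write(ByteBuffer[]) operation, which FileChannel implements.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.GatheringByteChannel;
import java.util.List;

/** Hypothetical helper illustrating options 1 and 2; not part of Ignite. */
final class DumpWriteOptions {
    private DumpWriteOptions() {}

    /** Option 1: copy the small buffers into one large buffer, then write that buffer. */
    static long writeCopied(GatheringByteChannel ch, List<ByteBuffer> entries) throws IOException {
        int total = 0;
        for (ByteBuffer e : entries)
            total += e.remaining();

        ByteBuffer big = ByteBuffer.allocate(total);
        for (ByteBuffer e : entries)
            big.put(e.duplicate()); // duplicate() leaves the caller's position untouched
        big.flip();

        long written = 0;
        while (big.hasRemaining())
            written += ch.write(big);
        return written;
    }

    /** Option 2: hand the small buffers to a gathering write, with no in-memory copy. */
    static long writeGathered(GatheringByteChannel ch, List<ByteBuffer> entries) throws IOException {
        ByteBuffer[] bufs = entries.toArray(new ByteBuffer[0]);
        long total = 0;
        for (ByteBuffer b : bufs)
            total += b.remaining();

        long written = 0;
        while (written < total)
            written += ch.write(bufs); // consumes the buffers as bytes are written
        return written;
    }
}
```

Both methods produce the same bytes on disk. Which one is faster for a given number of buffers is exactly the open question, so the sketch is a starting point for measurement rather than a recommendation.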

Need to investigate: assuming the same total number of bytes is written, for what range of ByteBuffer array sizes do write(ByteBuffer) and write(ByteBuffer[]) perform the same?

The ByteBuffer trip cycle will look like this:

ByteBufferPool => partition/transaction thread => partition queue => queue dispatcher thread => disk writer queue => disk writer thread => free ByteBuffer queue => ByteBuffer releaser thread => ByteBufferPool

ByteBuffer pool

The current solution uses thread-local ByteBuffers that grow on demand. This is fine for single-threaded use, but such buffers are not suitable for passing to other threads. It is also not a good fit for transaction threads.

We can use a pool of ByteBuffers that provides newly allocated or reused buffers and never goes beyond its predefined limit. For example:

class ByteBufferPool

  • ByteBufferPool(size) - constructor that takes the maximum number of bytes all buffers together may occupy.
  • ByteBuffer acquire(minSize) - returns a ByteBuffer whose size is a power of 2 and not smaller than minSize. If there is no suitable buffer in the pool, a new buffer is allocated. If the overall size occupied by all buffers has reached the limit, this method blocks until the required buffer(s) are returned to the pool.
  • release(ByteBuffer) - returns the buffer to the pool and signals any thread waiting in acquire.
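A minimal sketch of such a pool is shown below. It follows the three operations listed above, but several details are assumptions made for illustration: a single monitor for locking, heap buffers, the extra allocatedBytes() accessor, and the rule that free buffers of other sizes are dropped to make room before a caller blocks.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Sketch of the pool described above. Locking and eviction rules are assumptions. */
final class ByteBufferPool {
    private final long maxBytes;
    private long allocatedBytes;

    /** Free buffers keyed by capacity, which is always a power of two. */
    private final Map<Integer, Queue<ByteBuffer>> free = new HashMap<>();

    ByteBufferPool(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    synchronized ByteBuffer acquire(int minSize) throws InterruptedException {
        int cap = roundUpToPowerOfTwo(minSize);
        if (cap > maxBytes)
            throw new IllegalArgumentException("Requested size exceeds pool limit: " + minSize);

        while (true) {
            Queue<ByteBuffer> q = free.get(cap);
            if (q != null && !q.isEmpty()) {
                ByteBuffer buf = q.poll();
                buf.clear(); // reset position and limit before reuse
                return buf;
            }
            if (allocatedBytes + cap <= maxBytes) {
                allocatedBytes += cap;
                return ByteBuffer.allocate(cap);
            }
            // Limit reached: drop a free buffer of another size, or wait for a release.
            if (!dropOneFreeBuffer())
                wait();
        }
    }

    synchronized void release(ByteBuffer buf) {
        free.computeIfAbsent(buf.capacity(), k -> new ArrayDeque<>()).add(buf);
        notifyAll(); // wake up threads blocked in acquire
    }

    synchronized long allocatedBytes() {
        return allocatedBytes;
    }

    private boolean dropOneFreeBuffer() {
        for (Queue<ByteBuffer> q : free.values()) {
            ByteBuffer buf = q.poll();
            if (buf != null) {
                allocatedBytes -= buf.capacity();
                return true;
            }
        }
        return false;
    }

    static int roundUpToPowerOfTwo(int n) {
        if (n <= 0 || n > (1 << 30))
            throw new IllegalArgumentException("Size out of range: " + n);
        return n == 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }
}
```

Rounding sizes to powers of two keeps the number of distinct buffer sizes small, which makes reuse more likely. A real implementation might prefer direct buffers and finer-grained locking.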

Writing to disk

Disk I/O is often the bottleneck in Ignite, so we should try to make writing to disk efficient.

There should be a separate thread that saves data to disk, and it should do minimal work besides writing. For example, it could take buffers from a queue and write them to a file. The buffers should be prepared by another thread, and returning buffers to the pool could also be delegated to another thread.
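The writer loop described above could be sketched as follows. WriteTask, DiskWriter and the STOP marker are hypothetical names, and the sketch assumes that each queued buffer has already been flipped for reading by the thread that filled it. Written buffers are placed on a separate queue, so returning them to the pool stays off the writer thread.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.concurrent.BlockingQueue;

/** Hypothetical unit of work: a filled buffer and the channel it should be written to. */
final class WriteTask {
    final WritableByteChannel ch;
    final ByteBuffer buf;

    WriteTask(WritableByteChannel ch, ByteBuffer buf) {
        this.ch = ch;
        this.buf = buf;
    }
}

/** Sketch of a disk writer thread body that does nothing except write and hand buffers back. */
final class DiskWriter implements Runnable {
    /** Marker task that tells the writer to stop (an assumption of this sketch). */
    static final WriteTask STOP = new WriteTask(null, null);

    private final BlockingQueue<WriteTask> tasks;
    private final BlockingQueue<ByteBuffer> freeBufs;

    DiskWriter(BlockingQueue<WriteTask> tasks, BlockingQueue<ByteBuffer> freeBufs) {
        this.tasks = tasks;
        this.freeBufs = freeBufs;
    }

    @Override public void run() {
        try {
            for (WriteTask t = tasks.take(); t != STOP; t = tasks.take()) {
                while (t.buf.hasRemaining())
                    t.ch.write(t.buf);

                // Another thread drains this queue and releases buffers to the pool.
                freeBufs.put(t.buf);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```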

How many writer threads?

For desktop PCs it doesn't make sense to use more than one writer thread. On servers with RAID storage, however, writing to several files in parallel could be faster overall. The solution should be built on the assumption that there could be several writers. We need to make sure that writers don't pick up data for the same file.
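One possible way to keep writers off each other's files is to assign every partition file to exactly one writer. The sketch below is an assumption, not a decided design: WriterRouter is a hypothetical name, and it assumes a file is identified by a cache group id and a partition id.

```java
/** Hypothetical static routing of partition files to writer threads. */
final class WriterRouter {
    private WriterRouter() {}

    /**
     * Returns the index of the only writer allowed to write the file of the given partition.
     * The same (grpId, partId) pair always maps to the same writer.
     */
    static int writerIndex(int grpId, int partId, int writerCnt) {
        if (writerCnt <= 0)
            throw new IllegalArgumentException("writerCnt must be positive: " + writerCnt);

        // floorMod keeps the result in [0, writerCnt) even for negative ids.
        return Math.floorMod(31 * grpId + partId, writerCnt);
    }
}
```

Static routing is simple but can leave some writers idle when partitions differ a lot in size. A dispatcher that hands out whole per-file queues on demand would be an alternative.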

Partition queue


Compression

Encryption


Other ideas

One of the proposed ideas was to switch from writing several partition-specific files to writing a single dump file. This idea wasn't considered much because of the complexity of the change and because it limits multi-threaded I/O, which could be beneficial on some server storage. Sequential writes can still be achieved with multiple partition files.