This page covers Memory Management: how runtime operators (e.g., sort, group-by, join) manage memory space within a given memory budget.

We factor the memory buffer management logic out of each algorithm's implementation, so each operator focuses only on its algorithm. An operator only needs to know whether the memory budget has been used up. The memory manager decides which frame each tuple is stored in and how to allocate, merge, and deallocate frames of different sizes.
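
To illustrate this separation, here is a minimal sketch of the operator-side view. It assumes a hypothetical, simplified `TupleBuffer` interface (not the actual Hyracks API) whose budget is counted in ints rather than frames. The run-generating sort only reacts to "the tuple does not fit" and spills a sorted run. Where the bytes actually land is left entirely to the buffer implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified buffer-manager contract (NOT the real Hyracks interface). */
interface TupleBuffer {
    /** Returns false if the tuple does not fit in the remaining memory budget. */
    boolean insertTuple(int[] tuple);

    List<int[]> tuples();

    void reset();
}

/** Hypothetical budget-bounded buffer; the budget is counted in ints for brevity. */
class BudgetedTupleBuffer implements TupleBuffer {
    private final int budget;
    private int used = 0;
    private final List<int[]> stored = new ArrayList<>();

    BudgetedTupleBuffer(int budget) {
        this.budget = budget;
    }

    @Override
    public boolean insertTuple(int[] tuple) {
        if (used + tuple.length > budget) {
            return false; // budget exhausted: the operator decides what to do next
        }
        stored.add(tuple);
        used += tuple.length;
        return true;
    }

    @Override
    public List<int[]> tuples() {
        return stored;
    }

    @Override
    public void reset() {
        stored.clear();
        used = 0;
    }
}

/** Operator-side logic: reacts to "buffer full" but never manages frames itself. */
class RunGeneratingSort {
    static List<List<int[]>> generateRuns(List<int[]> input, TupleBuffer buffer) {
        List<List<int[]>> runs = new ArrayList<>();
        for (int[] tuple : input) {
            if (!buffer.insertTuple(tuple)) {
                runs.add(flush(buffer)); // spill the buffered tuples as one sorted run
                if (!buffer.insertTuple(tuple)) {
                    // Even an empty buffer cannot hold it (the "big-object" situation).
                    throw new IllegalStateException("tuple exceeds memory budget");
                }
            }
        }
        if (!buffer.tuples().isEmpty()) {
            runs.add(flush(buffer));
        }
        return runs;
    }

    /** Copies the buffered tuples, sorts them on the first field, and empties the buffer. */
    private static List<int[]> flush(TupleBuffer buffer) {
        List<int[]> run = new ArrayList<>(buffer.tuples());
        run.sort((a, b) -> Integer.compare(a[0], b[0]));
        buffer.reset();
        return run;
    }
}
```

Because the sort only sees a boolean from `insertTuple`, a different buffer implementation (for example, one that packs tuples into variable-size frames) could be swapped in without touching the operator.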

Memory management consists of two parts: the FramePool and the BufferManager.

  • The FramePool allocates and deallocates frames of variable size. It does not care how tuples are stored. In the big-object case it may also "merge" unused frames. Physically, a "merge" is implemented by deallocating the frames to be merged and creating a new frame of the given size.
    [Figure: FramePool]
  • The BufferManager handles the logic of storing tuples into frames obtained from the FramePool.
    [Figure: BufferManager]
    • IFrameBufferManager
      This manager copies entire frames into the memory space. It is used in the ExternalSort operators.
    • ITupleBufferManager
      This manager inserts individual tuples into the memory space. It is mainly used in the HeapSort operator, which also needs to delete tuples from the buffer.
    • IPartitionedTupleBufferManager
      This manager inserts each tuple into the buffer of its partition. It is used in the HashGroupby and HashJoin operators.
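
The FramePool's allocate/deallocate/merge behavior described above can be sketched as follows. This is a minimal illustration, not the Hyracks implementation: the class and method names are hypothetical, frames are modeled as `ByteBuffer`s, the budget is in bytes, and frame sizes are assumed to be multiples of a minimum frame size. The pool tracks only bytes and never looks at tuples, and "merge" is literally "deallocate the unused frames, then allocate one frame of the requested size".

```java
import java.nio.ByteBuffer;
import java.util.List;

/**
 * Hypothetical variable-size frame pool bounded by a byte budget.
 * Frame sizes are multiples of minFrameSize; the pool knows nothing about tuples.
 */
class SimpleFramePool {
    private final int minFrameSize;
    private final int budgetBytes;
    private int allocatedBytes = 0;

    SimpleFramePool(int minFrameSize, int budgetBytes) {
        this.minFrameSize = minFrameSize;
        this.budgetBytes = budgetBytes;
    }

    /** Rounds a request up to a whole number of minimum-size frames (at least one). */
    private int roundUp(int requestedBytes) {
        int frames = Math.max(1, (requestedBytes + minFrameSize - 1) / minFrameSize);
        return frames * minFrameSize;
    }

    /** Allocates a frame of at least requestedBytes, or returns null if the budget would be exceeded. */
    ByteBuffer allocateFrame(int requestedBytes) {
        int size = roundUp(requestedBytes);
        if (allocatedBytes + size > budgetBytes) {
            return null;
        }
        allocatedBytes += size;
        return ByteBuffer.allocate(size);
    }

    /** Returns the frame's capacity to the budget; the caller must stop using the frame. */
    void deallocateFrame(ByteBuffer frame) {
        allocatedBytes -= frame.capacity();
    }

    /**
     * "Merge" for big objects: deallocate the given unused frames and allocate one new frame
     * of the requested size. If the freed space plus the remaining budget is still too small,
     * nothing is deallocated and null is returned.
     */
    ByteBuffer mergeAndAllocate(List<ByteBuffer> unusedFrames, int requestedBytes) {
        int freed = 0;
        for (ByteBuffer frame : unusedFrames) {
            freed += frame.capacity();
        }
        if (roundUp(requestedBytes) > remainingBytes() + freed) {
            return null;
        }
        for (ByteBuffer frame : unusedFrames) {
            deallocateFrame(frame);
        }
        return allocateFrame(requestedBytes);
    }

    int remainingBytes() {
        return budgetBytes - allocatedBytes;
    }
}
```

Checking the combined space before deallocating anything keeps the pool's accounting consistent when a merge cannot succeed; a BufferManager built on top of it would decide which frames count as "unused".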

The code is in the org.apache.hyracks.dataflow.std.buffermanage package of the hyracks-dataflow-std project.

The big-object case introduces some tricky logic into each type of manager. (To be moved to wiki)

 
