This document discusses the new API, configuration, and other aspects of the HDFS persistence layer in Geode. We assume the reader is familiar with basic Geode constructs such as Regions, Members, and gfsh.

Operational Data Tier

Geode can cache key-value (KV) sets in memory. For big-data use cases, the entire data set is assumed to be too large to manage in memory, so Geode will provide a configurable KV retention/eviction policy. The data held in memory is available for fast querying and is referred to as Operational Data. The operational data set typically consists of recently accessed records. Whenever a key lookup fails in the operational data, Geode will look the key up on HDFS and add the result to the operational data set if it meets the retention criteria. Used this way, Geode will provide reliable, fast, and easy access to HDFS data.
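The lookup path described above can be sketched as a read-through cache. This is a minimal illustration, not Geode's implementation: the class `ReadThroughTier`, its methods, and the use of a plain `Map` as a stand-in for the HDFS lookup are all hypothetical names introduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch; none of these names are Geode APIs.
class ReadThroughTier<K, V> {
    private final Map<K, V> operational = new HashMap<>(); // in-memory operational data
    private final Map<K, V> hdfsTier;                      // stand-in for a lookup on HDFS
    private final BiPredicate<K, V> retain;                // retention criteria (assumed shape)

    ReadThroughTier(Map<K, V> hdfsTier, BiPredicate<K, V> retain) {
        this.hdfsTier = hdfsTier;
        this.retain = retain;
    }

    V get(K key) {
        V value = operational.get(key);
        if (value != null) {
            return value;                 // served from operational data
        }
        value = hdfsTier.get(key);        // lookup failed in memory: fall back to the HDFS tier
        if (value != null && retain.test(key, value)) {
            operational.put(key, value);  // promote only if it meets the retention criteria
        }
        return value;
    }

    boolean isOperational(K key) {
        return operational.containsKey(key);
    }
}
```

A value that fails the retention test is still returned to the caller; it is simply not kept in memory.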

...

Geode collects all write operations and persists them on HDFS. These records are never evicted from HDFS unless the user deletes them. Hence a full record of all data collected by Geode is present on HDFS; this is referred to as the HDFS Tier. The update log managed on HDFS is similar to the oplog (operational log) maintained on local disk. The data on HDFS will be visible “externally”, e.g. readable from a MapReduce (MR) job or a Hive query. This way, data managed by Geode can be used for analytics. At any instant, the operational data is a subset of the HDFS data.

...

Each write operation will be streamed to the operational data store (an in-memory region) and to the HDFS buffers simultaneously. In general, the data flows to HDFS and to Geode Regions will be independent of each other.
Each new or updated record will be tested against the eviction logic. Existing data will be checked again as needed (for example, when a heap limit is triggered) or as configured by the user.

PlantUML
(*) --> [PUT] Handler
--> ===B1===
--> Buffer
--> HDFS
===B1=== --> "EvictionPolicy"
--> HdfsRegion
--> Scheduler
--> "EvictionPolicy"
HdfsRegion ..> [cache miss] HDFS
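The re-check of existing data (triggered by a heap limit or by a user-configured schedule) could look like the sweep below. This is a sketch under assumptions: `OperationalSweep`, `sweep`, and the predicate-based policy are hypothetical names, and the trigger itself is left out. Entries removed here are only dropped from memory; the text states that records stay on HDFS unless the user deletes them.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch; not a Geode API.
class OperationalSweep {
    // Re-tests every in-memory entry against the retention policy and
    // evicts the ones that no longer qualify. Returns the number evicted.
    static <K, V> int sweep(Map<K, V> operational, BiPredicate<K, V> retain) {
        int evicted = 0;
        Iterator<Map.Entry<K, V>> it = operational.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, V> entry = it.next();
            if (!retain.test(entry.getKey(), entry.getValue())) {
                it.remove(); // evict from memory only; the HDFS tier is untouched
                evicted++;
            }
        }
        return evicted;
    }
}
```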

Data Flow

Put KV

PlantUML

participant User
participant Handler
participant HdfsRegion
participant OperationalData
participant Filter
 
User->Handler: Put KV
activate Handler
Handler->HdfsRegion: Add to buffer
activate HdfsRegion
HdfsRegion->Handler:
HdfsRegion-->HDFS: Asynchronous
deactivate HdfsRegion
Handler->Filter: Test eviction logic
activate Filter
Filter->Handler: True/False
deactivate Filter
Handler->OperationalData: Put in cache
activate OperationalData
OperationalData->Handler: Return V*
deactivate OperationalData
Handler->User: Old V*
deactivate Handler
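
The sequence above can be approximated in code as follows. This is a single-threaded sketch, not the actual handler: `PutHandler`, `drainBuffer`, and `cachedValue` are hypothetical names, `drainBuffer` stands in for the asynchronous flush to HDFS, and dropping a stale cached value when the eviction test rejects the new one is an assumption that the diagram does not spell out.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch; none of these names are Geode APIs.
class PutHandler<K, V> {
    private final List<Map.Entry<K, V>> hdfsBuffer = new ArrayList<>(); // pending HDFS writes
    private final Map<K, V> operational = new HashMap<>();              // in-memory cache
    private final BiPredicate<K, V> retain;                             // eviction logic test

    PutHandler(BiPredicate<K, V> retain) {
        this.retain = retain;
    }

    // Returns the previously cached value, or null if there was none.
    V put(K key, V value) {
        // 1. Every write is buffered for HDFS, whatever the eviction test says.
        hdfsBuffer.add(new AbstractMap.SimpleImmutableEntry<>(key, value));
        // 2. Test the eviction logic, then update the cache accordingly.
        if (retain.test(key, value)) {
            return operational.put(key, value);
        }
        // Assumption: a rejected value also invalidates any stale cached copy.
        return operational.remove(key);
    }

    // Stand-in for the asynchronous flush: hands over and clears the buffer.
    List<Map.Entry<K, V>> drainBuffer() {
        List<Map.Entry<K, V>> batch = new ArrayList<>(hdfsBuffer);
        hdfsBuffer.clear();
        return batch;
    }

    V cachedValue(K key) {
        return operational.get(key);
    }
}
```

Because the buffer is filled before the eviction test runs, the HDFS side receives every write even when the in-memory side declines to keep the value.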

...
