Overview

Sizing a Geode deployment involves some number crunching as well as experimentation and testing. Arriving at reasonably accurate values for the key sizing parameters, values that will work well in practice, requires experiments with representative data and workload, starting at a very small scale.

...

  • Choice of Geode region type. Different region types have different per-entry overhead. This overhead is documented (see below), and is also included in the sizing spreadsheet.
  • Choice of the serialization mechanism. Geode offers multiple serialization options, as well as the ability to have values stored serialized. As mentioned above, Geode PDX serialization is the generally recommended serialization mechanism due to its space and performance benefits.
  • Choice of Keys. Smaller and simpler keys are more efficient in terms of both space and performance.
  • Use of indexes. Indexing incurs a per-entry overhead, as documented in the section of the User’s Guide mentioned below.

The section Memory Requirements for Cached Data of the Geode User’s Guide provides more detailed information and guidelines on this topic.

If the data value objects are small but numerous, the per-entry overhead can add up to a significant memory requirement. This overhead can be reduced by grouping multiple data values into a single entry or by using containment relationships. For instance, you may choose to have your Order objects contain their line items instead of having a separate OrderLineItems region. If this option is available, it is worth considering, as it may yield performance improvements in addition to space savings.
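The effect of containment on per-entry overhead can be illustrated with a little arithmetic. The following is a minimal sketch only: the overhead of 100 bytes per entry, the key and value sizes, and the entry counts are all hypothetical placeholders, not Geode figures. Use the documented overhead for your region type and sizes measured from your own data.

```python
def region_memory_bytes(entries, key_bytes, value_bytes, per_entry_overhead_bytes):
    """Approximate heap used by one region: every entry pays the overhead once."""
    return entries * (key_bytes + value_bytes + per_entry_overhead_bytes)


# Hypothetical figures, for illustration only.
OVERHEAD = 100          # assumed per-entry overhead in bytes (not a Geode constant)
KEY_BYTES = 16          # assumed key size
ORDER_BYTES = 200       # assumed size of an order without its line items
ITEM_BYTES = 50         # assumed size of one line item
orders = 1_000_000
items_per_order = 10

# Option 1: a separate OrderLineItems region, one entry per line item.
separate = (
    region_memory_bytes(orders, KEY_BYTES, ORDER_BYTES, OVERHEAD)
    + region_memory_bytes(orders * items_per_order, KEY_BYTES, ITEM_BYTES, OVERHEAD)
)

# Option 2: each Order entry contains its line items.
contained = region_memory_bytes(
    orders, KEY_BYTES, ORDER_BYTES + items_per_order * ITEM_BYTES, OVERHEAD
)

# The saving is exactly the key and overhead of the entries that no longer exist.
saved = separate - contained
print(f"separate={separate:,} contained={contained:,} saved={saved:,}")
```

Under these assumed numbers the payload is identical in both layouts, so the entire difference comes from the keys and per-entry overhead of the ten million line-item entries that containment eliminates.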

...

Partitioned Region Scalability

Geode partitioned regions scale out by rebalancing their data buckets (partitions) in order to distribute the data evenly across all available nodes in a cluster. When new nodes are added to the cluster, rebalancing moves some buckets from the old nodes to the new ones so that the data is evenly balanced across all the nodes. For the end result to be a well-balanced cluster, each partitioned region should have at least one order of magnitude more buckets than data nodes. In general, the more buckets, the better the data distribution. However, since the number of buckets cannot be changed dynamically (that is, without downtime), it has to be chosen with the projected horizontal scale-out taken into account. Otherwise, as the system scales out over time, the data may become less evenly distributed. In the extreme case, when the number of nodes exceeds the number of buckets, adding new nodes has no effect; the ability to scale out is lost.
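As a rough illustration of the "one order of magnitude" guideline, the sketch below picks a bucket count from the projected maximum number of data nodes and shows how evenly that many buckets can be spread. The helper names are hypothetical, and rounding up to a prime number is an assumption borrowed from common Geode practice rather than something this section requires.

```python
def is_prime(n):
    """Trial-division primality check, adequate for bucket-count-sized numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def suggest_bucket_count(projected_max_nodes, factor=10):
    """Smallest prime that is at least `factor` times the projected node count."""
    candidate = projected_max_nodes * factor
    while not is_prime(candidate):
        candidate += 1
    return candidate


def bucket_spread(total_buckets, nodes):
    """Return (fewest, most) buckets per node when buckets are spread evenly."""
    base, extra = divmod(total_buckets, nodes)
    return base, base + (1 if extra else 0)


# A cluster expected to grow to 12 data nodes.
buckets = suggest_bucket_count(12)
print(buckets, bucket_spread(buckets, 12))

# The extreme case: more nodes than buckets, so some nodes hold no data.
print(bucket_spread(10, 12))
```

With many more buckets than nodes, the gap between the least and most loaded node is a single bucket out of ten or eleven. With fewer buckets than nodes, the lower bound drops to zero, which is the loss of scale-out described above.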

...

Choice of redundancy may be driven by data size, and by whether the data can be retrieved from some other backing store or Geode is the only store. Other considerations might go into that decision as well. For instance, Geode can be deployed in an active/active configuration in two data centers such that each can take on the entire load, but will do so only if necessitated by a failure. In such deployments there are typically 4 live copies of the data at any time, 2 in each data center. A failure of 2 nodes in a single data center would cause data loss in that data center, but the other data center would take over the entire workload until those 2 nodes can be restored. Another possibility might be to set redundancy to 2 (for a total of 3 copies of data) in order to have high availability even in case of a single node failure, and avoid paying the price of rebalancing when a single node fails. Instead of rebalancing, the failed node is restarted, and in the meantime there are still 2 copies of the data.
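The copy counts in these scenarios follow from simple arithmetic, sketched below. The function names are hypothetical, and the failure calculation is a worst-case assumption: every failed node is taken to host a copy of the same bucket.

```python
def live_copies(redundancy, datacenters=1):
    """Total copies of each entry: one primary plus `redundancy` copies per data center."""
    return (redundancy + 1) * datacenters


def copies_after_failures(redundancy, failed_nodes):
    """Worst-case copies of a bucket left in one data center after node failures."""
    return max(redundancy + 1 - failed_nodes, 0)


# Active/active across two data centers with redundancy 1: 4 live copies.
print(live_copies(redundancy=1, datacenters=2))

# Two node failures in one data center with redundancy 1: that site can lose data.
print(copies_after_failures(redundancy=1, failed_nodes=2))

# Redundancy 2 and a single node failure: still 2 copies while the node restarts.
print(copies_after_failures(redundancy=2, failed_nodes=1))
```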

...

Geode Queues

If any Geode queueing capabilities are used, such as for WAN distribution, client subscription, or asynchronous event listeners, it is important to consider the queues’ capacity in the context of the desired SLA. For example, for how long should gateway or client subscription queues be able to keep queueing events when the connection is lost? Given that, how large should the queues be able to grow? The best way to find out is by watching the queues’ growth during sizing experiments, using Geode statistics (more on this in the Vertical Sizing section of The Sizing Process, below).
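Before running the experiments, a first-order estimate of queue capacity can be derived from the SLA. The sketch below is only that: the event rate, average event size, and outage duration are hypothetical inputs, and the formula ignores per-event queue overhead, which the sizing experiments and Geode statistics should reveal.

```python
def queue_capacity_bytes(events_per_second, avg_event_bytes, outage_seconds):
    """Bytes a queue must hold to keep buffering for the whole outage window."""
    return events_per_second * avg_event_bytes * outage_seconds


# Hypothetical SLA: survive a 30-minute loss of the WAN connection
# at 2,000 events per second with 1 KiB events.
required = queue_capacity_bytes(
    events_per_second=2_000,
    avg_event_bytes=1_024,
    outage_seconds=30 * 60,
)
print(f"{required:,} bytes (~{required / 2**30:.2f} GiB)")
```

Treat the result as a starting point to compare against observed queue growth, not as a substitute for measuring it.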

...

Total memory and system requirements can be approximated using the attached sizing spreadsheet, System_Sizing_Worksheet.xlsx, which takes into account all the Geode region-related per-entry overhead as well as the desired memory headroom. The spreadsheet formulas are rough approximations that serve to inform a very high-level estimate, as they do not account for any other overhead (buffers, threads, queues, application workload, etc.). In addition, the results obtained from the spreadsheet do not have any performance context. For this reason, the next step is to take the results for memory allocation per server obtained from the spreadsheet and use them as the starting point for the vertical sizing.
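To give a feel for the kind of estimate involved, here is a minimal sketch of a per-server memory calculation with headroom. It is not the spreadsheet's formula, which is not reproduced in this section; the function, its parameters, and the sample numbers are all assumptions made for illustration.

```python
def heap_per_server_gb(data_gb, redundancy, servers, headroom_fraction):
    """Heap per server so that stored data fills only (1 - headroom) of it.

    data_gb           -- one copy of the data, including per-entry overhead
    redundancy        -- number of redundant copies (0 means primary only)
    servers           -- number of data nodes sharing the load
    headroom_fraction -- share of the heap to leave free, e.g. 0.5 for 50%
    """
    if not 0 <= headroom_fraction < 1:
        raise ValueError("headroom_fraction must be in [0, 1)")
    stored_per_server = data_gb * (redundancy + 1) / servers
    return stored_per_server / (1 - headroom_fraction)


# Hypothetical inputs: 100 GB of data, redundancy 1, 8 servers, 50% headroom.
print(heap_per_server_gb(data_gb=100, redundancy=1, servers=8, headroom_fraction=0.5))
```

As with the spreadsheet, a figure obtained this way carries no performance context and only serves as the starting point for vertical sizing.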

...