...

Setting executor memory size is more complicated than simply making it as large as possible. Several factors need to be taken into consideration:

  • More executor memory allows map-join optimization to be enabled for more queries (see the configuration sketch after this list).

  • On the other hand, more executor memory becomes unwieldy from a garbage collection (GC) perspective.

  • Some experiments show that the HDFS client doesn't handle concurrent writers well, so it may face race conditions if an executor has too many cores.
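To illustrate the first point, here is a minimal sketch of the relevant Hive session settings; the values are illustrative assumptions, not recommendations:

    -- Sketch only: values are illustrative assumptions, not recommendations.
    set spark.executor.memory=4g;
    set spark.executor.cores=2;
    -- Hive converts a common join into a map join when the small table fits
    -- under this size threshold (in bytes); a larger executor heap lets you
    -- raise it, enabling map-join optimization for more queries.
    set hive.auto.convert.join=true;
    set hive.auto.convert.join.noconditionaltask.size=20000000;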

...

After you’ve decided how much memory each executor receives, you need to decide how many executors will be allocated to queries. In the GA release, Spark dynamic executor allocation will be supported; for this beta, however, only static resource allocation can be used. Based on the physical memory in each node and the configuration of spark.executor.memory and spark.yarn.executor.memoryOverhead, you will need to choose the number of instances and set spark.executor.instances.
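As a sketch of what static allocation looks like under these beta constraints (the property names are standard Spark ones; the instance count here is an assumed placeholder):

    -- Static allocation for the beta; the instance count is a placeholder.
    set spark.dynamicAllocation.enabled=false;
    set spark.executor.instances=10;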


Now a real-world example. Assume 10 nodes, each with 64GB of memory and 12 virtual cores, i.e., yarn.nodemanager.resource.cpu-vcores=12. One node will be used as the master, so the cluster will have 9 slave nodes. We’ll configure spark.executor.cores to 6. Given 64GB of RAM, yarn.nodemanager.resource.memory-mb will be 50GB. We’ll determine the amount of memory for each executor as follows: 50GB * (6/12) = 25GB. We’ll assign 20% to spark.yarn.executor.memoryOverhead, or 5120 (MB), and 80% to spark.executor.memory, or 20GB.
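Expressed as Hive session settings, the example works out to the following sketch (the numbers follow directly from the arithmetic above):

    -- Derived from the example: 50GB * (6/12) = 25GB per executor,
    -- split 80/20 between executor heap and YARN overhead (overhead is in MB).
    set spark.executor.cores=6;
    set spark.executor.memory=20g;
    set spark.yarn.executor.memoryOverhead=5120;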

On this 9-node cluster we’ll have two executors per host (12 vcores / 6 cores per executor, and 50GB / 25GB per executor). As such, we can configure spark.executor.instances somewhere between 2 and 18; a value of 18 would utilize the entire cluster, as shown below.
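And the corresponding instance count at full utilization:

    -- Full utilization of the 9-node example: 9 nodes * 2 executors per node = 18.
    set spark.executor.instances=18;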

Common Issues (issues highlighted in green are resolved and will be removed from this list)

...