THIS IS A TEST INSTANCE. ALL YOUR CHANGES WILL BE LOST!!!!
...
- If the tables being joined are bucketized on the join columns, and the buckets are number of buckets in one table is a multiple of each otherof the number of buckets in the other table, the buckets can be joined with each other. If table A has 8 4 buckets are and table B has 4 buckets, the following join
can be done on the mapper only. Instead of fetching B completely for each mapper of A, only the required buckets are fetched. For the query above, the mapper processing bucket 1 for A will only fetch bucket 1 of B. It is not the default behavior, and is governed by the following parameterCode Block SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM a join b on a.key = b.key
Code Block set hive.optimize.bucketmapjoin = true
- If the tables being joined are sorted and bucketized on the join columns, and they have the same number of buckets are same, a sort-merge join can be performed. The corresponding buckets are joined with each other at the mapper. If both A and B have 4 buckets,
can be done on the mapper only. The mapper for the bucket for A will traverse the corresponding bucket for B. This is not the default behavior, and the following parameters need to be set:Code Block SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM A a join B b on a.key = b.key
Code Block set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true;