THIS IS A TEST INSTANCE. ALL YOUR CHANGES WILL BE LOST!!!!
...
- If the tables being joined are sorted and bucketized on the join columns, and they have the same number of buckets, a sort-merge join can be performed. The corresponding buckets are joined with each other at the mapper. If both A and B have 4 buckets,
can be done on the mapper only. The mapper for the bucket for A will traverse the corresponding bucket for B. This is not the default behavior, and the following parameters need to be set:Code Block SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM A a join B b on a.key = b.key
Code Block set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; set hive.optimize.bucketmapjoin = true; set hive.optimize.bucketmapjoin.sortedmerge = true;
Mapjoin restrictions.
- If all but one of the tables being joined are small, the join can be performed as a map only job. The query
does not need a reducer. For every mapper of A, B is read completely.Code Block SELECT /*+ MAPJOIN(b) */ a.key, a.value FROM a join b on a.key = b.key
- The following is not supported.
- Union Followed by a Mapjoin
- Lateral View Followed by a Mapjoin
- Reduce Sink (Group By/Join/Sort By/Cluster By/Distribute By) Followed by MapJoin
- Mapjoin Followed by Union
- Mapjoin Followed by join
- Mapjoin Followed by Mapjoin