THIS IS A TEST INSTANCE. ALL YOUR CHANGES WILL BE LOST!!!!
...
- incremental reads with insert_overwrite
- Lets say there is a pipeline setup using incremental reads. table1 --(transformation)--> table2
- if there is an insert overwrite on table1, previous records are considered 'deleted' in table1. But, incremental reads on table1 will not send deletions with current implementation.
- scaling insert overwrite
- Partitioner implementation today stores all file groups effected in memory. We may have to make the internal data structures of Partitioner spillable, if 'insert_overwrite_table' is done on a large table with large number of file groups.
Recommendation
...
After weighing trade-offs, we are inclined towards going with Option 1. Option 3 would add a lot of complexity. Option 2 looks like a cleaner approach, but is introducing new behavior of ignoring file groups. Option1 is more practical, fits in with existing framework and is relatively easier to implement.
...