Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • incremental reads with insert_overwrite
    • Lets say there is a pipeline setup using incremental reads. table1 --(transformation)--> table2 
    • if there is an insert overwrite on table1, previous records are considered 'deleted'  in table1. But, incremental reads on table1 will not send deletions with current implementation.
  • scaling insert overwrite
    • Partitioner implementation today stores all file groups effected in memory. We may have to make the internal data structures of Partitioner spillable, if 'insert_overwrite_table' is done on a large table with large number of file groups.


Recommendation

...

After weighing trade-offs, we are inclined towards going with Option 1.  Option 3 would add a lot of complexity. Option 2 looks like a cleaner approach, but is introducing new behavior of ignoring file groups. Option1 is more practical, fits in with existing framework and is relatively easier to implement.  

...