Warning

This is an incomplete example.

This is a rough example of how to use Apache Flume and Apache Hive together to consolidate many small Flume files into larger files that are loaded into a Hive warehouse and can be queried with Hive's SQL dialect.

...

Some caveats to be aware of: load moves the file rather than copying it, and this approach still leaves small files in the warehouse. The next draft will probably use 'alter' and create a new data set to hold the de-duplicated data.

The goal is to solve three potential data ingestion problems: small files, duplicates, and bucketing data into date-related groups.
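
The three ingestion problems named here (small files, duplicates, and date-related grouping) can be illustrated with a minimal sketch in plain Python. It does not use Flume or Hive and is not the method this page goes on to describe. The record layout `(event_id, epoch_seconds, body)` and the function name `consolidate` are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timezone


def consolidate(small_batches):
    """Merge many small batches into one de-duplicated list per UTC date.

    Each batch stands in for one small file; each record is a hypothetical
    (event_id, epoch_seconds, body) tuple.
    """
    seen_ids = set()              # event ids already kept (duplicate filter)
    buckets = defaultdict(list)   # "YYYY-MM-DD" -> records for that day

    for batch in small_batches:
        for event_id, epoch_seconds, body in batch:
            if event_id in seen_ids:
                continue          # skip a record we have already stored
            seen_ids.add(event_id)
            day = datetime.fromtimestamp(
                epoch_seconds, tz=timezone.utc
            ).strftime("%Y-%m-%d")
            buckets[day].append((event_id, epoch_seconds, body))

    return dict(buckets)
```

Calling `consolidate` on a list of batches returns one larger group per day, which roughly corresponds to writing one consolidated file into a date-based partition. The in-memory set of ids only works for small inputs; at warehouse scale the de-duplication would have to happen in the query engine instead.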