Warning |
---|
This is an incomplete example. |
This is a rough example of how to use Apache Flume and Apache Hive together to consolidate many small Flume files into larger files that are loaded into a Hive warehouse and are queryable via Hive's SQL dialect.
...
Some caveats to be aware of: Hive's LOAD moves the source file rather than copying it, and this approach still leaves small files in the warehouse. The next draft will probably use ALTER and create a new data set to hold the deduplicated data.

This solves three potential data ingestion problems: small files, duplicates, and bucketing data into date-related groups.
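As a rough illustration of those three goals, here is a minimal HiveQL sketch. It is not the example from this page: the table names (`raw_events`, `events_by_day`), the column layout, the tab delimiter, and the HDFS path `/flume/events` are all hypothetical placeholders, and it assumes each event carries a timestamp string that starts with `yyyy-MM-dd`.

```sql
-- All names and paths below are assumptions made for illustration only.

-- Staging table over the raw, tab-delimited lines written by Flume.
CREATE TABLE IF NOT EXISTS raw_events (
  event_time STRING,   -- assumed to start with 'yyyy-MM-dd'
  host       STRING,
  message    STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- LOAD DATA INPATH moves (does not copy) the HDFS files into the table's
-- directory, so they disappear from the original location.
LOAD DATA INPATH '/flume/events' INTO TABLE raw_events;

-- Target table, partitioned by day so queries can prune by date.
CREATE TABLE IF NOT EXISTS events_by_day (
  event_time STRING,
  host       STRING,
  message    STRING
)
PARTITIONED BY (dt STRING)
STORED AS SEQUENCEFILE;

-- Let Hive derive the partition value from the data itself.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- DISTINCT drops exact duplicate rows and forces a reduce stage, so the
-- output is written by reducers instead of one file per small input file.
INSERT OVERWRITE TABLE events_by_day PARTITION (dt)
SELECT DISTINCT
  event_time,
  host,
  message,
  substr(event_time, 1, 10) AS dt   -- 'yyyy-MM-dd' prefix becomes the partition
FROM raw_events;
```

Note that `INSERT OVERWRITE` replaces every partition the query produces, so this sketch only makes sense if `raw_events` still holds all the rows for those days. How many output files each partition ends up with depends on the number of reducers, so it reduces the small-file problem rather than guaranteeing a single file per day.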