...
- With the MetaStore event configuration in place on the source cluster, the NOTIFICATION_LOG table in the MetaStore will be populated with events on the successful execution of metadata operations such as CREATE, ALTER, and DROP.
- These events can be read and converted into ReplicationTasks using org.apache.hive.hcatalog.api.HCatClient.getReplicationTasks(long, int, String, String). ReplicationTasks encapsulate a set of commands to execute on the source Hive instance (typically to export data) and another set to execute on the replica instance (typically to import data). The commands are provided as HQL strings.
- The ReplicationTask also serves as a place where database and table name mappings can be declared, and where StagingDirectoryProvider implementations can be configured to resolve paths at both the source and the destination:
  - org.apache.hive.hcatalog.api.repl.ReplicationTask.withDbNameMapping(Function<String, String>)
  - org.apache.hive.hcatalog.api.repl.ReplicationTask.withTableNameMapping(Function<String, String>)
  - org.apache.hive.hcatalog.api.repl.ReplicationTask.withSrcStagingDirProvider(StagingDirectoryProvider)
  - org.apache.hive.hcatalog.api.repl.ReplicationTask.withDstStagingDirProvider(StagingDirectoryProvider)
- The HQL commands provided by each task must then be executed, first against the source Hive instance and then against the destination (the replica). One way of doing this is to open a JDBC connection to the respective HiveServer and submit the task's HQL commands.
- It is necessary to maintain the position within the notification log so that each replication task is applied only once. This can be achieved by keeping a record of the id of the last successfully executed event (task.getEvent().getEventId()) and providing it as the offset when sourcing the next batch of events.
- To avoid losing or missing events that require replication, it may be wise to poll for replication tasks at a frequency significantly greater than that derived from the hive.metastore.event.db.listener.timetolive property, because events older than this time-to-live are removed from the NOTIFICATION_LOG and can no longer be read.
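The naming and staging hooks are driven by small functions supplied by the caller. The following is a minimal sketch, assuming a hypothetical convention where each replica database carries a _replica suffix, table names are unchanged, and staging paths sit under hypothetical HDFS base directories. To keep it self-contained it uses java.util.function.Function and a locally declared StagingDirs interface as stand-ins; the exact Function type and the StagingDirectoryProvider interface expected by ReplicationTask come from the HCatalog client library and should be checked against the Hive version in use.

```java
import java.util.function.Function;

public class ReplicaNaming {

    /** Hypothetical stand-in with the same shape as a staging directory provider: key in, path out. */
    interface StagingDirs {
        String getStagingDirectory(String key);
    }

    /** Hypothetical convention: database "sales" on the source becomes "sales_replica" on the replica. */
    static final Function<String, String> DB_NAME_MAPPING = db -> db + "_replica";

    /** Hypothetical convention: table names are kept unchanged. */
    static final Function<String, String> TABLE_NAME_MAPPING = Function.identity();

    /** Returns a provider that resolves each key to a subdirectory of basePath (a trailing "/" is tolerated). */
    static StagingDirs stagingUnder(String basePath) {
        String base = basePath.endsWith("/") ? basePath.substring(0, basePath.length() - 1) : basePath;
        return key -> base + "/" + key;
    }

    public static void main(String[] args) {
        // Hypothetical NameNode addresses and staging locations for the source and the replica.
        StagingDirs srcStaging = stagingUnder("hdfs://source-nn:8020/tmp/repl_staging/");
        StagingDirs dstStaging = stagingUnder("hdfs://replica-nn:8020/tmp/repl_staging");
        System.out.println(DB_NAME_MAPPING.apply("sales"));           // sales_replica
        System.out.println(TABLE_NAME_MAPPING.apply("orders"));       // orders
        System.out.println(srcStaging.getStagingDirectory("evt_42")); // hdfs://source-nn:8020/tmp/repl_staging/evt_42
        System.out.println(dstStaging.getStagingDirectory("evt_42")); // hdfs://replica-nn:8020/tmp/repl_staging/evt_42
    }
}
```

With the real API these objects would be handed to withDbNameMapping, withTableNameMapping, withSrcStagingDirProvider and withDstStagingDirProvider, possibly through a small adapter if the library's types differ from the stand-ins used here.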
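Executing a task's HQL over JDBC needs nothing beyond java.sql. The sketch below runs a list of HQL strings in order on a single connection and stops at the first failure. It assumes the Hive JDBC driver is on the classpath when main is used; the URL and HQL passed on the command line are supplied by the caller, and in practice the HQL would come from the replication tasks.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

public class HqlRunner {

    /** Executes each HQL string in order on one connection; the first failure propagates as SQLException. */
    static void runAll(Connection conn, List<String> hqlCommands) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String hql : hqlCommands) {
                stmt.execute(hql);
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.out.println("usage: HqlRunner <jdbc-url> <hql> [<hql> ...]");
            return;
        }
        // args[0] is a HiveServer JDBC URL, e.g. jdbc:hive2://replica-host:10000/default (hypothetical host).
        try (Connection conn = DriverManager.getConnection(args[0])) {
            runAll(conn, Arrays.asList(args).subList(1, args.length));
        }
    }
}
```

For example: java HqlRunner jdbc:hive2://replica-host:10000/default "SHOW DATABASES". Opening one connection per side (source and replica) and reusing it across tasks avoids reconnecting for every event.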
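Putting the steps together, a replication pass applies each task's source commands, then its replica commands, and only then advances the stored offset. The sketch below uses hypothetical Task and HqlExecutor stand-ins (they are not Hive classes) so that it runs without a cluster, and it assumes tasks arrive in ascending event-id order. With the real API, the batch would come from HCatClient.getReplicationTasks, with the stored offset passed as its long argument, and each id from task.getEvent().getEventId().

```java
import java.util.Arrays;
import java.util.List;

public class ReplicationLoop {

    /** Hypothetical stand-in for a replication task: an event id plus HQL for each side. */
    static final class Task {
        final long eventId;
        final List<String> srcCommands;
        final List<String> dstCommands;

        Task(long eventId, List<String> srcCommands, List<String> dstCommands) {
            this.eventId = eventId;
            this.srcCommands = srcCommands;
            this.dstCommands = dstCommands;
        }
    }

    /** Hypothetical executor abstraction, e.g. backed by a JDBC connection to one HiveServer. */
    interface HqlExecutor {
        void execute(String hql) throws Exception;
    }

    /**
     * Applies tasks in order and returns the id of the last fully applied event.
     * Tasks at or below lastEventId are skipped, so a re-delivered batch is not applied twice.
     * On the first failure the loop stops, leaving the offset at the last fully applied event
     * so that the failed event is retried on the next poll.
     */
    static long applyBatch(List<Task> batch, long lastEventId, HqlExecutor source, HqlExecutor replica) {
        long offset = lastEventId;
        for (Task task : batch) {
            if (task.eventId <= offset) {
                continue; // already applied in an earlier pass
            }
            try {
                for (String hql : task.srcCommands) source.execute(hql);  // typically export on the source
                for (String hql : task.dstCommands) replica.execute(hql); // typically import on the replica
            } catch (Exception e) {
                break; // do not advance past an event that was not fully applied
            }
            offset = task.eventId; // a real deployment would persist this durably
        }
        return offset;
    }

    public static void main(String[] args) {
        List<Task> batch = Arrays.asList(
                new Task(101, Arrays.asList("placeholder source HQL for event 101"),
                        Arrays.asList("placeholder replica HQL for event 101")),
                new Task(102, Arrays.asList("placeholder source HQL for event 102"),
                        Arrays.asList("placeholder replica HQL for event 102")));
        HqlExecutor print = hql -> System.out.println("executing: " + hql);
        long nextOffset = applyBatch(batch, 100, print, print);
        System.out.println("next offset: " + nextOffset); // next offset: 102
    }
}
```

Stopping at the first failure, rather than skipping the failed event, keeps events in order on the replica. The returned offset should be stored somewhere durable (for example a file or a table) so that a restarted process resumes from it instead of replaying or skipping events.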
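The polling advice amounts to choosing an interval that is a small fraction of the event time-to-live. A minimal sketch, assuming an operator-chosen safety factor (it is not a Hive setting) and using 86400 seconds purely as an example time-to-live value:

```java
import java.time.Duration;

public class PollSchedule {

    /**
     * Returns eventTimeToLive divided by safetyFactor, so that many polls happen
     * before any unread event can be purged. The safety factor is an assumed rule of thumb.
     */
    static Duration pollInterval(Duration eventTimeToLive, int safetyFactor) {
        if (safetyFactor < 2) {
            throw new IllegalArgumentException("safetyFactor should be at least 2");
        }
        return eventTimeToLive.dividedBy(safetyFactor);
    }

    public static void main(String[] args) {
        Duration ttl = Duration.ofSeconds(86400); // example value only
        System.out.println(pollInterval(ttl, 24)); // PT1H
    }
}
```

The chosen interval should also leave room for the time a replication pass itself takes, since events keep ageing while a batch is being applied.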
Replication to AWS/EMR/S3
...