...

hive-site.xml configuration for replication:
  <property>
    <name>hive.metastore.event.listeners</name>
    <value>org.apache.hive.hcatalog.listener.DbNotificationListener</value>
  </property>
  <property>
    <name>hive.metastore.event.db.listener.timetolive</name>
    <value>86400s</value>
  </property>

By default, the system uses org.apache.hive.hcatalog.api.repl.exim.EximReplicationTaskFactory, which uses the EXPORT and IMPORT commands to capture, move, and ingest the metadata and data that need to be replicated. However, you can provide a custom implementation by setting the hive.repl.task.factory Hive configuration property.
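For illustration, selecting a custom factory is a matter of naming its class in hive.repl.task.factory, for example in hive-site.xml. The snippet below is a minimal sketch. It assumes a hypothetical class, com.example.repl.CustomReplicationTaskFactory, that you have written yourself and placed on the classpath of the process that runs replication. The class name is illustrative only and is not part of Hive. Omitting the property keeps the default EximReplicationTaskFactory.

hive-site.xml configuration for a custom replication task factory (hypothetical class name):
  <property>
    <name>hive.repl.task.factory</name>
    <!-- Hypothetical class name; replace with your own factory implementation. -->
    <value>com.example.repl.CustomReplicationTaskFactory</value>
  </property>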

Typical mode of operation

...

At this time, it is not possible to replicate to tables on EMR whose path location is on S3. This is due to a bug in a dependency of the IMPORT command in the EMR distribution (checked in AMI-4.2.0). If you are using the EximReplicationTaskFactory, you may need to add the relevant S3 URI schemes to your Hive configuration:

HiveConf configuration for ExIm on S3:
  <property>
    <name>hive.exim.uri.scheme.whitelist</name>
    <value>hdfs,s3a</value>
  </property>