...
Similar to case (a), but sets up database-level replication that excludes the table/view 'Q4' and every table/view whose name consists of the prefix 'T' followed by a numeric suffix of any length, for example 'T3', 'T400', or 't255'. Table/view names are case-insensitive, so names with the prefix 't' are also excluded from the dump.
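The matching rule described above can be illustrated with a short Python sketch. It is not Hive's implementation: the helper name is_excluded is hypothetical, the two pattern strings are one reading of the description ('T' followed by one or more digits, and the literal name 'Q4'), and the sketch assumes each pattern must match the whole name, ignoring case.

```python
import re

# Assumed exclude list: 'T' followed by one or more digits, plus the name 'Q4'.
EXCLUDE_PATTERNS = ["T[0-9]+", "Q4"]


def is_excluded(name, patterns=EXCLUDE_PATTERNS):
    """Return True if `name` fully matches any pattern, ignoring case.

    Hypothetical helper for illustration only; whole-name, case-insensitive
    matching is an assumption based on the examples in the text.
    """
    return any(re.fullmatch(p, name, re.IGNORECASE) for p in patterns)


names = ["T3", "T400", "t255", "Q4", "Q5", "T", "T5x", "stores"]
excluded = [n for n in names if is_excluded(n)]
```

Under these assumptions, 'T' alone and 'T5x' stay in scope because neither is a 'T' followed only by digits.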
...
The presence of a FROM <init-evid> clause means this dump is not a bootstrap; instead, it reads the event log to produce a delta dump. FROM 200 TO 1400 goes through event ids 200 to 1400, looking for events from the relevant db.
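As a rough illustration of what "going through" an event-id range means, the following Python sketch filters a toy event log by id range and database. The Event type and the events_for_delta helper are hypothetical. The text does not say whether the endpoints themselves are included, so treating both as inclusive here is an assumption, not a statement about Hive's behavior.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """Toy stand-in for an event-log entry (hypothetical type)."""
    event_id: int
    db: str


def events_for_delta(event_log, db, from_id, to_id):
    """Select events for `db` with from_id <= event_id <= to_id.

    Assumption: both endpoints are inclusive in this sketch.
    """
    return [e for e in event_log
            if from_id <= e.event_id <= to_id and e.db == db]


log = [Event(150, "sales"), Event(200, "sales"), Event(750, "hr"),
       Event(900, "sales"), Event(1400, "sales"), Event(1500, "sales")]
delta_ids = [e.event_id for e in events_for_delta(log, "sales", 200, 1400)]
```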
...
This is an example of changing the replication policy/scope dynamically during an incremental replication cycle.
In the first case, a full-DB replication policy "sales" is changed to a policy that includes only table/view names made up entirely of letters, "sales.['[a-z]+']", such as "stores", "products", etc. A REPL LOAD using this dump automatically drops the tables that are excluded by the new policy. For instance, a table named 'T5' would be dropped during REPL LOAD if it already exists in the target cluster.
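The effect of the first policy change can be sketched in Python as a set difference between what exists on the target and what the new include list still covers. The helpers in_scope and tables_to_drop are hypothetical, and whole-name, case-insensitive matching is assumed; this only illustrates the rule, not how REPL LOAD implements it.

```python
import re


def in_scope(name, include_patterns):
    """True if `name` fully matches any include pattern, ignoring case."""
    return any(re.fullmatch(p, name, re.IGNORECASE) for p in include_patterns)


def tables_to_drop(target_tables, new_include_patterns):
    """Tables present on the target that the new policy no longer covers."""
    return [t for t in target_tables if not in_scope(t, new_include_patterns)]


# Tables already replicated under the old full-DB policy "sales" (toy data).
target_tables = ["stores", "products", "T5", "Q5"]

# New policy sales.['[a-z]+'] keeps only names made up entirely of letters.
dropped = tables_to_drop(target_tables, ["[a-z]+"])
```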
In the second case, the policy is changed again to include the table/view 'Q5'; in this case, Hive automatically bootstraps the table/view 'Q5' in the current incremental dump. The same is applicable for table/view renames where
(i) REPL DUMP sales WITH ('hive.repl.include.external.tables'='false', 'hive.repl.dump.metadata.only'='true');
The REPL DUMP command has an optional WITH clause to set command-specific configurations to be used when trying to dump. These configurations are only used by the corresponding REPL DUMP command and won't be used for other queries running in the same session. In this example, we set the configurations to exclude external tables and also include only metadata and don't dump data.
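For callers that assemble the statement programmatically, a minimal Python sketch of building the WITH clause from a dictionary might look like the following. The function repl_dump_command is a hypothetical helper, not part of any Hive client library, and it does no quoting or escaping of keys and values.

```python
def repl_dump_command(policy, configs=None):
    """Build a REPL DUMP statement with an optional WITH clause.

    Hypothetical helper: keys and values are inserted as-is, so they must
    not contain single quotes.
    """
    if not configs:
        return f"REPL DUMP {policy};"
    pairs = ", ".join(f"'{k}'='{v}'" for k, v in configs.items())
    return f"REPL DUMP {policy} WITH ({pairs});"


cmd = repl_dump_command("sales", {
    "hive.repl.include.external.tables": "false",
    "hive.repl.dump.metadata.only": "true",
})
```

With these two settings, cmd reproduces the statement in example (i).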
Return values:
- Errors are returned as standard error codes (and over JDBC when run through HS2)
- Returns 2 columns in the ResultSet:
- <dir-name> - the directory to which it has dumped info.
- <last-evid> - the last event-id associated with this dump, which might be the end-evid, or the curr-evid, as the case may be.
Note:
The generated dump is similar to the dumps produced by EXPORT in that it contains a _metadata file, but it does not contain the actual data files; instead, it uses a _files file as an indirection to the actual files. Another aspect of REPL DUMP is that it does not take a directory argument specifying where to dump. Instead, it creates its own dump directory inside a root directory configured by a new HiveConf parameter, hive.repl.rootdir, and returns the dump directory as part of its return value. A replication dump-directory cleaner that periodically cleans up this root directory is also planned.
This call is intended to be synchronous, and expects the caller to wait for the result.
If the HiveConf parameter hive.in.test is false, REPL DUMP will not use a new dump location, and so it will garble an existing dump. Hence, before taking an incremental dump, clear the bootstrap dump location if hive.in.test is false.
Bootstrap note : The FROM clause means that we read the event log to determine what to dump. For bootstrapping, we would not use FROM.
...
The REPL LOAD command has an optional WITH clause to set command-specific configurations to be used when trying to copy from the source cluster. These configurations are only used by the corresponding REPL LOAD command and won't be used for other queries running in the same session.
...
REPL STATUS
REPL STATUS <dbname>[.<tablename>];
Return values:
- Error codes returned as normal.
- Returns the last replication state (event ID) for the given database.
Bootstrap, Revisited
When we introduced the need for bootstrap, we said that the passage of time during the bootstrap was a problem that needed to be solved separately.
...