Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Table of Contents

About

Schema tool helps to initialise and upgrade metastore database and hive sys schema.

Metastore Schema Verification

...

...

Hive

...

Hive now records the schema version in the metastore database and verifies that the metastore schema version is compatible with Hive binaries that are going to accesss access the metastore. Note that the Hive properties to implicitly create or alter the existing schema are disabled by default. Hive will not attempt to change the metastore schema implicitly. When you execute a Hive query against an old schema, it will fail to access the metastore:

...

By default the configuration property hive.metastore.schema.verification is false and metastore to implicitly write true. If we set it false, metastore implicitly writes the schema version if it's not matching. To enable disable the strict schema verification, you need to set this property to true false in hive-site.xml.

See Hive Metastore Administration for general information about the metastore.

The Hive Schema Tool

Info
titleVersion

Introduced in Hive 0.12.0. See HIVE-5301. (Also see HIVE-5449 for a bug fix.)

The Hive The Hive distribution now includes an offline tool for Hive metastore schema manipulation. This tool can be used to initialize the metastore schema for the current Hive version. It can also handle upgrading the schema from an older version to current. It tries to find the current schema from the metastore if it is available. This will be applicable to future upgrades like 0.12.0 to 0.13.0. In case of upgrades from older releases like 0.7.0 or 0.10.0, you can specify the schema version of the existing metastore as a command line option to the tool.

The schematool figures out the SQL scripts required to initialize or upgrade the schema and then executes those scripts against the backend database. The metastore DB connection information like JDBC URL, JDBC driver and DB credentials are extracted from the Hive configuration. You can provide alternate DB credentials if needed.

The schematool Command

The schematool command invokes the Hive schema tool with these options:

Warning: You should always initialize the metastore schema first and the Hive schema second. The Hive schema initialization checks the metastore database schema and if it is not initialized, it puts some objects into the metastore database as well to make sure hive schema is running. That also means if you want to run the metastore schema initialization after the hive one, it will fail as it finds a database that already contains some objects. 

The schematool Command

The schematool command invokes the Hive schema tool with these options:

No Format
$ usage: schemaTool
 -alterCatalog <arg>                Alter a catalog, requires
                                    --catalogLocation and/or
                                    --catalogDescription parameter as well
 -catalogDescription <arg>          Description of new catalog
 -catalogLocation <arg>             Location of new catalog, required when
                                    adding a catalog
 -createCatalog <arg>               Create a catalog, requires
                                    --catalogLocation parameter as well
 -createLogsTable <arg>             Create table for Hive
                                    warehouse/compute logs
 -createUser                        Create the Hive user, set hiveUser to
                                    the db admin user and the hive
                                    password to the db admin password with
                                    this
 -dbOpts <databaseOpts>             Backend DB specific options
 -dbType <databaseType>             Metastore database type
 -driver <driver>                   driver name for connection
 -dropAllDatabases                  Drop all Hive databases (with
                                    CASCADE). This will remove all managed
                                    data!
 -dryRun                            list SQL scripts (no execute)
 -fromCatalog <arg>                 Catalog a moving database or table is
                                    coming from.  This is required if you
                                    are moving a database or table.
 -fromDatabase <arg>                Database a moving table is coming
                                    from.  This is required if you are
                                    moving a table.
 -help                              print this message
 -hiveDb <arg>                      Hive database (for use with
                                    createUser)
 -hivePassword <arg>                Hive password (for use with
                                    createUser)
 -hiveUser <arg>                    Hive user (for use with createUser)
 -ifNotExists                       If passed then it is not an error to
                                    create an existing catalog
 -info                              Show config and schema details
 -initOrUpgradeSchema               Initialize or upgrade schema to latest
                                    version
 -initSchema                        Schema initialization
 -initSchemaTo <initTo>             Schema initialization to a version
 -mergeCatalog <arg>                Merge databases from a catalog into
                                    other, Argument is the source catalog
                                    name Requires --toCatalog to indicate
                                    the destination catalog
 -metaDbType <metaDatabaseType>     Used only if upgrading the system
                                    catalog for hive
 -moveDatabase <arg>                Move a database between catalogs.
                                    Argument is the database name.
                                    Requires --fromCatalog and --toCatalog
                                    parameters as well
 -moveTable <arg>                   Move a table to a different database.
                                    Argument is the table name. Requires
                                    --fromCatalog, --toCatalog,
                                    --fromDatabase, and --toDatabase
   
No Format
$ schematool -help
usage: schemaTool
 -dbType <databaseType>             Metastore database type
 -driver <driver>                parameters as well.
 Driver-passWord name<password> for connection
 -dryRun            Override config file password
 -retentionPeriod <arg>             ListSpecify SQLlogs scriptstable (noretention execute)period
 -helpservers <serverList>              a comma-separated list of servers used
          Print this message
 -info                       in location validation in the format
  Show config and schema details
 -initSchema                        Schema initialization
 -initSchemaTo <initTo>  of scheme://authority (e.g.
         Schema initialization to a version
 -metaDbType <metaDatabaseType>     Used only if upgrading the system catalog for hive
 -passWord <password>       hdfs://localhost:8000)
 -toCatalog <arg>      Override config file password
 -upgradeSchema         Catalog a moving database or table is
      Schema upgrade
 -upgradeSchemaFrom <upgradeFrom>   Schema upgrade from a version
 -url <url>                  going to.  This is required if Connectionyou urlare
 to the database
 -userName <user>                   Override config file user name
 -verbose       moving a database or table.
 -toDatabase <arg>              Only print SQL statements
(Additional catalogDatabase relateda optionsmoving addedtable inis Hivegoing 3.0.0 (HIVE-19135] release are below.
 -createCatalog <catalog>to.
              Create catalog with given name
 -catalogLocation <location>        Location of new catalog, required when adding a catalog
This -catalogDescriptionis <description>required if Descriptionyou ofare newmoving cataloga
 -ifNotExists                       If passed then it is not an error to create an existing catalogtable.
 -moveDatabase <database>upgradeSchema                     Move a database between catalogs.  All tables under it would still be under it as part of new catalog. Argument is the database name. Requires --fromCatalog and --toCatalog parameters as well
 -moveTable  <table>Schema upgrade
 -upgradeSchemaFrom <upgradeFrom>   Schema upgrade from a version
 -url <url>                         connection url to the database
 -userName <user>    Move a table to a different database.  Argument is the table name. Requires --fromCatalog, --toCatalog, --fromDatabase, and --toDatabase 
 -toCatalog  <catalog> Override config file user name
 -validate                 Catalog a moving database or table is going to. Validate Thisthe isdatabase
 required-verbose if you are moving a database or table.
 -fromCatalog <catalog>             Catalog a moving database oronly tableprint isSQL coming from.statements
 -yes  This is required if you are moving a database or table.
 -toDatabase  <database>            Database a moving table isDon't goingask to.for confirmation Thiswhen isusing
 required if you are moving a table.
 -fromDatabase <database>           Database a moving table is coming from.  This is required if you are moving a table.


-dropAllDatabases.  

The dbType is required and can be one of:

The dbType "mssql" was added in Hive 0.13.1 with HIVE-6862.
No Format
 derby|mysql|postgres|oracle|mssql
Info
titleVersion
|hive

Note: dbType=hive only can be used on Hive sys schema. The others are metastore db types and in case of dbType=hive, it is mandatory to set  metaDbType as well. 

Usage Examples

  • Initialize to current schema for a new Hive setup:

    No Format
    $ schematool -dbType derby -initSchema Initializing the schema to: 4.0.0-beta-2
    Metastore connection URL:       	 jdbc:derby:;databaseName=metastore_db;create=true
    Metastore Connectionconnection Driver :   	 org.apache.derby.jdbc.EmbeddedDriver
    Metastore connection User:      	 APP
    Starting metastore schema initialization to 4.0.13.0-beta-2
    Initialization script hive-schema-4.0.13.0-beta-2.derby.sql
    Initialization script completed
    schemaTool completed
    


  • Get schema information:

    No Format
    $ schematool -dbType derby -info 
    Metastore connection URL:        	 jdbc:derby:;databaseName=metastore_db;create=true
    Metastore Connectionconnection Driver :   	 org.apache.derby.jdbc.EmbeddedDriver
    Metastore connection User:       	 APP
    Hive distribution version:       0.13.0	 4.0.0-beta-2
    Metastore schema version:        0.13.0
    schemaTool completed
    
    Attempt to get schema information with older metastore:
    	 4.0.0-beta-2 


  • Init schema to for a given version: 

    No Format
    $ schematool -dbType derby -infoinitSchemaTo 3.1.0 
    Metastore connection URL:        URL:	 jdbc:derby:;databaseName=metastore_db;create=true
    Metastore Connectionconnection Driver :   	 org.apache.derby.jdbc.EmbeddedDriver
    Metastore connection User:     	  APP
    HiveStarting distributionmetastore version:schema initialization to     0.133.1.0
    org.apache.hadoop.hive.metastore.HiveMetaException: Failed to get schema version.
    *** schemaTool failed ***
    

    Since the older metastore doesn't store the version information, the tool reports an error retrieving it.

    Initialization script hive-schema-3.1.0.derby.sql 



  • Upgrade schema from an 03.101.0 release by specifying the 'from' version:

    No Format
    $ schematool -dbType derby -upgradeSchemaFrom 0.10 -dbType derby -upgradeSchemaFrom 3.1.0 Upgrading from the user input version 3.1.0
    Metastore connection URL:        	 jdbc:derby:;databaseName=metastore_db;create=true
    Metastore Connectionconnection Driver :  	  org.apache.derby.jdbc.EmbeddedDriver
    Metastore connection User:       	 APP
    Starting upgrade metastore schema from version 0.10.0 to 0.13.0
    Upgrade script upgrade-0.10.0-to-0.11.0.derby.sql
    Completed upgrade-0.10.0-to-0.11.0.derby.sql3.1.0 to 4.0.0-beta-2
    Upgrade script upgrade-03.111.0-to-03.122.0.derby.sql
    Completed upgrade-03.111.0-to-03.122.0.derby.sql
    Upgrade script upgrade-0.12.0-to-0.13.0.derby.sql
    Completed upgrade-4.0.12.0-beta-1-to-4.0.13.0-beta-2.derby.sql
    schemaTool completed
    


  • Upgrade dry run can be used to list the required scripts for the given upgrade.

    No Format
    $ build/dist/bin/schematool -dbType derby -upgradeSchemaFrom 3.1.0.7.0 -dryRun
    Metastore Connection -dryRun Upgrading from the user input version 3.1.0
    Metastore connection URL:	 jdbc:derby:;databaseName=metastore_db;create=true
    Metastore connection Driver :   	 org.apache.derby.jdbc.EmbeddedDriver
    Metastore connection User:	       APP
    Starting upgrade metastore schema from version 03.71.0 to 0.13.0
    Upgrade script upgrade-0.74.0.0-to-0.8.0.derby.sqlbeta-2
    Upgrade script upgrade-03.81.0-to-03.92.0.derby.sql
    Upgrade script upgrade-03.92.0-to-4.0.10.0-alpha-1.derby.sql
    Upgrade script upgrade-4.0.10.0-alpha-1-to-4.0.11.0-alpha-2.derby.sql
    Upgrade script upgrade-4.0.11.0-alpha-2-to-4.0.12.0-beta-1.derby.sql
    Upgrade script upgrade-4.0.12.0-beta-1-to-4.0.13.0-beta-2.derby.sql
    schemaTool completed
    

    This is useful if you just want to find out all the required scripts for the schema upgrade.

  • Initialise Hive sys schema 

    No Format
    $ ./schematool -dbType hive -metaDbType derby -initSchema  --verbose -url="jdbc:hive2://localhost:10000"

    Note 1: As 

  • Moving a database and tables under it from default Hive catalog to a custom spark catalog

    No Format
    build/dist/bin/schematool -moveDatabase db1 -fromCatalog hive -toCatalog spark
    
    


  • Moving a table from Hive catalog to Spark Catalog

    No Format
    # Create the desired target database in spark catalog if it doesn't already exist.
    beeline ... -e "create database if not exists newdb";
    schematool -moveDatabase newdb -fromCatalog hive -toCatalog spark
    
    # Now move the table to target db under the spark catalog.
    schematool -moveTable table1 -fromCatalog hive -toCatalog spark  -fromDatabase db1 -toDatabase newdb