...

4. Offload the remote table to the local cluster by running a CTAS (CREATE TABLE AS SELECT) statement. The example below
   pulls all of the data into the local table, but you can pull in only selected columns and rows by naming the columns and applying predicates.

0: jdbc:hive2://localhost:10000> create table default.emr_clone as select * from test_emr_tbl;

INFO : Compiling command(queryId=ngangam_20240129182608_db20e2bb-1db3-473f-9564-0d81b01228bc): create table default.emr_clone as select * from test_emr_tbl

INFO : Completed compiling command(queryId=ngangam_20240129182608_db20e2bb-1db3-473f-9564-0d81b01228bc); Time taken: 6.42 seconds

INFO : Compiling command(queryId=ngangam_20240129182608_db20e2bb-1db3-473f-9564-0d81b01228bc): create table default.emr_clone as select * from test_emr_tbl

INFO : Semantic Analysis Completed (retrial = false)

INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:test_emr_tbl.tblkey, type:int, comment:null), FieldSchema(name:test_emr_tbl.descr, type:string, comment:null)], properties:null)

INFO : Completed compiling command(queryId=ngangam_20240129182608_db20e2bb-1db3-473f-9564-0d81b01228bc); Time taken: 1.781 seconds

INFO : Concurrency mode is disabled, not creating a lock manager

INFO : Executing command(queryId=ngangam_20240129182608_db20e2bb-1db3-473f-9564-0d81b01228bc): create table default.emr_clone as select * from test_emr_tbl

WARN : Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, impala) or using Hive 1.X releases.

INFO : Query ID = ngangam_20240129182608_db20e2bb-1db3-473f-9564-0d81b01228bc

INFO : Total jobs = 3

INFO : Launching Job 1 out of 3

INFO : Starting task [Stage-1:MAPRED] in serial mode

INFO : Number of reduce tasks determined at compile time: 1

INFO : In order to change the average load for a reducer (in bytes):

INFO : set hive.exec.reducers.bytes.per.reducer=<number>

INFO : In order to limit the maximum number of reducers:

INFO : set hive.exec.reducers.max=<number>

INFO : In order to set a constant number of reducers:

INFO : set mapreduce.job.reduces=<number>

INFO : number of splits:1

INFO : Submitting tokens for job: job_local1608643179_0003

INFO : Executing with tokens: []

INFO : The url to track the job: http://localhost:8080/

INFO : Job running in-process (local Hadoop)

INFO : 2024-01-29 18:26:19,582 Stage-1 map = 0%, reduce = 0%

INFO : 2024-01-29 18:26:20,790 Stage-1 map = 100%, reduce = 0%

INFO : 2024-01-29 18:26:21,810 Stage-1 map = 100%, reduce = 100%

INFO : Ended Job = job_local1608643179_0003

INFO : Starting task [Stage-7:CONDITIONAL] in serial mode

INFO : Stage-4 is selected by condition resolver.

INFO : Stage-3 is filtered out by condition resolver.

INFO : Stage-5 is filtered out by condition resolver.

INFO : Starting task [Stage-4:MOVE] in serial mode

INFO : Moving data to directory file:/tmp/hive/warehouse/external/.hive-staging_hive_2024-01-29_18-26-14_861_862309277586351757-1/-ext-10001 from file:/tmp/hive/warehouse/external/.hive-staging_hive_2024-01-29_18-26-14_861_862309277586351757-1/-ext-10003

INFO : Starting task [Stage-0:MOVE] in serial mode

INFO : Moving data to directory file:/tmp/hive/warehouse/external/emr_clone from file:/tmp/hive/warehouse/external/.hive-staging_hive_2024-01-29_18-26-14_861_862309277586351757-1/-ext-10001

INFO : Starting task [Stage-8:DDL] in serial mode

INFO : Starting task [Stage-2:STATS] in serial mode

INFO : Executing stats task

INFO : Table default.emr_clone stats: [numFiles=1, numRows=2, totalSize=18, rawDataSize=16, numFilesErasureCoded=0]

INFO : MapReduce Jobs Launched:

INFO : Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS

INFO : Total MapReduce CPU Time Spent: 0 msec

INFO : Completed executing command(queryId=ngangam_20240129182608_db20e2bb-1db3-473f-9564-0d81b01228bc); Time taken: 6.492 seconds

INFO : OK

2 rows affected (14.802 seconds)
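
Step 4 mentions that the CTAS can be limited to selected columns and rows. A minimal sketch of that variant follows. The column names tblkey and descr are taken from the schema shown in the log above; the target table name emr_clone_subset and the predicate value are hypothetical and used only for illustration.

```sql
-- Sketch only: copy a subset of the remote table into a local table.
-- emr_clone_subset and the tblkey threshold are illustrative assumptions.
CREATE TABLE default.emr_clone_subset AS
SELECT tblkey, descr          -- list only the columns you need
FROM test_emr_tbl
WHERE tblkey > 100;           -- keep only the rows matching the predicate
```

Whether the predicate is evaluated on the remote side or after the rows are fetched depends on the connector and Hive version, so for large remote tables it may be worth checking the query plan with EXPLAIN first.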

0: jdbc:hive2://localhost:10000> select count(*) from default.emr_clone;

...
