
...

Option	Description	Default value
`dataFrame`	DataFrame instance (subclass of `org.apache.spark.sql.DataFrame`).	`null`
`dataFrameCallback`	Instance of `org.apache.camel.component.spark.DataFrameCallback` interface.	`null`

Hive jobs

Instead of working with RDDs or DataFrames, the Spark component can also receive Hive SQL queries as message payloads. To send a Hive query to the Spark component, use the following URI:

Code Block

language	java
title	Spark Hive producer

spark:hive

The following snippet demonstrates how to send a query as the message body and receive the results of the job:

Code Block

language	java
title	Calling a Spark Hive job

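// collect=false: the body is the number of rows matched by the query
long carsCount = template.requestBody("spark:hive?collect=false", "SELECT * FROM cars", Long.class);
// collect=true (the default): the body is the list of matching rows
List<Row> cars = template.requestBody("spark:hive", "SELECT * FROM cars", List.class);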

The table you want to query must be registered in a HiveContext before the query is executed. For example, in Spring such a registration could look as follows:

Code Block

language	java
title	Hive table registration

@Bean
DataFrame cars(HiveContext hiveContext) {
    DataFrame jsonCars = hiveContext.read().json("/var/data/cars.json");
    jsonCars.registerTempTable("cars");
    return jsonCars;
}

Hive jobs options

Option	Description	Default value
`collect`	Indicates whether the results should be collected (as a list of `org.apache.spark.sql.Row` instances) or whether `count()` should be called on them instead.	`true`
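
The effect of `collect` can be pictured with a small plain-Java model. This is only a sketch of the behaviour described in the table above, not the component's implementation: `CollectOptionSketch`, `runQuery`, and the in-memory list of strings are hypothetical stand-ins for the Spark query result, chosen so that the example runs without Spark or Camel on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the `collect` option; not part of Camel or Spark. */
public class CollectOptionSketch {

    /**
     * Mimics what the message body holds after a Hive query:
     * the rows themselves when collect is true, otherwise only their count.
     *
     * @param rows    stand-in for the query result (real rows would be org.apache.spark.sql.Row)
     * @param collect value of the `collect` endpoint option (documented default: true)
     */
    public static Object runQuery(List<String> rows, boolean collect) {
        if (collect) {
            // Models collecting the result as a list of rows.
            return new ArrayList<>(rows);
        }
        // Models calling count() on the result: a single long value.
        return (long) rows.size();
    }

    public static void main(String[] args) {
        List<String> cars = Arrays.asList("audi", "bmw", "fiat");
        System.out.println(runQuery(cars, true));   // prints [audi, bmw, fiat]
        System.out.println(runQuery(cars, false));  // prints 3
    }
}
```

With `collect=false` only the count is returned to the caller, which is why the calling snippet earlier on this page requests that body as `Long.class` rather than `List.class`.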

Include Page

	Endpoint See Also
