...

Option | Description | Default value
dataFrame | DataFrame instance (subclass of org.apache.spark.sql.DataFrame). | null
dataFrameCallback | Instance of the org.apache.camel.component.spark.DataFrameCallback interface. | null

 

Hive jobs

Instead of working with RDDs or DataFrames, the Spark component can also receive Hive SQL queries as payloads. To send a Hive query to the Spark component, use the following URI:

Code Block
language: java
title: Spark Hive producer
spark:hive

The following snippet demonstrates how to send a message as input to a job and return the results:

Code Block
language: java
title: Calling Spark job
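// With collect=false, count() is called on the result, so a row count is returned.
long carsCount = template.requestBody("spark:hive?collect=false", "SELECT * FROM cars", Long.class);
// By default (collect=true), the result is collected as a list of Row instances.
List<Row> cars = template.requestBody("spark:hive", "SELECT * FROM cars", List.class);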
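
Because List.class carries no element type, the assignment to List<Row> in the snippet above is an unchecked conversion. The following is a minimal sketch of one way to verify the element type before using the result. The class TypedResults and its method checkedList are hypothetical names introduced here for illustration; they are not part of Camel or Spark.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Camel or Spark.
final class TypedResults {

    private TypedResults() {
    }

    // Copies a raw list into a typed, unmodifiable list. Fails fast with a
    // ClassCastException if any element is not an instance of the given type.
    static <T> List<T> checkedList(List<?> raw, Class<T> type) {
        List<T> typed = new ArrayList<>(raw.size());
        for (Object element : raw) {
            // Class.cast lets null elements through and rejects wrong types.
            typed.add(type.cast(element));
        }
        return Collections.unmodifiableList(typed);
    }
}
```

With such a helper, the second call could be wrapped as TypedResults.checkedList(template.requestBody("spark:hive", "SELECT * FROM cars", List.class), Row.class), so that a type mismatch surfaces at the call site rather than later, when a row is first read.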

The table to be queried must be registered in a HiveContext before the query is executed. For example, in Spring such a registration could look as follows:

Code Block
language: java
title: Spark DataFrame definition
@Bean
DataFrame cars(HiveContext hiveContext) {
    // Load the JSON file and register it as the temporary table "cars".
    DataFrame jsonCars = hiveContext.read().json("/var/data/cars.json");
    jsonCars.registerTempTable("cars");
    return jsonCars;
}

 

Hive jobs options

Option | Description | Default value
collect | Indicates whether results should be collected (as a list of org.apache.spark.sql.Row instances) or whether count() should be called on them instead. | true
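
The two endpoint URIs used in the calling example differ only in the collect parameter, and the Java type requested from the template changes with it. The following is a minimal sketch that keeps both choices in one place. HiveEndpoints, hiveUri and resultType are hypothetical names introduced here for illustration; the component itself does not provide them.

```java
import java.util.List;

// Hypothetical helper, not part of the Camel Spark component.
final class HiveEndpoints {

    static final String BASE_URI = "spark:hive";

    private HiveEndpoints() {
    }

    // collect defaults to true, so the parameter is only appended
    // when the non-default behaviour (count) is wanted.
    static String hiveUri(boolean collect) {
        return collect ? BASE_URI : BASE_URI + "?collect=false";
    }

    // Mirrors the calling example: a List of rows when collecting,
    // a Long row count otherwise.
    static Class<?> resultType(boolean collect) {
        return collect ? List.class : Long.class;
    }
}
```

A caller could then pass HiveEndpoints.hiveUri(false) and Long.class to template.requestBody to obtain the row count, or HiveEndpoints.hiveUri(true) and List.class to obtain the rows.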

 
