...

Commands to schedule and run clustering

Quick start using Inline Clustering

// Assumes a spark-shell session (`spark` in scope) with the Hudi bundle on the classpath.
import org.apache.hudi.QuickstartUtils._
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._

val tableName = "hudi_trips_cow"
val basePath = "/tmp/hudi_trips_cow"
val dataGen = new DataGenerator(Array("2020/03/11"))

// Generate 10 sample insert records and load them into a DataFrame.
val inserts = convertToStringList(dataGen.generateInserts(10))
val df = spark.read.json(spark.sparkContext.parallelize(inserts, 1))

df.write.format("org.apache.hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath").
  option(TABLE_NAME, tableName).
  // Turn off small-file handling at write time so clustering has small files to rewrite.
  option("hoodie.parquet.small.file.limit", "0").
  // Run clustering inline, as part of the write.
  option("hoodie.clustering.inline", "true").
  // Trigger clustering after every 4 commits.
  option("hoodie.clustering.inline.max.commits", "4").
  // Target size of files produced by clustering: 1 GiB.
  option("hoodie.clustering.plan.strategy.target.file.max.bytes", "1073741824").
  // Files smaller than 600 MiB are candidates for clustering.
  option("hoodie.clustering.plan.strategy.small.file.limit", "629145600").
  // Optional: comma-separated columns to sort by when rewriting data.
  option("hoodie.clustering.plan.strategy.sort.columns", "").
  mode(Append).
  save(basePath)
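
The byte-valued settings in the quick start are easier to read when they are derived from named units. The following is a minimal sketch, not part of the original quick start: the names MiB, GiB and clusteringOpts are hypothetical, and only the option keys and values are taken from the snippet above. It collects the clustering-related options into a plain Scala Map.

```scala
// Hypothetical helper names; only the option keys and values come from the quick start.
val MiB: Long = 1024L * 1024L
val GiB: Long = 1024L * MiB

val clusteringOpts: Map[String, String] = Map(
  "hoodie.parquet.small.file.limit" -> "0",
  "hoodie.clustering.inline" -> "true",
  "hoodie.clustering.inline.max.commits" -> "4",
  // 1 GiB target size for files written by clustering
  "hoodie.clustering.plan.strategy.target.file.max.bytes" -> (1 * GiB).toString,
  // files below 600 MiB are treated as small files
  "hoodie.clustering.plan.strategy.small.file.limit" -> (600 * MiB).toString
)
```

Assuming the same spark-shell session, the map could be passed to the writer with a single options(clusteringOpts) call in place of the individual option(...) calls for these keys.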

Setup for Async clustering

Clustering can be scheduled and run asynchronously using the WriteClient APIs.

The schedule clustering API can be found here.
The execute clustering API can be found here.

Some caveats

...

Work is in progress to fix these limitations, but the following issues are worth mentioning:
...
