...

Spark 2.2.1 is the latest stable version of Spark; it adds new features and improves performance. CarbonData 1.3.0 integrates with Spark 2.2.1 so that users can take advantage of these improvements after upgrading.

Support Streaming

CarbonData supports streaming ingestion of real-time data. Once the real-time data is ingested into the carbon store, it can be queried from compute engines such as Spark SQL.

Pre-Aggregate Support

CarbonData supports pre-aggregating data so that "group by" queries can fetch data much faster (around 10X faster). You can create as many aggregate tables as required, as datamaps, to improve query performance.
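The following minimal Python model is a conceptual sketch, not CarbonData code. It shows why a pre-aggregate table helps "group by" queries: the sum is computed once when data is loaded, and a matching query reads the small summary instead of scanning every fact row. The sample rows, the `PreAggregatedTable` class and the `sum_by_scanning` helper are hypothetical. In CarbonData itself, the aggregate table is declared as a datamap on the main table.

```python
from collections import defaultdict

# Hypothetical fact rows: (country, product, quantity)
SALES = [
    ("CN", "phone", 3),
    ("CN", "laptop", 1),
    ("US", "phone", 2),
    ("CN", "phone", 5),
    ("US", "laptop", 4),
]


class PreAggregatedTable:
    """Conceptual model of a pre-aggregate table keyed by one group-by column.

    The sums are computed once, at load time, so later queries that group by
    the same column only read this (much smaller) summary.
    """

    def __init__(self, rows, key_index, value_index):
        self.sums = defaultdict(int)
        for row in rows:
            self.sums[row[key_index]] += row[value_index]

    def sum_by_key(self):
        return dict(self.sums)


def sum_by_scanning(rows, key_index, value_index):
    """Baseline: answer the same GROUP BY by scanning every fact row."""
    result = defaultdict(int)
    for row in rows:
        result[row[key_index]] += row[value_index]
    return dict(result)


# Equivalent of: SELECT country, sum(quantity) FROM sales GROUP BY country
agg = PreAggregatedTable(SALES, key_index=0, value_index=2)
print(agg.sum_by_key())              # {'CN': 9, 'US': 6}
print(sum_by_scanning(SALES, 0, 2))  # same answer, but reads all 5 rows
```

Both paths return the same result. The pre-aggregated path only reads one entry per group, and that difference grows with the size of the fact table.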

Support Time Series (Alpha Feature)

CarbonData supports creating multiple pre-aggregate tables for a time hierarchy, and can automatically roll up queries on these hierarchies. Note that this is an alpha feature.
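The idea of rolling up along a time hierarchy can be sketched with a small, hypothetical Python model; this is not how CarbonData implements it internally. An hour-level aggregate is built once from raw events. A day-level query is then answered from the hourly buckets instead of the raw events. The event data and the `truncate` and `aggregate` helpers are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events: (event_time, quantity)
EVENTS = [
    (datetime(2018, 2, 1, 9, 15), 2),
    (datetime(2018, 2, 1, 9, 40), 3),
    (datetime(2018, 2, 1, 17, 5), 1),
    (datetime(2018, 2, 2, 8, 30), 4),
]


def truncate(ts, granularity):
    """Truncate a timestamp to the start of its hour or day."""
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported granularity: {granularity}")


def aggregate(rows, granularity):
    """Sum quantities per time bucket at the given granularity."""
    sums = defaultdict(int)
    for ts, qty in rows:
        sums[truncate(ts, granularity)] += qty
    return dict(sums)


# Hour-level pre-aggregate, built once from the raw events (3 buckets).
hourly = aggregate(EVENTS, "hour")

# A day-level query rolled up from the hourly buckets rather than
# rescanning every raw event: {2018-02-01: 6, 2018-02-02: 4}
daily_from_hourly = aggregate(hourly.items(), "day")
```

The roll-up is valid here because a sum is additive across buckets. An average, by contrast, would have to be rolled up from stored sums and counts rather than from per-hour averages.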

CTAS (CREATE TABLE AS SELECT)

CarbonData supports creating a CarbonData table from any Parquet, Hive, or Carbon table. This is useful when you want to create a CarbonData table from an existing Parquet or Hive table and use the Carbon query engine to query it with better performance. It can also be used to back up data.

Standard Partitioning

In 1.3.0, CarbonData supports standard partitioning, similar to Spark and Hive partitioning. This allows you to use any column to create partitions, which can significantly improve query performance.
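Here is a minimal, hypothetical Python model of why partitioning speeds up queries. It is a conceptual sketch, not CarbonData's implementation. Rows are grouped by the value of a partition column, so a filter on that column reads only the matching partition and skips (prunes) the rest. The sample data and the `partition_by` and `query_partition` helpers are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical rows: (product_category, product_name, price)
ROWS = [
    ("phone", "p1", 300),
    ("laptop", "l1", 900),
    ("phone", "p2", 450),
    ("tablet", "t1", 250),
]


def partition_by(rows, column_index):
    """Group rows into one bucket per distinct value of the partition column,
    similar to how Hive-style partitioning keeps one directory per value."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column_index]].append(row)
    return dict(partitions)


def query_partition(partitions, value):
    """Answer `WHERE partition_column = value` by reading only the matching
    bucket; all other partitions are pruned without being scanned."""
    return partitions.get(value, [])


partitions = partition_by(ROWS, column_index=0)
phones = query_partition(partitions, "phone")
print(phones)  # [('phone', 'p1', 300), ('phone', 'p2', 450)]
print(f"read {len(phones)} of {len(ROWS)} rows")  # read 2 of 4 rows
```

The benefit depends on choosing a partition column that queries commonly filter on. A filter on a non-partition column would still have to look at every partition.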

Support External DB & Table Path

CarbonData supports external DB and table paths. When creating a DB or table, you can now specify the location where it is stored.

Support Querying Data with Specified Data Loads

CarbonData supports querying data from specified segments (each data load generates one segment), so users can query only the data they actually need.
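The relationship between data loads and segments can be pictured with a small, hypothetical Python model. Each load appends one new segment, and a query can be limited to a chosen list of segment IDs. The `SegmentedTable` class and its methods are illustrative assumptions, not CarbonData's API. In CarbonData the segment selection is made through its own configuration, not through code like this.

```python
class SegmentedTable:
    """Conceptual model: every data load appends exactly one new segment."""

    def __init__(self):
        self.segments = {}  # segment id -> list of rows
        self._next_id = 0

    def load(self, rows):
        """Store one data load as a new segment and return its id."""
        segment_id = self._next_id
        self.segments[segment_id] = list(rows)
        self._next_id += 1
        return segment_id

    def query(self, segment_ids=None):
        """Return rows from the given segments, or from all segments if None."""
        ids = self.segments.keys() if segment_ids is None else segment_ids
        return [row for sid in ids for row in self.segments[sid]]


table = SegmentedTable()
table.load([("2018-01-01", 10)])  # segment 0
table.load([("2018-01-02", 20)])  # segment 1
table.load([("2018-01-03", 30)])  # segment 2

print(table.query())        # rows from all three loads
print(table.query([0, 2]))  # [('2018-01-01', 10), ('2018-01-03', 30)]
```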

Support Boolean Data Type

...