...

CarbonData is a high-performance data solution that supports various data analytic scenarios, including BI analysis, ad-hoc SQL query, fast filter lookup on detail records, streaming analytics, and ad-hoc OLAP analysis. Varied business-driven analysis and the demand for flexible data analytics have burdened the big data domain with data duplication and increased data management cost. CarbonData provides a converged data store that addresses data duplication while supporting these application scenarios. I/O scanning and computing performance is improved by features such as multi-level indexing, dictionary encoding, pre-aggregation, dynamic partitioning, and quasi-real-time data query, thereby achieving second-level response to analytic queries on tens of trillions of records. CarbonData has been deployed in 20+ enterprise production environments, with the largest single cluster (100+ nodes) managing tens of trillions of records. In one of the largest scenarios, it supports queries on a single table with 5 PB of data (more than 10 trillion records) with a response time of less than 3 seconds!

We encourage everyone to download the release from https://dist.apache.org/repos/dist/release/carbondata/1.4.0/ and send feedback through the CarbonData user mailing lists!

...

This version of CarbonData adds the following new features to improve performance, compatibility, and usability.

Enhancement for BI

Supports Streaming on Pre-Aggregate Table

You can now create a pre-aggregate table on a streaming table. CarbonData's streaming ingest feature already reduces the time until data becomes available; with a Pre-Aggregate Table you also get better query performance. After you create a Pre-Aggregate Table by using the 'preaggregate' DataMap, data conversion in the streaming table includes automatic aggregation. Queries on the table are rewritten into two parts: one part on the streaming data and another on the pre-aggregated data. Since the pre-aggregated data is much smaller than the original data, the query is much faster.
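The two-part rewrite described above can be illustrated with a small, self-contained sketch. This is not CarbonData code: the function names, the in-memory row lists, and the choice of a SUM aggregate are all assumptions made for illustration. It only shows why combining a pre-aggregated partial result with an aggregation over the not-yet-converted streaming rows gives the same answer as scanning every row.

```python
from collections import defaultdict


def aggregate(rows):
    """Sum quantity per country over raw (country, quantity) rows."""
    totals = defaultdict(int)
    for country, quantity in rows:
        totals[country] += quantity
    return dict(totals)


def merge_partials(*partials):
    """Combine partial SUM results by adding the values for each key."""
    totals = defaultdict(int)
    for partial in partials:
        for key, value in partial.items():
            totals[key] += value
    return dict(totals)


# Hypothetical rows that were already converted out of the streaming
# segment; in this sketch they are aggregated once, at conversion time.
converted_rows = [("US", 5), ("CN", 7), ("US", 3), ("IN", 4)]
pre_aggregated = aggregate(converted_rows)

# Hypothetical rows still sitting in the streaming segment.
streaming_rows = [("US", 2), ("IN", 1)]


def query_sum_by_country():
    """Answer the query in two parts and merge them.

    Part 1 reads the small pre-aggregated result.
    Part 2 aggregates only the rows still in the streaming segment.
    """
    return merge_partials(pre_aggregated, aggregate(streaming_rows))


result = query_sum_by_country()
print(result)
```

Merging by addition works here because SUM is decomposable. An aggregate such as AVG would need both a sum and a count carried in each partial result before the final division.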

Supports Partition on Pre-Aggregate Table

If you create a Pre-Aggregate Table ('preaggregate' DataMap) on a partitioned main table, the Pre-Aggregate Table is also partitioned on the same column. Since the partitions are aligned, a data management operation such as create/drop/overwrite on the main table is applied automatically to the aggregate table as well. For example, when you drop the partition column in the main table, the same column can be dropped in the aggregate table, keeping both in sync.
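The benefit of aligned partitions can be sketched as follows. The `PartitionedTable` class and its methods are hypothetical and exist only for this illustration; they are not part of any CarbonData API. The point is that when the child table is partitioned on the same column, a partition-level operation on the main table maps directly to the same operation on the child.

```python
class PartitionedTable:
    """Toy model of a table partitioned on one column (illustration only)."""

    def __init__(self, name, partition_column):
        self.name = name
        self.partition_column = partition_column
        self.partitions = {}  # partition value -> list of rows
        self.children = []    # aligned aggregate tables

    def add_child(self, child):
        # Alignment requires the same partition column on both tables.
        if child.partition_column != self.partition_column:
            raise ValueError("child must be partitioned on the same column")
        self.children.append(child)

    def drop_partition(self, value):
        # Drop locally, then propagate the same operation to each child.
        self.partitions.pop(value, None)
        for child in self.children:
            child.drop_partition(value)


main = PartitionedTable("sales", "country")
agg = PartitionedTable("sales_agg", "country")
main.add_child(agg)

main.partitions = {"US": [("US", 5), ("US", 3)], "CN": [("CN", 7)]}
agg.partitions = {"US": [("US", 8)], "CN": [("CN", 7)]}

main.drop_partition("US")
print(sorted(main.partitions), sorted(agg.partitions))
```

If the child were partitioned on a different column, dropping one main-table partition could affect rows spread across many child partitions, so the child would have to be recomputed rather than trimmed.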

Enhanced Data Load performance

Data load performance has been enhanced in this release.

Supports External Table with Location

You can now create an external table by specifying the location of CarbonData files.

Supports SDK

A CarbonData SDK is provided to write and read CarbonData files through a Java API, with support for Avro schema and JSON data.

Supports Lucene Index for Text Search (Alpha feature)

...
