Apache CarbonData 1.3.0 Release

The Apache CarbonData community is pleased to announce the release of version 1.3.0 in The Apache Software Foundation (ASF). CarbonData is a new big data native file format for faster interactive queries. It uses advanced columnar storage, indexing, compression, and encoding techniques to improve computing efficiency, which in turn can make queries over petabytes of data an order of magnitude faster.

We encourage everyone to download the release from <<release_path>> and to share feedback through the CarbonData user mailing list!

This release note provides information on the new features, improvements, and bug fixes of this release.

What’s New in Version 1.3.0?

This version of CarbonData adds the following new features to improve performance, compatibility, and usability.

Support Spark 2.2.1

Spark 2.2.1 is the latest stable version of Spark and adds new features and performance improvements. CarbonData 1.3.0 integrates with Spark 2.2.1 so that users can take advantage of these improvements after upgrading.

Support Streaming

CarbonData supports streaming ingestion of real-time data. Once the real-time data is ingested into the CarbonData store, it can be queried from compute engines such as Spark SQL.

Pre-Aggregate Support

CarbonData supports pre-aggregating data so that GROUP BY queries can fetch results much faster (around 10X). You can create as many aggregate tables as required, as datamaps, to improve query performance.
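The idea behind pre-aggregation can be sketched in a few lines. This is a minimal, hypothetical Python illustration, not CarbonData's implementation or SQL syntax: the toy `SALES` rows and the helper names `group_sum` and `total_quantity` are assumptions. The aggregate is computed once when data is loaded, so a later GROUP BY query reads one row per group instead of scanning every base row.

```python
from collections import defaultdict

# Hypothetical base table rows: (country, product, quantity).
SALES = [
    ("CN", "phone", 3),
    ("US", "phone", 5),
    ("CN", "laptop", 2),
    ("US", "laptop", 1),
    ("CN", "phone", 4),
]


def group_sum(rows, key_index, value_index):
    """SUM(value) GROUP BY key, computed by scanning all given rows."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)


# Hypothetical "aggregate table": built once at load time, one row per country.
AGG_BY_COUNTRY = group_sum(SALES, key_index=0, value_index=2)


def total_quantity(country):
    """Answer SUM(quantity) for one country from the small aggregate, not the base rows."""
    return AGG_BY_COUNTRY.get(country, 0)


print(AGG_BY_COUNTRY)        # {'CN': 9, 'US': 6}
print(total_quantity("CN"))  # 9
```

The gain comes from the aggregate being much smaller than the base table; the trade-off is that the aggregate must be kept in sync as new data is loaded.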

Support Time Series (Alpha Feature)

CarbonData supports creating multiple pre-aggregate tables for a time hierarchy and can automatically roll up queries on these hierarchies. Note that this is an alpha feature.
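Automatic roll-up can be pictured as answering a coarse-grained query from a finer-grained aggregate. The following is a minimal, hypothetical Python sketch, not CarbonData's implementation: the `EVENTS` data and the helpers `aggregate`, `to_hour`, and `to_day` are assumptions. It builds an hourly aggregate from raw events and then derives daily totals from the hourly table alone.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw events: (event_time, amount).
EVENTS = [
    (datetime(2018, 2, 1, 9, 15), 10),
    (datetime(2018, 2, 1, 9, 45), 5),
    (datetime(2018, 2, 1, 17, 5), 7),
    (datetime(2018, 2, 2, 8, 30), 3),
]


def aggregate(rows, truncate):
    """SUM(amount) grouped by a truncated timestamp."""
    totals = defaultdict(int)
    for ts, amount in rows:
        totals[truncate(ts)] += amount
    return dict(totals)


def to_hour(ts):
    return ts.replace(minute=0, second=0, microsecond=0)


def to_day(ts):
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


# Finest-granularity pre-aggregate, built from the raw events.
HOURLY = aggregate(EVENTS, to_hour)

# Roll-up: day-level totals derived only from the hourly aggregate.
DAILY = aggregate(HOURLY.items(), to_day)
```

This roll-up is valid for SUM and COUNT; an average cannot be rolled up from hourly averages and would instead need stored sums and counts.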

CTAS (CREATE TABLE AS SELECT)

CarbonData supports creating a CarbonData table from any Parquet, Hive, or CarbonData table. This is useful when you want to convert an existing Parquet or Hive table into a CarbonData table and query it with the CarbonData query engine for better query performance. CTAS can also be used to back up data.
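To show what CREATE TABLE AS SELECT does in general, the sketch below uses SQLite from the Python standard library purely as a stand-in engine. It is not CarbonData syntax (with CarbonData the statement would be issued through Spark SQL), and the table names `source_table`, `target_table`, and `backup_table` are hypothetical. The new table's columns and rows come entirely from the SELECT query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical existing table standing in for a Parquet/Hive source.
conn.execute("CREATE TABLE source_table (id INTEGER, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO source_table VALUES (?, ?, ?)",
    [(1, "Shenzhen", 12.5), (2, "Bangalore", 7.0), (3, "Shenzhen", 3.5)],
)

# CTAS: schema and data of target_table are defined by the query result.
conn.execute(
    "CREATE TABLE target_table AS "
    "SELECT id, city, amount FROM source_table WHERE amount > 5"
)

# CTAS used as a simple full copy (backup) of the source table.
conn.execute("CREATE TABLE backup_table AS SELECT * FROM source_table")

target_rows = conn.execute(
    "SELECT id, city, amount FROM target_table ORDER BY id"
).fetchall()
backup_count = conn.execute("SELECT COUNT(*) FROM backup_table").fetchone()[0]
```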

Standard Partitioning

In 1.3.0, CarbonData supports standard partitioning, similar to Spark and Hive partitioning. This allows you to use any column to create partitions, which can improve query performance significantly.
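Partitioning speeds up queries mainly through partition pruning: a filter on the partition column lets the engine skip partitions that cannot match. The minimal Python sketch below illustrates only that general idea, not CarbonData's storage layout; the `ROWS` data and the helpers `partition_by` and `query_amount_sum` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (order_id, country, amount), partitioned by country.
ROWS = [
    (1, "CN", 10.0),
    (2, "US", 4.0),
    (3, "CN", 6.0),
    (4, "IN", 8.0),
]


def partition_by(rows, column_index):
    """Group rows into partitions keyed by the value of one column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column_index]].append(row)
    return dict(partitions)


PARTITIONS = partition_by(ROWS, column_index=1)


def query_amount_sum(partitions, country):
    """SUM(amount) WHERE country = ?; only the matching partition is read.

    Returns (total, rows_scanned) so the effect of pruning is visible.
    """
    scanned = partitions.get(country, [])
    return sum(row[2] for row in scanned), len(scanned)
```

For example, `query_amount_sum(PARTITIONS, "CN")` reads 2 of the 4 rows to return a total of 16.0.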

Support External DB & Table Path

CarbonData supports external database and table paths. When creating a database or table, you can now specify the location where it should be stored.

Support Boolean Data Type

CarbonData now supports the BOOLEAN data type.

For the detailed list of JIRA issues in this release, see: << JIRA List >>
