Introduction

Apache CarbonData (incubating) is an open source project of The Apache Software Foundation (ASF). CarbonData is a new big data native file format for faster interactive queries. It uses advanced columnar storage, indexing, compression, and encoding techniques to improve computing efficiency, which in turn helps speed up queries by an order of magnitude over petabytes of data.

The Apache CarbonData community is pleased to announce the availability of CarbonData 0.2.0, the third stable release.

We encourage everyone to download the release and send feedback through the CarbonData user mailing lists!

This release includes more than 30 new features and improvements and more than 80 bug fixes. The detailed JIRA list follows:

Bug

[CARBONDATA-152] - Carbon is not giving proper result with double value
[CARBONDATA-153] - Record count is not matching while loading the data when one data node went down in HA setup
[CARBONDATA-157] - for decimal(n,n) column, when filter has int value, then will throw exception
[CARBONDATA-158] - Load data failed when first line is null in data
[CARBONDATA-160] - Data mismatch issue in case of multiple loads with dictionary column with different key size
[CARBONDATA-165] - Behavior need to be corrected when fact csv have header for ALL_DICTIONARY
[CARBONDATA-167] - UndeclaredThrowableException thrown instead of data loading fail when fileheader has unsupported characters in file/command
[CARBONDATA-169] - COLUMNDICT and ALL_DICT_PATH can not be used together
[CARBONDATA-170] - Delete the lock files which are created after unlock.
[CARBONDATA-171] - Block distribution not proper when the number of active executors more than the node size
[CARBONDATA-173] - Error info is not proper when measure use COLUMNDICT
[CARBONDATA-174] - When hadoop.tmp.dir configured incorrectly, hdfs lock of carbon would throw exception.
[CARBONDATA-176] - Should not allow deletion of compacted segment.
[CARBONDATA-177] - Greater than and Less than filter returning wrong result
[CARBONDATA-178] - table not exist when execute show segments using spark-sql and beeline the same time
[CARBONDATA-179] - describe new table show old table's schema
[CARBONDATA-180] - give proper error message when dataloading with wrong delimiter value
[CARBONDATA-183] - Blocks are allocated to single node when Executors configured is based on the ip address.
[CARBONDATA-184] - Complex types data load is not loading the data with special character delimiters like " ^ * - .
[CARBONDATA-185] - "DROP CUBE" need change to "DROP TABLE" in CarbonDatasourceRelation.scala
[CARBONDATA-186] - Except Compaction all other alter operations on carbon table should not be performed.
[CARBONDATA-187] - when using Decimal type as dictionary the generated surrogate key would mismatch for the same values during increment load
[CARBONDATA-189] - Drop database dbname cascade should be restricted in carbondata
[CARBONDATA-190] - Data mismatch issue
[CARBONDATA-191] - load data is null when quote char is single and no '\n' being end.
[CARBONDATA-192] - Invalidate table from hive context while dropping the table
[CARBONDATA-194] - ArrayIndexOfBoundException thrown when number of columns in row more than the max number of columns in univocity parser settings
[CARBONDATA-195] - Select query with AND filter failing for empty '' operand value of numeric column
[CARBONDATA-198] - Implementing system level lock for compaction.
[CARBONDATA-199] - when subquery with sort and filter the result is empty
[CARBONDATA-201] - Add comment Option
[CARBONDATA-203] - Use static string to set Hadoop configuration
[CARBONDATA-204] - Query statistics issue
[CARBONDATA-205] - Can't pass compile, the case of DataCompactionLockTest is failed.
[CARBONDATA-208] - User should be able to turn on and off the STATISTIC log
[CARBONDATA-215] - Correct the file headers of classes
[CARBONDATA-216] - Files should be deleted as this feature not supported now.
[CARBONDATA-217] - Data mismatch issue in After compaction
[CARBONDATA-219] - compaction without data load is failing.
[CARBONDATA-220] - TimeStampDirectDictionaryGenerator_UT.java is not running in the build
[CARBONDATA-222] - Query issue for all dimensions are no dictionary columns
[CARBONDATA-224] - Fixed data mismatch issue in case of Dictionary Exclude column for Numeric data type
[CARBONDATA-226] - Delete load by ID message when the compacted segment is present is wrong.
[CARBONDATA-227] - In block distribution parallelism is decided initially and not reinitialized after requesting new executors
[CARBONDATA-229] - Array Index of bound exception thrown from dictionary look up while writing sort index file
[CARBONDATA-234] - wrong message is printed in the logs each time when the compaction is done.
[CARBONDATA-238] - CarbonOptimizer shouldn't add CarbonDictionaryCatalystDecoder for HiveTable
[CARBONDATA-239] - Failure of one compaction in queue should not affect the others.
[CARBONDATA-241] - OOM error during query execution in long run
[CARBONDATA-242] - NOT IN with Null filter results are not compatible With Hive
[CARBONDATA-244] - Load and delete segment by id queries giving inconsistent results when we execute in parallel
[CARBONDATA-245] - Actual Exception is getting lost in case of data dictionary file generation.
[CARBONDATA-246] - compaction is wrong in case if last segment is not assigned to an executor.
[CARBONDATA-247] - Higher MAXCOLUMNS value in load DML options is leading to out of memory error
[CARBONDATA-248] - There was no header in driver statistics table and scan block time was always zero
[CARBONDATA-250] - Throw exception and fail the data load if provided MAXCOLUMNS value is not proper
[CARBONDATA-251] - making the auto compaction as blocking call.
[CARBONDATA-252] - Filter result is not proper when Double data type values with 0.0 and -0.0 will be used
[CARBONDATA-253] - Duplicate block loading when distribution is based on blocklet
[CARBONDATA-255] - keyword SEGMENT should be used instead of LOAD In data management dml because LOAD is not supported now
[CARBONDATA-260] - Equal or lesser value of MAXCOLUMNS option than column count in CSV header results into array index of bound exception
[CARBONDATA-261] - clean files is updating the stale segments to the table status.
[CARBONDATA-262] - limit query memory and thread leak issue
[CARBONDATA-268] - CarbonOptimizer has performance problem
[CARBONDATA-271] - Non Filter data mismatch issue
[CARBONDATA-272] - Two test cases are failing on second maven build without 'clean'
[CARBONDATA-273] - Some constants should be written using carbon common constants instead of direct values
[CARBONDATA-280] - when table properties is repeated it only set the last one
[CARBONDATA-288] - In hdfs bad record logger is failing in writing the bad records
[CARBONDATA-289] - Support MB/M for table block size and update the doc about this new feature.
[CARBONDATA-294] - Timestamp Data Error
[CARBONDATA-304] - Load data failure when set table_blocksize=2048
[CARBONDATA-310] - Compilation failed when using spark 1.6.2
[CARBONDATA-315] - Data loading fails if parsing a double value returns infinity
[CARBONDATA-316] - Change BAD_RECORDS_LOGGER_ACTION to BAD_RECORDS_ACTION
[CARBONDATA-317] - CSV having only space char is throwing NullPointerException
[CARBONDATA-319] - Bad Records logging for column LONG data type is not proper
[CARBONDATA-320] - problem when dropped a table during all data nodes are down.
[CARBONDATA-334] - Correct Some Spelling Mistakes
[CARBONDATA-339] - Align storePath name in generateGlobalDictionary() of GlobalDictionaryUtil.scala
[CARBONDATA-358] - Compaction is not working in latest release
[CARBONDATA-359] - is null & null functions are not working when data fetching from sub query
[CARBONDATA-360] - On Dictionary excluded column, condition is not working if value is not in ''
[CARBONDATA-363] - Block loading issue in case of blocklet distribution
[CARBONDATA-364] - Drop table is behaving inconsistently
[CARBONDATA-365] - Compaction fails when table is created with configured block size
[CARBONDATA-366] - Incorrect load data behaviour in mentioned scenario
[CARBONDATA-385] - Select query is giving cast exception
[CARBONDATA-417] - [Bad Records] Not created and not written log file when logger is True and action as Fail

Improvement

[CARBONDATA-63] - provide recommend values for different scenarios
[CARBONDATA-80] - Dictionary values should be equally distributed in buckets while loading in memory
[CARBONDATA-117] - BlockLet distribution for optimum resource usage
[CARBONDATA-125] - Remove usage of currentRestructureNumber from the code
[CARBONDATA-132] - Parse some Spark Exception which can be shown to driver side and show them directly.
[CARBONDATA-188] - Compress CSV file while loading
[CARBONDATA-206] - Two same name class, need to optimize.
[CARBONDATA-207] - The document of "DDL operations on CarbonData", "MINOR/MAJOR" in compaction section need to provide more detail explanation
[CARBONDATA-209] - DROP TABLE in all testcase
[CARBONDATA-223] - Remove unused code from Carbon Parser
[CARBONDATA-231] - Rename repeated table names in same test file and add drop tables.
[CARBONDATA-240] - Use SQLContext to query CarbonData directly without creating table
[CARBONDATA-249] - As LongType.simpleString in spark is "bigint", Carbon will convert Long to BigInt
[CARBONDATA-254] - Code Inspection Optimization
[CARBONDATA-263] - Configurable blocklet distribution
[CARBONDATA-281] - improve the test cases in LCM module.
[CARBONDATA-282] - Add segment management example
[CARBONDATA-292] - add COLUMNDICT operation info in DML operation guide
[CARBONDATA-293] - Add scan_blocklet_num for query statistics
[CARBONDATA-311] - Log the data size of blocklet during data load.
[CARBONDATA-330] - Fix compiler warnings - Java related
[CARBONDATA-337] - Correct Inverted Index spelling mistakes
[CARBONDATA-338] - Remove the method arguments as they are never used inside the method
[CARBONDATA-411] - test

New Feature

[CARBONDATA-164] - Add template for pull requests
[CARBONDATA-200] - Add performance statistics logs to record the query time taken by carbon
[CARBONDATA-210] - Support loading BZIP2 compressed CSV file
[CARBONDATA-211] - Support compress CarbonData file create table options
[CARBONDATA-212] - Use SQLContext to read CarbonData file
[CARBONDATA-213] - Remove thrift compiler dependency
[CARBONDATA-257] - Make CarbonData readable through Spark/MapReduce program
[CARBONDATA-267] - Set block_size for table on table level
[CARBONDATA-278] - IS NULL and IS NOT NULL shall be push down to carbon
[CARBONDATA-279] - [DataLoading]Save a DataFrame to CarbonData file without writing CSV file
[CARBONDATA-286] - Support Append mode when writing Dataframe to CarbonData

Wish

[CARBONDATA-81] - please support that carbon-spark-sql can load property-file
[CARBONDATA-336] - Align the name description
