...

Compaction performance is optimised by prefetching data while reading carbon files.
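The prefetch overlaps the disk read of the next block with the merge of the current one. A minimal generic sketch of that double-buffering idea (illustrative Python with stand-in functions, not CarbonData's actual reader classes):

```python
from concurrent.futures import ThreadPoolExecutor

def read_block(block_id):
    # Stand-in for reading one carbon data block from disk.
    return [block_id * 10 + i for i in range(3)]

def merge_blocks(block_ids):
    """Merge blocks while the next read is prefetched in the background."""
    merged = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(read_block, block_ids[0])  # prime the pipeline
        for next_id in block_ids[1:]:
            current = future.result()                   # wait for the prefetched block
            future = pool.submit(read_block, next_id)   # start reading the next block
            merged.extend(current)                      # merge while the read proceeds
        merged.extend(future.result())                  # last block
    return merged

# merge_blocks([1, 2, 3]) -> [10, 11, 12, 20, 21, 22, 30, 31, 32]
```

With a single background worker, the reader is always at most one block ahead, so the IO wait is hidden behind the merge work rather than added to it.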

Improved Blocklet DataMap pruning in driver

Blocklet DataMap pruning in the driver is improved by using multi-threaded processing.
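Pruning checks each blocklet's min/max statistics against the query filter, which is naturally parallel across blocklets. A generic sketch of fanning that check out over a thread pool (illustrative Python, not CarbonData's actual DataMap API):

```python
from concurrent.futures import ThreadPoolExecutor

# Each blocklet carries min/max statistics for a sort column.
blocklets = [{"id": i, "min": i * 10, "max": i * 10 + 9} for i in range(8)]

def survives(blocklet, lo, hi):
    # Keep a blocklet only if its [min, max] range overlaps the filter [lo, hi].
    return blocklet["max"] >= lo and blocklet["min"] <= hi

def prune(blocklets, lo, hi, threads=4):
    # Evaluate the min/max check for all blocklets concurrently in the driver.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        keep = pool.map(lambda b: survives(b, lo, hi), blocklets)
    return [b["id"] for b, k in zip(blocklets, keep) if k]

# prune(blocklets, 25, 44) -> [2, 3, 4]
```

For tables with many segments and blocklets, spreading this metadata scan across threads shortens the planning phase of filter queries.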

CarbonData SDK

SDK supports C++ interfaces for writing CarbonData files

...

  • Local dictionary is enabled by default.
  • Inverted index is disabled by default.
  • Sort temp files generated during data loading are now compressed with Snappy by default to improve IO.
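These new defaults can be overridden through configuration. A hedged sketch of the relevant knobs, assuming the property names documented for the 1.5.x line (verify against your version's configuration reference; inverted index is additionally controlled per table via the INVERTED_INDEX table property):

```properties
# carbon.properties -- override the new 1.5.x defaults if needed
# (property names assumed from the 1.5.x configuration docs)
carbon.local.dictionary.enable=false   # local dictionary is now enabled by default
carbon.sort.temp.compressor=SNAPPY     # default compressor for sort temp files; set empty to disable
```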

New Configuration Parameters

Configuration name                  Default Value   Range
carbon.push.rowfilters.for.vector   false           NA

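This parameter controls whether row-level filters are pushed down to the vectorized carbon reader; with the default of false, vectors are filled without per-row filtering and the engine applies the filter afterwards. A hedged sketch of re-enabling pushdown (behaviour assumed from the 1.5.x configuration docs):

```properties
# carbon.properties -- push row-level filters down to the carbon vectorized reader
carbon.push.rowfilters.for.vector=true
```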
Please find the detailed JIRA list: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=1234100612344320

Sub-task

Bug

  • [CARBONDATA-2996] - readSchemaInIndexFile can't read schema by folder path
  • [CARBONDATA-2998] - Refresh column schema for old store(before V3) for SORT_COLUMNS option
  • [CARBONDATA-3002] - Fix some spell error and remove the data after test case finished running
  • [CARBONDATA-3007] - Fix error in document
  • [CARBONDATA-3025] - Add SQL support for cli, and enhance CLI , add more metadata to carbon file
  • [CARBONDATA-3026] - clear expired property that may cause GC problem
  • [CARBONDATA-3029] - Failed to run spark data source test cases in windows env
  • [CARBONDATA-3036] - Carbon 1.5.0 B010 - Select query fails when min/max exceeds and index tree cached
  • [CARBONDATA-3040] - Fix bug for merging bloom index
  • [CARBONDATA-3058] - Fix some exception coding in data loading
  • [CARBONDATA-3060] - Improve CLI and fix other bugs in CLI tool
  • [CARBONDATA-3062] - Fix Compatibility issue with cache_level as blocklet
  • [CARBONDATA-3065] - by default disable inverted index for all the dimension column
  • [CARBONDATA-3066] - ADD documentation for new APIs in SDK
  • [CARBONDATA-3069] - fix bugs in setting cores for compaction
  • [CARBONDATA-3077] - Fixed query failure in fileformat due stale cache issue
  • [CARBONDATA-3078] - Exception caused by explain command for count star query without filter
  • [CARBONDATA-3081] - NPE when boolean column has null values with Vectorized SDK reader
  • [CARBONDATA-3083] - Null values are getting replaced by 0 after update operation.
  • [CARBONDATA-3084] - data load with float datatype fails with internal error
  • [CARBONDATA-3098] - Negative value exponents giving wrong results
  • [CARBONDATA-3106] - Written_BY_APPNAME is not serialized in executor with GlobalSort
  • [CARBONDATA-3117] - Rearrange the projection list in the Scan
  • [CARBONDATA-3120] - apache-carbondata-1.5.1-rc1.tar.gz Datamap's core and plan project, pom.xml, is version 1.5.0, which results in an inability to compile properly
  • [CARBONDATA-3122] - CarbonReader memory leak
  • [CARBONDATA-3123] - JVM crash when reading through CarbonReader
  • [CARBONDATA-3124] - Updated log message in Unsafe Memory Manager and changed faq.md accordingly.
  • [CARBONDATA-3132] - Unequal distribution of tasks in case of compaction
  • [CARBONDATA-3134] - Wrong result when a column is dropped and added using alter with blocklet cache.

New Feature

Improvement

...

  • [CARBONDATA-3008] - make yarn-local and multiple dir for temp data enable by default
  • [CARBONDATA-3009] - Optimize the entry point of code for MergeIndex
  • [CARBONDATA-3019] - Add error log in catch block to avoid to abort the exception which is thrown from catch block when there is an exception thrown in finally block
  • [CARBONDATA-3022] - Refactor ColumnPageWrapper
  • [CARBONDATA-3024] - Use Log4j directly
  • [CARBONDATA-3030] - Remove no use parameter in test case
  • [CARBONDATA-3031] - Find wrong description in the document for 'carbon.number.of.cores.while.loading'
  • [CARBONDATA-3032] - Remove carbon.blocklet.size from properties template
  • [CARBONDATA-3034] - Combing CarbonCommonConstants
  • [CARBONDATA-3035] - Optimize parameters for unsafe working and sort memory
  • [CARBONDATA-3039] - Fix Custom Deterministic Expression for rand() UDF
  • [CARBONDATA-3041] - Optimize load minimum size strategy for data loading
  • [CARBONDATA-3042] - Column Schema objects are present in Driver and Executor even after dropping table
  • [CARBONDATA-3046] - remove outdated configurations in template properties
  • [CARBONDATA-3047] - UnsafeMemoryManager fallback mechanism in case of memory not available
  • [CARBONDATA-3048] - Added Lazy Loading For 2.2/2.1
  • [CARBONDATA-3050] - Remove unused parameter doc
  • [CARBONDATA-3051] - unclosed streams cause tests failure in windows env
  • [CARBONDATA-3052] - Improve drop table performance by reducing the namenode RPC calls during physical deletion of files
  • [CARBONDATA-3053] - Un-closed file stream found in cli
  • [CARBONDATA-3054] - Dictionary file cannot be read in S3a with CarbonDictionaryDecoder.doConsume() codeGen
  • [CARBONDATA-3061] - Add validation for supported format version and Encoding type to throw proper exception to the user while reading a file
  • [CARBONDATA-3064] - Support separate audit log
  • [CARBONDATA-3067] - Add check for debug to avoid string concat
  • [CARBONDATA-3071] - Add CarbonSession Java Example
  • [CARBONDATA-3074] - Change default sort temp compressor to SNAPPY
  • [CARBONDATA-3075] - Select Filter fails for Legacy store if DirectVectorFill is enabled
  • [CARBONDATA-3087] - Prettify DESC FORMATTED output
  • [CARBONDATA-3088] - enhance compaction performance by using prefetch
  • [CARBONDATA-3104] - Extra Unnecessary Hadoop Conf is getting stored in LRU (~100K) for each LRU entry
  • [CARBONDATA-3112] - Optimise decompressing while filling the vector during conversion of primitive types
  • [CARBONDATA-3113] - Fixed Local Dictionary Query Performance and Added reusable buffer for direct flow
  • [CARBONDATA-3118] - Parallelize block pruning of default datamap in driver for filter query processing
  • [CARBONDATA-3121] - CarbonReader build time is huge
  • [CARBONDATA-3136] - JVM crash with preaggregate datamap

