...

Compaction performance is optimised by prefetching data while reading carbon files.
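The prefetch overlaps the disk read of the next block with the merge of the current one. A minimal generic sketch of that double-buffering idea (illustrative Python with stand-in functions, not CarbonData's actual reader classes):

```python
from concurrent.futures import ThreadPoolExecutor

def read_block(block_id):
    # Stand-in for reading one carbon data block from disk.
    return [block_id * 10 + i for i in range(3)]

def merge_blocks(block_ids):
    """Merge blocks while the next read is prefetched in the background."""
    merged = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(read_block, block_ids[0])  # prime the pipeline
        for next_id in block_ids[1:]:
            current = future.result()                   # wait for the prefetched block
            future = pool.submit(read_block, next_id)   # start reading the next block
            merged.extend(current)                      # merge while the read proceeds
        merged.extend(future.result())                  # last block
    return merged

# merge_blocks([1, 2, 3]) -> [10, 11, 12, 20, 21, 22, 30, 31, 32]
```

With a single background worker, the reader is always at most one block ahead, so the IO wait is hidden behind the merge work rather than added to it.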

Improved Blocklet DataMap pruning in driver

Blocklet DataMap pruning in the driver is improved by using multi-threaded processing.
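Pruning checks each blocklet's min/max statistics against the query filter, which is naturally parallel across blocklets. A generic sketch of fanning that check out over a thread pool (illustrative Python, not CarbonData's actual DataMap API):

```python
from concurrent.futures import ThreadPoolExecutor

# Each blocklet carries min/max statistics for a sort column.
blocklets = [{"id": i, "min": i * 10, "max": i * 10 + 9} for i in range(8)]

def survives(blocklet, lo, hi):
    # Keep a blocklet only if its [min, max] range overlaps the filter [lo, hi].
    return blocklet["max"] >= lo and blocklet["min"] <= hi

def prune(blocklets, lo, hi, threads=4):
    # Evaluate the min/max check for all blocklets concurrently in the driver.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        keep = pool.map(lambda b: survives(b, lo, hi), blocklets)
    return [b["id"] for b, k in zip(blocklets, keep) if k]

# prune(blocklets, 25, 44) -> [2, 3, 4]
```

For tables with many segments and blocklets, spreading this metadata scan across threads shortens the planning phase of filter queries.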

CarbonData SDK

SDK supports C++ interfaces for writing CarbonData files

...

  • Local dictionary is enabled by default.
  • Inverted index is disabled by default.
  • Sort temp files generated during data loading are now compressed with Snappy by default to improve IO.
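These new defaults can be overridden through configuration. A hedged sketch of the relevant knobs, assuming the property names documented for the 1.5.x line (verify against your version's configuration reference; inverted index is additionally controlled per table via the INVERTED_INDEX table property):

```properties
# carbon.properties -- override the new 1.5.x defaults if needed
# (property names assumed from the 1.5.x configuration docs)
carbon.local.dictionary.enable=false   # local dictionary is now enabled by default
carbon.sort.temp.compressor=SNAPPY     # default compressor for sort temp files; set empty to disable
```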

New Configuration Parameters

Configuration name                  Default Value   Range
carbon.push.rowfilters.for.vector   false           NA

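This parameter controls whether row-level filters are pushed down to the vectorized carbon reader; with the default of false, vectors are filled without per-row filtering and the engine applies the filter afterwards. A hedged sketch of re-enabling pushdown (behaviour assumed from the 1.5.x configuration docs):

```properties
# carbon.properties -- push row-level filters down to the carbon vectorized reader
carbon.push.rowfilters.for.vector=true
```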
Please find the detailed JIRA list: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=1234100612344320

Sub-task

Bug

  • [CARBONDATA-2996] - readSchemaInIndexFile can't read schema by folder path
  • [CARBONDATA-2998] - Refresh column schema for old store(before V3) for SORT_COLUMNS option
  • [CARBONDATA-3002] - Fix some spell error and remove the data after test case finished running
  • [CARBONDATA-3007] - Fix error in document
  • [CARBONDATA-3025] - Add SQL support for cli, and enhance CLI , add more metadata to carbon file
  • [CARBONDATA-3026] - clear expired property that may cause GC problem
  • [CARBONDATA-3029] - Failed to run spark data source test cases in windows env
  • [CARBONDATA-3036] - Carbon 1.5.0 B010 - Select query fails when min/max exceeds and index tree cached
  • [CARBONDATA-3040] - Fix bug for merging bloom index
  • [CARBONDATA-3058] - Fix some exception coding in data loading
  • [CARBONDATA-3060] - Improve CLI and fix other bugs in CLI tool
  • [CARBONDATA-3062] - Fix Compatibility issue with cache_level as blocklet
  • [CARBONDATA-3065] - by default disable inverted index for all the dimension column
  • [CARBONDATA-3066] - ADD documentation for new APIs in SDK
  • [CARBONDATA-3069] - fix bugs in setting cores for compaction
  • [CARBONDATA-3077] - Fixed query failure in fileformat due stale cache issue
  • [CARBONDATA-3078] - Exception caused by explain command for count star query without filter
  • [CARBONDATA-3081] - NPE when boolean column has null values with Vectorized SDK reader
  • [CARBONDATA-3083] - Null values are getting replaced by 0 after update operation.
  • [CARBONDATA-3084] - data load with float datatype fails with internal error
  • [CARBONDATA-3098] - Negative value exponents giving wrong results
  • [CARBONDATA-3106] - Written_BY_APPNAME is not serialized in executor with GlobalSort
  • [CARBONDATA-3117] - Rearrange the projection list in the Scan
  • [CARBONDATA-3120] - apache-carbondata-1.5.1-rc1.tar.gz Datamap's core and plan project, pom.xml, is version 1.5.0, which results in an inability to compile properly
  • [CARBONDATA-3122] - CarbonReader memory leak
  • [CARBONDATA-3123] - JVM crash when reading through CarbonReader
  • [CARBONDATA-3124] - Updated log message in Unsafe Memory Manager and changed faq.md accordingly.
  • [CARBONDATA-3132] - Unequal distribution of tasks in case of compaction
  • [CARBONDATA-3134] - Wrong result when a column is dropped and added using alter with blocklet cache.

New Feature

Improvement

...

  • [CARBONDATA-3008] - make yarn-local and multiple dir for temp data enable by default
  • [CARBONDATA-3009] - Optimize the entry point of code for MergeIndex
  • [CARBONDATA-3019] - Add error log in catch block to avoid to abort the exception which is thrown from catch block when there is an exception thrown in finally block
  • [CARBONDATA-3022] - Refactor ColumnPageWrapper
  • [CARBONDATA-3024] - Use Log4j directly
  • [CARBONDATA-3030] - Remove no use parameter in test case
  • [CARBONDATA-3031] - Find wrong description in the document for 'carbon.number.of.cores.while.loading'
  • [CARBONDATA-3032] - Remove carbon.blocklet.size from properties template
  • [CARBONDATA-3034] - Combing CarbonCommonConstants
  • [CARBONDATA-3035] - Optimize parameters for unsafe working and sort memory
  • [CARBONDATA-3039] - Fix Custom Deterministic Expression for rand() UDF
  • [CARBONDATA-3041] - Optimize load minimum size strategy for data loading
  • [CARBONDATA-3042] - Column Schema objects are present in Driver and Executor even after dropping table
  • [CARBONDATA-3046] - remove outdated configurations in template properties
  • [CARBONDATA-3047] - UnsafeMemoryManager fallback mechanism in case of memory not available
  • [CARBONDATA-3048] - Added Lazy Loading For 2.2/2.1
  • [CARBONDATA-3050] - Remove unused parameter doc
  • [CARBONDATA-3051] - unclosed streams cause tests failure in windows env
  • [CARBONDATA-3052] - Improve drop table performance by reducing the namenode RPC calls during physical deletion of files
  • [CARBONDATA-3053] - Un-closed file stream found in cli
  • [CARBONDATA-3054] - Dictionary file cannot be read in S3a with CarbonDictionaryDecoder.doConsume() codeGen
  • [CARBONDATA-3061] - Add validation for supported format version and Encoding type to throw proper exception to the user while reading a file
  • [CARBONDATA-3064] - Support separate audit log
  • [CARBONDATA-3067] - Add check for debug to avoid string concat
  • [CARBONDATA-3071] - Add CarbonSession Java Example
  • [CARBONDATA-3074] - Change default sort temp compressor to SNAPPY
  • [CARBONDATA-3075] - Select Filter fails for Legacy store if DirectVectorFill is enabled
  • [CARBONDATA-3087] - Prettify DESC FORMATTED output
  • [CARBONDATA-3088] - enhance compaction performance by using prefetch
  • [CARBONDATA-3104] - Extra Unnecessary Hadoop Conf is getting stored in LRU (~100K) for each LRU entry
  • [CARBONDATA-3112] - Optimise decompressing while filling the vector during conversion of primitive types
  • [CARBONDATA-3113] - Fixed Local Dictionary Query Performance and Added reusable buffer for direct flow
  • [CARBONDATA-3118] - Parallelize block pruning of default datamap in driver for filter query processing
  • [CARBONDATA-3121] - CarbonReader build time is huge
  • [CARBONDATA-3136] - JVM crash with preaggregate datamap

