...

CarbonData supports custom column compressors, so users can plug in their own compressor implementation. To use a custom compressor, specify its fully qualified class name either when creating the table or in the carbon property.
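As a rough illustration of how a compressor can be selected by its full class name, the sketch below resolves an implementation through reflection. The `ColumnCompressor` interface, `PassThroughCompressor`, and `CompressorLoader` are hypothetical names invented for this example; they are not CarbonData's actual compressor interface or loading code.

```java
import java.util.Arrays;

// Hypothetical stand-in for a compressor contract; NOT CarbonData's real interface.
interface ColumnCompressor {
    String getName();
    byte[] compress(byte[] input);
    byte[] decompress(byte[] input);
}

// Hypothetical user-supplied implementation: an identity "compressor" that only copies bytes.
class PassThroughCompressor implements ColumnCompressor {
    public PassThroughCompressor() {}
    public String getName() { return getClass().getName(); }
    public byte[] compress(byte[] input) { return Arrays.copyOf(input, input.length); }
    public byte[] decompress(byte[] input) { return Arrays.copyOf(input, input.length); }
}

class CompressorLoader {
    // Resolve a compressor from its fully qualified class name via reflection.
    static ColumnCompressor load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return (ColumnCompressor) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load compressor: " + className, e);
        }
    }
}
```

Loading by class name keeps the engine independent of any specific implementation: the class only has to be on the classpath and expose a no-argument constructor.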

Performance Improvements

Optimised

...

carbondata scan performance

CarbonData scan performance is improved by avoiding multiple data copies in the vector flow. This is achieved by short-circuiting the read and vector-filling steps: data is filled directly into the vector after it is read from the file, without any intermediate copies.
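The difference between the two paths can be pictured with a small sketch. It assumes a page of fixed-width integers held in a `ByteBuffer` and a plain `int[]` standing in for the column vector; the class and method names are hypothetical and do not reflect CarbonData's internal reader.

```java
import java.nio.ByteBuffer;

class VectorFill {
    // Path with an intermediate copy: decode into a temporary array, then copy into the vector.
    static void fillWithCopy(ByteBuffer page, int rows, int[] vector) {
        int[] tmp = new int[rows];
        for (int i = 0; i < rows; i++) {
            tmp[i] = page.getInt(i * Integer.BYTES);
        }
        System.arraycopy(tmp, 0, vector, 0, rows);
    }

    // Short-circuited path: decode each value straight into the vector, with no temporary buffer.
    static void fillDirect(ByteBuffer page, int rows, int[] vector) {
        for (int i = 0; i < rows; i++) {
            vector[i] = page.getInt(i * Integer.BYTES);
        }
    }
}
```

Both methods produce the same vector contents; the direct path simply skips one allocation and one full copy per page.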

Row filter pruning: row-level filter processing is now handled in the execution engine. For the vector flow, CarbonData itself performs only blocklet and page pruning using the filter. This is controlled by the property carbon.push.rowfilters.for.vector, which defaults to false.
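A minimal sketch of reading this switch with its documented default is shown below. It uses plain `java.util.Properties` rather than CarbonData's own property handling, and the `RowFilterConfig` class is a hypothetical helper introduced only for illustration.

```java
import java.util.Properties;

class RowFilterConfig {
    // Property name as given in the release notes.
    static final String PUSH_ROW_FILTERS = "carbon.push.rowfilters.for.vector";

    // Returns the configured value, falling back to the documented default of false.
    static boolean pushRowFilters(Properties props) {
        return Boolean.parseBoolean(props.getProperty(PUSH_ROW_FILTERS, "false"));
    }
}
```

With the property unset, the helper returns false, which corresponds to the default behaviour described above: row-level filtering is left to the execution engine.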

...

To enable integration with non-Java execution engines, CarbonData supports a C++ JNI wrapper around the writer for writing CarbonData files. It can be integrated with any execution engine and writes data to CarbonData files without depending on Spark or Hadoop.

...

  • Enhanced the CLI with additional options.
  • Supported a fallback mechanism: when off-heap memory is not enough, switch to on-heap memory instead of failing the job.
  • Supported a separate audit log.
  • Supported reading rows in batches in the CSDK to improve performance.
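The off-heap to on-heap fallback in the list above can be sketched as follows. This is a simplified illustration assuming a fixed off-heap budget tracked by the allocator; `FallbackAllocator` and its budget parameter are hypothetical and are not CarbonData's memory manager.

```java
import java.nio.ByteBuffer;

class FallbackAllocator {
    private final long offHeapBudget; // hypothetical cap on off-heap usage, in bytes
    private long offHeapUsed = 0;

    FallbackAllocator(long offHeapBudget) {
        this.offHeapBudget = offHeapBudget;
    }

    // Allocate off-heap while the budget allows; otherwise fall back to the heap
    // instead of failing the request.
    synchronized ByteBuffer allocate(int size) {
        if (offHeapUsed + size <= offHeapBudget) {
            offHeapUsed += size;
            return ByteBuffer.allocateDirect(size);
        }
        return ByteBuffer.allocate(size);
    }
}
```

Callers receive a usable buffer in both cases and can check `isDirect()` if they need to know which kind of memory backs it.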

Behaviour Change

  • Local dictionary is now enabled by default.
  • Inverted index is now disabled by default.

New Configuration Parameters

...