The Apache CarbonData community is pleased to announce the release of version 1.3.0 under The Apache Software Foundation (ASF). CarbonData is a big-data-native columnar file format that enables faster interactive queries by using advanced columnar storage, indexing, compression, and encoding techniques to improve computing efficiency. As a result, it can speed up queries by an order of magnitude over petabytes of data.
We encourage everyone to download the release from https://archive.apache.org/dist/carbondata/1.3.0/ and to share feedback through the CarbonData user mailing lists!
This release note provides information on the new features, improvements, and bug fixes in this release.
What’s New in Version 1.3.0?
This version of CarbonData adds the following new features to improve performance, compatibility, and usability.
Support Spark 2.2.1
Spark 2.2.1 is the latest stable Spark release, bringing new features and improved performance. CarbonData 1.3.0 integrates with it so that users can take advantage of these improvements after upgrading.
Support Streaming
Supports streaming ingestion of real-time data. After real-time data is ingested into the carbon store, it can be queried from a compute engine such as SparkSQL.
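As a sketch (table and column names are hypothetical), a table is marked for streaming ingest at creation time via a table property; the ingestion itself is then driven through Spark Structured Streaming with CarbonData as the sink:

```sql
-- Hypothetical table; 'streaming'='true' marks it for streaming ingest
CREATE TABLE stream_sales (
  order_time TIMESTAMP,
  country STRING,
  quantity INT
)
STORED BY 'carbondata'
TBLPROPERTIES ('streaming' = 'true')
```

Streamed rows land in a row-format segment and are handed off to the columnar format in the background, so queries see the real-time data immediately.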
Pre Aggregate Support
Supports pre-aggregation of data so that "group by" style queries can fetch results much faster (around 10x). You can create as many aggregate tables (as datamaps) as required to improve query performance.
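A minimal sketch of a pre-aggregate datamap (main table and datamap names are hypothetical):

```sql
-- Pre-aggregate datamap over a hypothetical 'sales' table
CREATE DATAMAP agg_sales
ON TABLE sales
USING 'preaggregate'
AS
  SELECT country, SUM(quantity), AVG(price)
  FROM sales
  GROUP BY country
```

Group-by queries on the main table whose aggregates match the datamap are transparently rewritten to read from the much smaller aggregate table.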
Support Time Series (Alpha feature)
Supports creating multiple pre-aggregate tables over a time hierarchy, and CarbonData can perform automatic roll-up for queries on these hierarchies. Note: this is an alpha feature.
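A sketch of the alpha time-series syntax (table, column, and datamap names are hypothetical; consult the datamap documentation for the exact DMPROPERTIES keys):

```sql
-- Hypothetical time-series datamap rolling up on 'order_time'
CREATE DATAMAP agg_sales_time
ON TABLE sales
USING 'preaggregate'
DMPROPERTIES (
  'timeseries.eventTime'  = 'order_time',
  'timeseries.hierarchy'  = 'hour=1,day=1,month=1,year=1'
)
AS
  SELECT order_time, country, SUM(quantity)
  FROM sales
  GROUP BY order_time, country
```

Queries that group on a coarser granularity (for example, month) can then be answered by rolling up the finer-grained aggregate tables automatically.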
CTAS (CREATE TABLE AS SELECT)
Supports creating a CarbonData table from any Parquet/Hive/Carbon table. This is beneficial when you want to create a CarbonData table from an existing Parquet/Hive table and then use the Carbon query engine to achieve better query performance. It can also be used to back up data.
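A minimal sketch (source and target table names are hypothetical):

```sql
-- Create a CarbonData table from an existing Parquet/Hive table
CREATE TABLE carbon_sales
STORED BY 'carbondata'
AS SELECT * FROM parquet_sales
```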
Standard Partitioning
Supports standard partitioning, similar to Spark and Hive partitioning. This allows you to use any column to create partitions, which can improve query performance significantly.
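A sketch following the Hive/Spark convention, where the partition column is declared outside the main column list (all names are hypothetical):

```sql
-- Hypothetical partitioned table
CREATE TABLE sales (
  order_id BIGINT,
  quantity INT
)
PARTITIONED BY (country STRING)
STORED BY 'carbondata';

-- Static-partition insert into a single partition
INSERT INTO TABLE sales PARTITION (country = 'US')
SELECT order_id, quantity FROM staging_sales WHERE country = 'US'
```

Filters on the partition column prune entire partition directories, so only the matching data is scanned.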
Support External DB & Table Path
Supports external database and table paths. When creating a database or table, you can now specify the location where it should be stored.
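A sketch using LOCATION clauses (the paths and names below are hypothetical, and the table-level LOCATION form is an assumption; verify against the DDL documentation):

```sql
-- Hypothetical database stored at a user-specified path
CREATE DATABASE sales_db
LOCATION 'hdfs://namenode/user/carbon/sales_db';

-- Hypothetical table with its own storage path
CREATE TABLE sales_db.sales (
  order_id BIGINT,
  country STRING
)
STORED BY 'carbondata'
LOCATION 'hdfs://namenode/user/carbon/external/sales'
```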
Support Query Data with Specified Dataload
Supports querying data from specified segments (one data load generates one segment). Users can restrict a query to only the segments they actually need, which can significantly improve query performance.
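A sketch of segment-scoped querying via the session property (database and table names are hypothetical; segment ids are taken from SHOW SEGMENTS):

```sql
-- List segments and their ids for a hypothetical table
SHOW SEGMENTS FOR TABLE default.sales;

-- Restrict subsequent queries in this session to segments 1 and 3
SET carbon.input.segments.default.sales = 1,3;

SELECT country, SUM(quantity) FROM sales GROUP BY country;

-- Reset to query all segments again
SET carbon.input.segments.default.sales = *;
```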
Support Boolean Data Type
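Boolean can now be used as a column data type in table schemas, data loads, and filter expressions. A minimal sketch (table and column names are hypothetical):

```sql
-- Hypothetical table with a BOOLEAN column
CREATE TABLE device_status (
  device_id STRING,
  is_online BOOLEAN
)
STORED BY 'carbondata';

SELECT device_id FROM device_status WHERE is_online = true
```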
Please find the detailed JIRA list: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12320220&version=12341004
Sub-task
- [CARBONDATA-1173] - Streaming Ingest: Write path framework implementation
- [CARBONDATA-1174] - Streaming Ingest: Write path schema validation/inference
- [CARBONDATA-1175] - Streaming Ingest: Write path data conversion/transformation
- [CARBONDATA-1176] - Streaming Ingest: Write path streaming segment/file creation
- [CARBONDATA-1517] - 1. Support CTAS in carbon and support creating aggregation tables using CTAS.And update aggregation table information to main table schema.
- [CARBONDATA-1518] - 2. Support creating timeseries while creating main table.
- [CARBONDATA-1519] - 3. Create UDF for timestamp to extract year,month,day,hour and minute from timestamp and date
- [CARBONDATA-1520] - 4 Load aggregation tables from main table after finish.
- [CARBONDATA-1523] - . Add the API in carbon layer to get suitable aggregation table for group by query. Update query plan in carbon optimizer to support aggregation tables for group by queries.
- [CARBONDATA-1524] - 8. Refresh the cache of main table after droping of aggregation table.
- [CARBONDATA-1526] - 10. Handle compaction in aggregation tables.
- [CARBONDATA-1528] - 12. Handle alter table scenarios for aggregation table
- [CARBONDATA-1538] - Implement bitset pipe-lining in carbondata filtering. It will improve performance and needed for FG Datamap
- [CARBONDATA-1576] - Support create and drop datamap by SQL
- [CARBONDATA-1579] - Support show/describe datamap information by SQL
- [CARBONDATA-1585] - Support show/describe streaming table information by SQL
- [CARBONDATA-1586] - Support handoff from row format to columnar format
- [CARBONDATA-1591] - Support query from specified Segment (ThreadSafe)
- [CARBONDATA-1610] - ALTER TABLE set streaming property
- [CARBONDATA-1611] - Block UPDATE/DELETE command for streaming table
- [CARBONDATA-1612] - Block DELETE SEGMENT BY ID for streaming table
- [CARBONDATA-1614] - SHOW SEGMENT should include the streaming property
- [CARBONDATA-1616] - Add document for streaming ingestion usage
- [CARBONDATA-1656] - Reject ALTER TABLE command for streaming table
- [CARBONDATA-1667] - Remove DirectLoad feature
- [CARBONDATA-1668] - Remove isTableSplitPartition while loading
- [CARBONDATA-1669] - Clean up code in CarbonDataRDDFactory
- [CARBONDATA-1701] - support thread safe api for segment reading
- [CARBONDATA-1702] - Add Documentation for SEGMENT READING feature
- [CARBONDATA-1817] - Reject create datamap on streaming table
- [CARBONDATA-1854] - Add support for implicit column filter
- [CARBONDATA-1855] - Add outputformat in carbon.
- [CARBONDATA-1856] - Support insert/load data for partition table.
- [CARBONDATA-1857] - Create a system level switch for supporting standard partition or carbon custom partition.
- [CARBONDATA-1858] - Support querying data from partition table.
- [CARBONDATA-1859] - Support drop partition in carbon
- [CARBONDATA-1860] - Support insertoverwrite for a specific partition.
- [CARBONDATA-1861] - Support show partitions
- [CARBONDATA-1862] - Support compaction for partition table .
- [CARBONDATA-1863] - Clean segment information while using clean table command
- [CARBONDATA-1872] - Clean up unused constant in CarbonCommonConstant
- [CARBONDATA-1924] - Add restriction for creating streaming table as partition table.And support PARTITION syntax to LOAD command
- [CARBONDATA-1925] - Support expression inside aggregate expression in create and load data on Pre aggregate table
- [CARBONDATA-1926] - Support expression inside aggregate expression during query on Pre Aggregate table
- [CARBONDATA-1927] - Support sub query on Pre Aggregate table
- [CARBONDATA-1933] - Support Spark 2.2.1 for carbon partition tables
- [CARBONDATA-1948] - Update help document for the change made for CARBONDATA-1929
- [CARBONDATA-1999] - Block drop table and delete streaming segment while streaming is in progress
- [CARBONDATA-2009] - REFRESH TABLE Limitation When HiveMetaStore is used
- [CARBONDATA-2010] - block aggregation main table to set streaming property
- [CARBONDATA-2044] - IDG update for CARBONDATA-2043 Configurable wait time for requesting executors and minimum registered executors ratio to continue the block distribution
- [CARBONDATA-2116] - Documentation for CTAS
- [CARBONDATA-2126] - Documentation for Create Database
- [CARBONDATA-2127] - Documentation for Hive Standard Partition
- [CARBONDATA-2128] - Documentation update for Table Path
Bug
- [CARBONDATA-1032] - NumberFormatException and NegativeArraySizeException for select with in clause filter limit for unsafe true configuration
- [CARBONDATA-1055] - Record count mismatch for Carbon query compared with Parquet for TPCH query 15
- [CARBONDATA-1192] - Unable to Select Data From more than one table in hive
- [CARBONDATA-1218] - In case of data-load failure the BadRecordsLogger.badRecordEntry map holding the task Status is not removing the task Entry.
- [CARBONDATA-1224] - Going out of memory if more segments are compacted at once in V3 format
- [CARBONDATA-1247] - Block pruning not working for date type data type column
- [CARBONDATA-1249] - Wrong order of columns in redirected csv for bad records
- [CARBONDATA-1258] - CarbonData should not allow loading Date Type values violating the boundary condition ("0001-01-01" through "9999-12-31")
- [CARBONDATA-1278] - Data Mismatch issue when dictionary column filter values doesn't exists in dictionary
- [CARBONDATA-1304] - Support IUD with single_pass
- [CARBONDATA-1326] - Fixed high priority findbug issues
- [CARBONDATA-1352] - Test case Execute while creating Carbondata jar.
- [CARBONDATA-1410] - Thread leak issue in case of data loading failure
- [CARBONDATA-1449] - GC issue in case of date filter if it is going to rowlevel executor
- [CARBONDATA-1454] - Block pruning not working when wrong data is given in filter
- [CARBONDATA-1473] - Unable To use Greater than Operator on Date Type In Hive
- [CARBONDATA-1480] - Datamap Example. Min Max Index implementation.
- [CARBONDATA-1486] - Fixed issue of table status updation on insert overwrite failure and exception thrown while deletion of stale folders
- [CARBONDATA-1504] - Refresh of segments in datamap for update and partition is not working if the segments are cached
- [CARBONDATA-1512] - Failed to run sqls concurrently
- [CARBONDATA-1514] - Sort Column Property is not getting added in case of alter operation
- [CARBONDATA-1515] - Fixed NPE in Data loading
- [CARBONDATA-1529] - Partition Table link not working in the README.md
- [CARBONDATA-1533] - Fixed decimal data load fail issue and restricted max characters per column
- [CARBONDATA-1536] - Default value of carbon.bad.records.action is FORCE
- [CARBONDATA-1537] - Can't use data which is loaded in V1 format (0.2 version) in V3 format (current master )
- [CARBONDATA-1574] - No_Inverted is applied for all newly added column irrespect of specified in tableproperties
- [CARBONDATA-1596] - ClassCastException is thrown by IntermediateFileMerger for decimal columns
- [CARBONDATA-1618] - Fix issue of not supporting table comment
- [CARBONDATA-1619] - Loading data to a carbondata table with overwrite=true many times will cause NullPointerException
- [CARBONDATA-1627] - one job failed among 100 job while performing select operation with 100 different thread
- [CARBONDATA-1651] - Unsupported Spark2 BooleanType
- [CARBONDATA-1658] - Thread Leak Issue in No Sort
- [CARBONDATA-1660] - Incorrect result displays while executing select query with where clause for decimal data type
- [CARBONDATA-1661] - Incorrect output of select query with timestamp data type on presto CLI
- [CARBONDATA-1680] - Carbon 1.3.0-Partitioning:Show Partition for Hash Partition doesn't display the partition id
- [CARBONDATA-1689] - Fix parent pom issues and correct CI link of README
- [CARBONDATA-1690] - Query failed after swap table by renaming
- [CARBONDATA-1691] - Carbon 1.3.0-Partitioning:Document needs to be updated for Table properties (Sort_Scope) in create table
- [CARBONDATA-1694] - Incorrect exception on presto CLI while executing select query after applying alter drop column query on a table
- [CARBONDATA-1699] - Filter is not working properly
- [CARBONDATA-1700] - Failed to load data to existed table after spark session restarted
- [CARBONDATA-1711] - Carbon1.3.0-DataMap - Show datamap on table <par_table> does not work
- [CARBONDATA-1713] - Carbon1.3.0-Pre-AggregateTable - Aggregate query on main table fails after creating pre-aggregate table when upper case used for column name
- [CARBONDATA-1714] - Carbon1.3.0-Alter Table - Select columns with is null and limit throws ArrayIndexOutOfBoundsException after multiple alter
- [CARBONDATA-1719] - Carbon1.3.0-Pre-AggregateTable - Empty segment is created when pre-aggr table created in parallel with table load, aggregate query returns no data
- [CARBONDATA-1720] - Wrong data displayed for <= filter for timestamp column(dictionary column)
- [CARBONDATA-1726] - Carbon1.3.0-Streaming - Null pointer exception is thrown when streaming is started in spark-shell
- [CARBONDATA-1728] - Carbon1.3.0- DB creation external path : Delete data with select in where clause not successful for large data
- [CARBONDATA-1729] - The compatibility issue with hadoop <= 2.6 and 2.7
- [CARBONDATA-1731] - Carbon1.3.0- DB creation external path: Update fails incorrectly with error for table created in external db location
- [CARBONDATA-1733] - While load is in progress, Show segments is throwing NPE
- [CARBONDATA-1736] - Carbon1.3.0-Pre-AggregateTable -Query from segment set is not effective when pre-aggregate table is present
- [CARBONDATA-1737] - Carbon1.3.0-Pre-AggregateTable - Pre-aggregate table loads partially when segment filter is set on the main table
- [CARBONDATA-1740] - Carbon1.3.0-Pre-AggregateTable - Query plan exception for aggregate query with order by when main table is having pre-aggregate table
- [CARBONDATA-1742] - Fix NullPointerException in SegmentStatusManager
- [CARBONDATA-1743] - Carbon1.3.0-Pre-AggregateTable - Query returns no value if run at the time of pre-aggregate table creation
- [CARBONDATA-1749] - Carbon1.3.0- DB creation external path : mdt file is not created in directory as per configuration in carbon.properties
- [CARBONDATA-1750] - SegmentStatusManager.readLoadMetadata showing NPE if tablestatus file is empty
- [CARBONDATA-1751] - Modify sys.err to AnalysisException when uses run related operation except IUD,compaction and alter
- [CARBONDATA-1752] - There are some scalastyle error should be optimized in CarbonData
- [CARBONDATA-1753] - Missing 'org.scalatest.tools.Runner' when run test with streaming module
- [CARBONDATA-1755] - Carbon1.3.0 Concurrent Insert overwrite-update: User is able to run insert overwrite and update job concurrently.
- [CARBONDATA-1759] - (Carbon1.3.0 - Clean Files) Clean command is not working correctly for segments marked for delete due to insert overwrite job
- [CARBONDATA-1760] - Carbon 1.3.0- Pre_aggregate: Proper Error message should be displayed, when parent table name is not correct while creating datamap.
- [CARBONDATA-1761] - (Carbon1.3.0 - DELETE SEGMENT BY ID) In Progress Segment is marked for delete if respective id is given in delete segment by id query
- [CARBONDATA-1764] - Fix issue of when create table with short data type
- [CARBONDATA-1766] - fix serialization issue for CarbonAppendableStreamSink
- [CARBONDATA-1767] - Remove dependency of Java 1.8
- [CARBONDATA-1774] - Not able to fetch data from a table with Boolean data type in presto
- [CARBONDATA-1775] - (Carbon1.3.0 - Streaming) Select query fails with java.io.EOFException when data streaming is in progress
- [CARBONDATA-1776] - Fix some possible test error that are related to compaction
- [CARBONDATA-1777] - Carbon1.3.0-Pre-AggregateTable - Pre-aggregate tables created in Spark-shell sessions are not used in the beeline session
- [CARBONDATA-1781] - (Carbon1.3.0 - Streaming) Select * & select column fails but select count(*) is success when .streaming file is removed from HDFS
- [CARBONDATA-1783] - (Carbon1.3.0 - Streaming) Error "Failed to filter row in vector reader" when filter query executed on streaming data
- [CARBONDATA-1789] - Carbon1.3.0 Concurrent Load-Drop: user is able to drop table even if insert/load job is running
- [CARBONDATA-1793] - Insert / update is allowing more than 32000 characters for String column
- [CARBONDATA-1795] - Fix code issue of all examples
- [CARBONDATA-1796] - While submitting new job to Hadoop, token should be generated for accessing paths
- [CARBONDATA-1797] - Segment_Index compaction should take compaction lock to support concurrent scenarios better
- [CARBONDATA-1799] - CarbonInputMapperTest is failing
- [CARBONDATA-1802] - Carbon1.3.0 Alter:Alter query fails if a column is dropped and there is no key column
- [CARBONDATA-1806] - Carbon1.3.0 Load with global sort: Load fails If a table is created with sort scope as global sort
- [CARBONDATA-1807] - Carbon1.3.0-Pre-AggregateTable - Pre-aggregate creation not throwing error for wrong syntax and results in further query failures
- [CARBONDATA-1808] - (Carbon1.3.0 - Alter Table) Inconsistency in create table and alter table usage for char and varchar column
- [CARBONDATA-1810] - Bad record path is not correct for UT
- [CARBONDATA-1814] - (Carbon1.3.0 - Streaming) Nullpointereception in spark shell when the streaming started with table streaming altered from default(false) to true
- [CARBONDATA-1824] - Carbon 1.3.0 - Spark 2.2-Residual segment files left over when load failure happens
- [CARBONDATA-1826] - Carbon 1.3.0 - Spark 2.2: Describe table & Describe Formatted shows the same result
- [CARBONDATA-1828] - Carbon 1.3.0 - Spark 2.2 Empty CSV is being loaded successfully.
- [CARBONDATA-1829] - Carbon 1.3.0 - Spark 2.2: Insert is passing when Hive is having Float and Carbon is having INT value and load file is having single precision decimal value
- [CARBONDATA-1831] - Carbon 1.3.0 - BAD_RECORDS: Data Loading with Action as Redirect & logger enable is not logging the logs in the defined path.
- [CARBONDATA-1832] - Table cache should be cleared when dropping table
- [CARBONDATA-1833] - Should fix BindException in TestStreamingTableOperation
- [CARBONDATA-1839] - Data load failed when using compressed sort temp file
- [CARBONDATA-1840] - carbon.data.file.version default value is not correct in http://carbondata.apache.org/configuration-parameters.html
- [CARBONDATA-1842] - Fix 'wrong argument number' error of class Cast for Spark 2.2 when pattern matching
- [CARBONDATA-1846] - Incorrect output on presto CLI while executing IN operator with multiple load
- [CARBONDATA-1848] - Streaming sink should adapt spark 2.2
- [CARBONDATA-1868] - Carbon Spark-2.2 Integration Phase 2
- [CARBONDATA-1876] - clean all the InProgress segments for all databases during session initialization
- [CARBONDATA-1878] - JVM crash after off-heap-sort disabled
- [CARBONDATA-1881] - insert overwrite not working properly for pre-aggregate tables
- [CARBONDATA-1882] - select a table with 'group by' and perform insert overwrite to another carbon table it fails
- [CARBONDATA-1885] - Test error in AlterTableValidationTestCase
- [CARBONDATA-1886] - Stale folders are not getting deleted on deletion on table status file
- [CARBONDATA-1887] - block pruning not happening is carbon for ShortType and SmallIntType columns
- [CARBONDATA-1888] - Compaction is failing in case of timeseries
- [CARBONDATA-1891] - None.get when creating timeseries table after loading data into main table
- [CARBONDATA-1893] - Data load with multiple QUOTECHAR characters in syntax should fail
- [CARBONDATA-1895] - Fix issue of create table if not exits
- [CARBONDATA-1896] - Clean files operation improvement
- [CARBONDATA-1899] - Add CarbonData concurrency test case
- [CARBONDATA-1907] - Avoid unnecessary logging to improve query performance for no dictionary non string columns
- [CARBONDATA-1910] - do not allow tupleid, referenceid and positionReference as columns names
- [CARBONDATA-1911] - Enhance Carbon Test cases
- [CARBONDATA-1912] - Getting error trace on spark-sql console while executing compaction and alter table rename commands
- [CARBONDATA-1913] - Global Sort data dataload fails for big with RPC timeout exception
- [CARBONDATA-1914] - Dictionary Cache Access Count Maintenance
- [CARBONDATA-1916] - Correct the database location path during carbon drop databsae
- [CARBONDATA-1918] - Incorrect data is displayed when String is updated using Sentences
- [CARBONDATA-1920] - Sparksql query result is not same as presto on same sql
- [CARBONDATA-1929] - carbon property configuration validation
- [CARBONDATA-1930] - Dictionary not found exception is thrown when filter expression is given in aggergate table query
- [CARBONDATA-1931] - DataLoad failed for Aggregate table when measure is used for groupby
- [CARBONDATA-1934] - Incorrect results are returned by select query in case when the number of blocklets for one part file are > 1 in the same task
- [CARBONDATA-1935] - Fix the backword compatibility issue for tableInfo deserialization
- [CARBONDATA-1936] - Bad Record logger is not working properly in Carbon Partition
- [CARBONDATA-1937] - NULL values on Non string partition columns throws exception
- [CARBONDATA-1940] - Select query on preaggregate table created with group by clause throws exception: Column does not exist
- [CARBONDATA-1943] - Load static partition with LOAD COMMAND creates multiple partitions
- [CARBONDATA-1944] - Special character like comma (,) cannot be loaded on partition columns
- [CARBONDATA-1946] - Exception thrown after alter data type change operation on dictionary exclude integer type column
- [CARBONDATA-1947] - fix select * issue after compaction, delete and clean files operation
- [CARBONDATA-1949] - DESC formatted command displays sort scope twice
- [CARBONDATA-1950] - DESC FORMATTED command is displaying wrong comment for sort_scope as global_sort
- [CARBONDATA-1953] - Pre-aggregate Should inherit sort column,sort_scope,dictionary encoding from main table
- [CARBONDATA-1954] - CarbonHiveMetastore is not being updated while dropping the Pre-Aggregate table
- [CARBONDATA-1955] - Delta DataType calculation is incorrect for long type
- [CARBONDATA-1956] - Select query with sum, count and avg throws exception for pre aggregate table
- [CARBONDATA-1957] - create datamap query fails on table having dictionary_include
- [CARBONDATA-1964] - SET command does not set the parameters correctly
- [CARBONDATA-1965] - SET command is not setting the parameter carbon.options.sort.scope
- [CARBONDATA-1966] - SET command for carbon.properties.filepath is not setting the property
- [CARBONDATA-1967] - Auto compaction is not working for partition table. And carbon indexfiles are merging even after configured as false
- [CARBONDATA-1972] - Compaction after update of whole data fails in partition table.
- [CARBONDATA-1973] - User Should not Be able to give the duplicate column name in partition even if its case sensitive
- [CARBONDATA-1974] - Exception when to load data using static partition for uniqdata table
- [CARBONDATA-1975] - Wrong input metrics displayed for carbon
- [CARBONDATA-1976] - Support combination of dynamic and static partitions. And fix concurrent partition load issue.
- [CARBONDATA-1977] - Aggregate table loading is working in partition table
- [CARBONDATA-1978] - Preaggregate table loading failed when using HiveMetastore
- [CARBONDATA-1979] - implicit column filtering logic to directly validate the blocklet ID instead of Block
- [CARBONDATA-1980] - Partition information is added while restore or refresh the table. And also query is not working if there is nay upper case letter in partition column.
- [CARBONDATA-1981] - Error occurs while building project in windows environment
- [CARBONDATA-1982] - Loading data into partition table with invalid partition column should throw proper exception
- [CARBONDATA-1984] - Double datatype Compression Bug
- [CARBONDATA-1985] - Insert into failed for multi partitioned table for static partition
- [CARBONDATA-1986] - Insert over write into partitioned table with dynamic partition throws error
- [CARBONDATA-1987] - Make package name and directory paths consistent;remove duplicate file CarbonColumnValidator
- [CARBONDATA-1988] - Drop partition is not removing the partition folder from hdfs
- [CARBONDATA-1989] - Drop partition is dropping table data
- [CARBONDATA-1991] - Select query from a streaming table throws ClassCastException
- [CARBONDATA-2001] - Unable to save a dataframe result as carbondata streaming table
- [CARBONDATA-2005] - Location attribute with table properties in create table command throws parser exception
- [CARBONDATA-2011] - CarbonStreamingQueryListener throwing ClassCastException
- [CARBONDATA-2013] - executing alter query on non-carbon table gives error, "table can not found in database"
- [CARBONDATA-2014] - update table status for load failure only after first entry
- [CARBONDATA-2015] - Restricted maximum length of bytes per column
- [CARBONDATA-2016] - Exception displays while executing compaction with alter query
- [CARBONDATA-2017] - Error occurs when loading multiple files
- [CARBONDATA-2020] - Fisrt time query performance after upgrade from old version 1.1 to latest 1.3 version is degraded
- [CARBONDATA-2021] - when delete is success and update is failed while writing status file then a stale carbon data file is created.
- [CARBONDATA-2022] - Query With table alias is not hitting pre aggregate table
- [CARBONDATA-2024] - After update empty folder is being created for compacted segments
- [CARBONDATA-2028] - Select Query failed with preagg having timeseries and normal agg table together
- [CARBONDATA-2029] - Query with expression is giving wrong result
- [CARBONDATA-2030] - avg with Aggregate table for double data type is failed.
- [CARBONDATA-2031] - Select column with is null for no_inverted_index column throws java.lang.ArrayIndexOutOfBoundsException
- [CARBONDATA-2035] - Incorrect assert in code leads to tests failed
- [CARBONDATA-2036] - Insert overwrite on static partition cannot work properly
- [CARBONDATA-2038] - Java tests should use JUnit assertion instead of the Java native one
- [CARBONDATA-2039] - Add relative blocklet id during initialization in the blocklet data map
- [CARBONDATA-2042] - Data Mismatch issue in case of Timeseries Year, Month and Day level table
- [CARBONDATA-2046] - agg Query failed when non supported aggregate is present in Query
- [CARBONDATA-2048] - Data delete should be rejected when insert overwrite is in progress
- [CARBONDATA-2049] - CarbonCleanFilesCommand table path problem
- [CARBONDATA-2051] - Added like query ends with and contains with filter push down suport to carbondata
- [CARBONDATA-2053] - Add events for streaming
- [CARBONDATA-2057] - Support specify path when creating pre-aggregate table
- [CARBONDATA-2058] - Streaming throw NullPointerException after batch loading
- [CARBONDATA-2060] - Fix InsertOverwrite on partition table
- [CARBONDATA-2061] - Check for only valid IN_PROGRESS segments
- [CARBONDATA-2063] - Tests should not depend on each other
- [CARBONDATA-2066] - uniqwithoutheader.csv file gets deleted when "duplicate values" test case is run
- [CARBONDATA-2068] - Drop datamap should work for timeseries
- [CARBONDATA-2069] - Data is not loaded into preaggregate table when table is created when data load is in progress for main table
- [CARBONDATA-2070] - when hive metastore is enabled, create preaggregate table on decimal column of main table is failing
- [CARBONDATA-2075] - When dropping datamap without IF EXIST, if the datamap does not exist, we should throw MalformedCarbonCommandException
- [CARBONDATA-2077] - Drop datamap should throw exception if table doesn't exist, even though there is IF EXISTS
- [CARBONDATA-2081] - some time Spark-sql and beeline operation is not being reflected to each other.
- [CARBONDATA-2082] - Timeseries pre-aggregate table should support the blank space
- [CARBONDATA-2083] - Timeseries pre-aggregate table should support hour != 1 , others are the same
- [CARBONDATA-2084] - Timeseries pre-aggregate table should support min and max when the create datamap don't as select max and min
- [CARBONDATA-2086] - Create datamap should throw exception if using improper string
- [CARBONDATA-2087] - Order rule should keep consistent for create datamap
- [CARBONDATA-2089] - Test cases is incorrect because it always run success no matter whether the SQL thrown exception
- [CARBONDATA-2092] - Fix compaction bug to prevent the compaction flow from going through the restructure compaction flow
- [CARBONDATA-2094] - Filter DataMap Tables in "Show Table Command"
- [CARBONDATA-2095] - Incorrect data is displayed after stream segment is converted to batch segment .
- [CARBONDATA-2098] - Add documentation for pre-aggregate tables
- [CARBONDATA-2102] - Fix measure min/max value problem while reading from old store
- [CARBONDATA-2104] - Add concurrent command testcase for insert overwrite and insert
- [CARBONDATA-2105] - Incorrect result displays after creating data map
- [CARBONDATA-2107] - Average query is failing when data map has both sum(column) and avg(column) of big int, int type
- [CARBONDATA-2110] - option of TempCsv should be removed since the default delimiter may conflicts with field value
- [CARBONDATA-2112] - Data getting garbled after datamap creation when table is created with GLOBAL SORT
- [CARBONDATA-2113] - Count(*) and select * are not working on old store with V2 format
- [CARBONDATA-2117] - Fixed Synchronization issue while creating multiple carbon session
- [CARBONDATA-2119] - CarbonDataWriterException thrown when loading using global_sort
- [CARBONDATA-2120] - Fixed data mismatch for No dictionary numeric data type
- [CARBONDATA-2121] - Remove tempCSV option for Carbon Dataframe Writer
- [CARBONDATA-2122] - Redirect Bad Record Path Should Throw Exception on Empty Location
- [CARBONDATA-2130] - Find some Spelling error in CarbonData
New Feature
- [CARBONDATA-1552] - Add support of Spark-2.2 in carbon.
- [CARBONDATA-1573] - Support Database Location Configuration while Creating Database/ Support Creation of carbon Table in the database location
- [CARBONDATA-1592] - Add Event Listener interface to Carbondata
- [CARBONDATA-1617] - Merging carbonindex files for each segment.
- [CARBONDATA-1746] - Count Star optimization
- [CARBONDATA-1822] - Support DDL to register the CarbonData table from existing carbon table data
- [CARBONDATA-1844] - Support specify tablePath when creating table
- [CARBONDATA-1884] - Add CTAS support to carbondata
- [CARBONDATA-1921] - Update product document with Merge Index Feature
- [CARBONDATA-1922] - Update product document with Ignoring empty line OPTION
- [CARBONDATA-1968] - Support external table
- [CARBONDATA-2097] - Restriction added to partition table on alter command
Improvement
- [CARBONDATA-1199] - Change Unsafe configuration to dynamic
- [CARBONDATA-1288] - Secure Dictionary Server Port
- [CARBONDATA-1439] - Wrong Error message shown for Bad records even when BAD_RECORDS_LOGGER_ENABLE is set to true
- [CARBONDATA-1444] - CarbonData unsupport Boolean data type
- [CARBONDATA-1481] - Compaction support global_sort
- [CARBONDATA-1505] - Get the detailed blocklet information using default BlockletDataMap for other datamaps
- [CARBONDATA-1531] - Format module should support BOOLEAN
- [CARBONDATA-1539] - Change DataType from enum to class
- [CARBONDATA-1547] - Table should be dropped after running test
- [CARBONDATA-1568] - Optimize annotation of code
- [CARBONDATA-1594] - Add scale and decimal to DecimalType
- [CARBONDATA-1597] - Remove spark1 integration
- [CARBONDATA-1602] - Remove unused declaration in spark-common module
- [CARBONDATA-1624] - If SORT_SCOPE is non-GLOBAL_SORT with Spark, set 'carbon.number.of.cores.while.loading' dynamically as per the available executor cores
- [CARBONDATA-1626] - add datasize and index size to table status file
- [CARBONDATA-1628] - Re-factory LoadTableCommand to reuse code for streaming ingest in the future
- [CARBONDATA-1652] - Add example for spark integration
- [CARBONDATA-1653] - Rename aggType to measureType
- [CARBONDATA-1662] - Make ArrayType and StructType contain child DataType
- [CARBONDATA-1674] - Carbon 1.3.0-Partitioning:Describe Formatted Should show the type of partition as well.
- [CARBONDATA-1686] - Upgrade Presto Version of Current Carbon Data to 0.186
- [CARBONDATA-1698] - support table level compaction configuration
- [CARBONDATA-1704] - Filter Optimization
- [CARBONDATA-1706] - Making index merge DDL insensitive to the property
- [CARBONDATA-1709] - Support sort_columns option in dataframe writer
- [CARBONDATA-1732] - Add S3 support in FileFactory
- [CARBONDATA-1734] - Ignore empty line while reading CSV
- [CARBONDATA-1738] - Block direct load on pre-aggregate table
- [CARBONDATA-1739] - Clean up store path interface
- [CARBONDATA-1741] - Remove AKSK in Log
- [CARBONDATA-1745] - Remove local metastore path
- [CARBONDATA-1756] - Improve Boolean data compress rate by changing RLE to SNAPPY algorithm
- [CARBONDATA-1765] - Remove repeat code of Boolean
- [CARBONDATA-1768] - Upgrade univocity parser to 2.2.1
- [CARBONDATA-1769] - Change alterTableCompaction to support transferring tableInfo
- [CARBONDATA-1770] - Update documents and consolidate DDL,DML,Partition docs
- [CARBONDATA-1771] - During segment_index compaction, .carbonindex files of invalid segments are also merged
- [CARBONDATA-1778] - Support clean garbage segments for all
- [CARBONDATA-1779] - GenericVectorizedReader for Presto
- [CARBONDATA-1785] - Add Coveralls code coverage badge to carbondata
- [CARBONDATA-1801] - Remove unnecessary mdk computation code
- [CARBONDATA-1803] - Change the output format of SHOW SEGMENTS
- [CARBONDATA-1804] - Make FileOperations Pluggable
- [CARBONDATA-1805] - Optimize pruning for dictionary loading
- [CARBONDATA-1812] - Provide API to get dynamic table information (table size and last modified time)
- [CARBONDATA-1815] - Add AtomicRunnableCommand abstraction
- [CARBONDATA-1816] - Changing BAD_RECORDS_ACTION default action to FAIL
- [CARBONDATA-1818] - Make carbon.streaming.segment.max.size configurable
- [CARBONDATA-1819] - Remove profiles for Spark-2.1 and Spark-2.2 in module assembly and spark-common-test
- [CARBONDATA-1821] - Incorrect headings in documentation
- [CARBONDATA-1834] - Multi-user concurrent scenario: SELECT queries executing in parallel with an "insert overwrite" task
- [CARBONDATA-1837] - Reuse the old row to reduce memory consumption
- [CARBONDATA-1838] - Refactor SortStepRowUtil to make it more readable
- [CARBONDATA-1843] - Block CTAS and external table syntax
- [CARBONDATA-1864] - Use org.apache.spark.SPARK_VERSION instead of sparkSession.version
- [CARBONDATA-1866] - Refactor CarbonLateDecodeRule into separate rules for better usability
- [CARBONDATA-1867] - Add support for task/segment level pruning
- [CARBONDATA-1870] - Add dictionary path support to carbondata
- [CARBONDATA-1880] - Global sort may generate many small files
- [CARBONDATA-1883] - Improvement in merge index code
- [CARBONDATA-1892] - Documentation update for disabling single_pass on the first load
- [CARBONDATA-1894] - Add compactionType Parameter to compaction event
- [CARBONDATA-1897] - Remove column group information in DESC TABLE command
- [CARBONDATA-1898] - Optimize LIKE, CONTAINS, and ENDS WITH queries in the case of OR filters
- [CARBONDATA-1900] - Modify loadmetadata to store the timestamp as a long value (in ms) instead of a formatted date string for the fields "loadStartTime" and "timestamp"
- [CARBONDATA-1901] - Fix pre-aggregate datamap creation and query parsing
- [CARBONDATA-1903] - Fix some code issues in carbondata
- [CARBONDATA-1906] - Update usage of the registerTempTable method since it is deprecated
- [CARBONDATA-1923] - Remove file after running test class
- [CARBONDATA-1928] - Separate the lock property for concurrent load and others
- [CARBONDATA-1932] - Add version for CarbonData
- [CARBONDATA-1939] - Add SHOW SEGMENTS validation test case
- [CARBONDATA-1945] - Update documentation for bad records action
- [CARBONDATA-1970] - (Carbon1.3.0 - Spark 2.2) Use Spark 2.2.1 as default version for Profile Spark-2.2
- [CARBONDATA-2012] - Add Transaction support for pre-aggregation table load
- [CARBONDATA-2019] - Enhancement of merge index compaction feature to support creation of merge index file on old store where index file does not contain the blocklet info
- [CARBONDATA-2027] - Fix randomly failing concurrent test cases in CI
- [CARBONDATA-2034] - Improve query performance
- [CARBONDATA-2037] - Store carbondata locations in datamap to make the datamap retrieval faster
- [CARBONDATA-2043] - Configurable wait time for requesting executors and minimum registered executors ratio to continue the block distribution
- [CARBONDATA-2047] - Clean up temp folder after task completion in case of partitioning
- [CARBONDATA-2064] - Add compaction listener
- [CARBONDATA-2076] - Refactor load command to segregate process-metadata and process-data code
- [CARBONDATA-2078] - Add "IF NOT EXISTS" support for CREATE DATAMAP
- [CARBONDATA-2088] - Optimize syntax for creating timeseries pre-aggregate table
- [CARBONDATA-2090] - Fix the error message for altering the streaming property
- [CARBONDATA-2096] - Add test case for 'merge_small_files' distribution
- [CARBONDATA-2100] - Add test case for the result of handoff
- [CARBONDATA-2101] - Restrict Direct query on aggregation and timeseries data map
- [CARBONDATA-2108] - Refactor unsafe sort property
- [CARBONDATA-2109] - Config options of DataFrame load with tempCSV, such as QUOTECHAR, are not applied
- [CARBONDATA-2111] - TPC-H query with multiple joins does not return any rows
- [CARBONDATA-2123] - Refactor datamap schema thrift and datamap provider to use short name and classname
- [CARBONDATA-2139] - Optimize CTAS documentation and test case
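Several of the improvements above change user-facing DDL; for example, CARBONDATA-2078 adds "IF NOT EXISTS" to CREATE DATAMAP. A minimal sketch of the resulting statement is shown below (the table, column, and datamap names are illustrative, not taken from the release):

```sql
-- Hypothetical names; the preaggregate datamap syntax follows CarbonData 1.3.0 DDL.
-- With IF NOT EXISTS, re-running the statement no longer fails if the datamap exists.
CREATE DATAMAP IF NOT EXISTS agg_sales
ON TABLE sales
USING 'preaggregate'
AS SELECT country, sum(amount) FROM sales GROUP BY country;
```

Queries that group by `country` on the main table can then be transparently rewritten against the aggregate table (see the Pre Aggregate Support feature above).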
Task
- [CARBONDATA-1347] - Implement columnar reading of data for Presto integration
- [CARBONDATA-1598] - Remove all spark 1.x info(CI, readme, documents, etc.)
- [CARBONDATA-1659] - Remove spark 1.x info in module 'carbondata-spark-common-cluster-test'
- [CARBONDATA-1792] - Add data management example for Spark 2.x
- [CARBONDATA-1865] - Skip single_pass for the first data load
- [CARBONDATA-1941] - Document update for Lock Retry
- [CARBONDATA-1942] - Documentation for Concurrent Lock Retries
- [CARBONDATA-2040] - Add standardpartition example and optimize partition test cases
- [CARBONDATA-2050] - Add example of query data with specified segments
- [CARBONDATA-2054] - Add an example: how to use CarbonData batch load to integrate with Spark Streaming.
- [CARBONDATA-2106] - Update product document with page level reader property