INTRODUCTION
CarbonData supports update and delete operations on Big Data.
DESCRIPTION
It supports batch updates, such as daily update scenarios for OLAP, using a Base+Delta file based design.
Because these are not OLTP systems, the data is updated offline.
Data in OLAP systems also does not change very frequently, so updates are made in batches.
Updates are:
Periodic to dimension table
Batched to fact table
ACID properties are maintained while updating data:
An update is atomic.
An update is visible immediately after it is committed.
Concurrent queries are allowed during update operations.
Single-statement autocommit is supported; OLTP-style transactions are not.
If an update fails, users are not affected.
If an update succeeds, the resulting data is guaranteed to be correct.
If a query is fired while an update is in progress, it returns results from the previously stored values.
The query works on the existing data until the update is committed.
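The visibility rules above can be illustrated with a minimal sketch. It is a toy in-memory model, not CarbonData code: the class `VersionedTable` and its methods are hypothetical names chosen for illustration. Readers always see the last committed snapshot, and a failed update leaves that snapshot untouched.

```python
import threading


class VersionedTable:
    """Toy table (hypothetical): readers only ever see committed data."""

    def __init__(self, rows):
        self._committed = tuple(rows)   # immutable committed snapshot
        self._lock = threading.Lock()   # serializes writers only

    def snapshot(self):
        # Readers take a reference to the committed snapshot without locking.
        return self._committed

    def update(self, transform):
        # Build the new version on the side, then publish it in one step.
        with self._lock:
            staged = tuple(transform(row) for row in self._committed)
            # If transform raised, this assignment is never reached,
            # so the previous snapshot stays visible.
            self._committed = staged


def failing_transform(row):
    raise ValueError("simulated failure")


table = VersionedTable([1, 2, 3])
before = table.snapshot()
table.update(lambda row: row * 10)
after = table.snapshot()

try:
    table.update(failing_transform)
except ValueError:
    pass
after_failed = table.snapshot()
```

A query that obtained `before` keeps reading the old values even after the update commits, while a query started afterwards reads `after`.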
FLOW SEQUENCE
Since the data in CarbonData files is immutable, updates and deletes are handled by maintaining two delta files:
Insert Delta:
Stores newly added rows
CarbonData file format
Delete Delta:
Stores RowId* of rows that are deleted
Bitmap file format
I) Update Flow Sequence
Figure 1 : Flow Sequence for Update
Update flow:
Find all rows that need to be updated, by executing the subquery.
Write the “Delete Delta” file
Write the “Insert Delta” file
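The three steps of the update flow can be sketched as follows. This is a simplified in-memory model under stated assumptions: the function `update_rows`, the dictionary keyed by an integer RowId, and the Python predicate standing in for the subquery are all hypothetical and do not reflect the actual file formats.

```python
def update_rows(base, predicate, new_values):
    """Return (delete_delta, insert_delta) for an update; base is not modified."""
    # Step 1: find the rows to update (stands in for executing the subquery).
    matched = [row_id for row_id, row in base.items() if predicate(row)]
    # Step 2: the "Delete Delta" records the RowIds of the old row versions.
    delete_delta = set(matched)
    # Step 3: the "Insert Delta" stores the new versions of those rows.
    insert_delta = [{**base[row_id], **new_values} for row_id in matched]
    return delete_delta, insert_delta


base = {
    0: {"id": 1, "city": "Pune"},
    1: {"id": 2, "city": "Delhi"},
    2: {"id": 3, "city": "Pune"},
}
delete_delta, insert_delta = update_rows(
    base, lambda row: row["city"] == "Pune", {"city": "Mumbai"}
)
```

The base data is never rewritten; the update is expressed entirely through the two delta structures.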
II) Delete Flow Sequence
Figure 2 : Flow Sequence for Delete
Delete flow:
Find all rows that need to be deleted, by executing the subquery.
Write the “Delete Delta” file
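The delete flow can be sketched in the same spirit. The function `delete_rows`, the integer RowIds, and the Python set used in place of the bitmap file are assumptions made for illustration only.

```python
def delete_rows(base, predicate):
    """Return the delete delta: the RowIds of all rows matching the predicate."""
    # Step 1: find the rows to delete (stands in for executing the subquery).
    # Step 2: record their RowIds; the base rows themselves stay in place.
    return {row_id for row_id, row in base.items() if predicate(row)}


base = {
    0: {"id": 1, "city": "Pune"},
    1: {"id": 2, "city": "Delhi"},
    2: {"id": 3, "city": "Pune"},
}
delete_delta = delete_rows(base, lambda row: row["city"] == "Delhi")
```

Unlike an update, a delete produces no new rows, so only the delete delta is written.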
EXAMPLE
I) Data Update :
Figure 3 : Data Update Process
II) Data Delete :
Figure 4 : Data Deletion Process
READ FLOW SEQUENCE
Since values are not physically updated or deleted, the current data is reconstructed at read time in the following manner.
i) Update Scenario
Read “Base” file
Read “Delete Delta” and exclude the RowIds listed in the file
Read “Insert Delta” and merge the new rows
ii) Delete Scenario
Read “Base” file
Read “Delete Delta” and exclude the RowIds listed in the file
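Both read scenarios can be sketched with one small function. It is an illustrative in-memory model: `read_table`, the integer RowIds, and the list holding the newly written rows are hypothetical and not the actual reader implementation.

```python
def read_table(base, delete_delta, new_rows=()):
    """Merge base rows with the deltas to produce the current view."""
    # Read the base data and skip every RowId listed in the delete delta.
    live = [row for row_id, row in base.items() if row_id not in delete_delta]
    # Merge in the rows written by an update (empty for a plain delete).
    return live + list(new_rows)


base = {
    0: {"id": 1, "city": "Pune"},
    1: {"id": 2, "city": "Delhi"},
}

# Update scenario: row 0 was replaced by a new version.
updated_view = read_table(base, {0}, [{"id": 1, "city": "Mumbai"}])

# Delete scenario: row 1 was deleted and no new rows were written.
deleted_view = read_table(base, {1})
```

In both cases the base data is read unchanged and the deltas decide which rows the query finally sees.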
*RowId = Segment -> Block -> Blocklet -> Row
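As a rough illustration of this hierarchy, a RowId can be modeled as a four-part key. The `RowId` type below and its integer fields are assumptions for the sketch; the actual encoding used in the delta files is not described here.

```python
from typing import NamedTuple


class RowId(NamedTuple):
    """Hypothetical RowId: locates a row by walking down the storage hierarchy."""
    segment: int
    block: int
    blocklet: int
    row: int


# A delete delta modeled as a set of RowIds (stand-in for the bitmap file).
deleted = {RowId(0, 1, 0, 7), RowId(0, 1, 2, 3)}

is_deleted = RowId(0, 1, 0, 7) in deleted
is_live = RowId(0, 1, 0, 8) not in deleted
```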