Update and Delete Support

INTRODUCTION

CarbonData supports update and delete operations over Big Data.

DESCRIPTION

Batch updates are supported, such as daily update scenarios for OLAP, using a Base+Delta file-based design.

Because these are not OLTP systems, the data is updated offline.

Data in OLAP systems also changes infrequently, so updates are made in batches.

Updates maintain ACID properties:

An update is atomic.
An update is visible immediately after it is committed.
Concurrent queries are allowed during update operations.
Single-statement autocommit is supported; OLTP-style transactions are not.
If an update fails, existing data is not affected.
If an update succeeds, the resulting data is guaranteed to be correct.
A query fired while an update is in progress returns results from the previously stored values.
Such a query works on the existing data until the update is committed.
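
The visibility rules above can be illustrated with a minimal sketch. It is not CarbonData code: the `VersionedTable` class and its methods are hypothetical, and the sketch assumes the committed state can be modelled as one immutable snapshot that is swapped in a single step.

```python
import threading


class VersionedTable:
    """Toy table (hypothetical): readers only ever see the committed snapshot."""

    def __init__(self, rows):
        self._committed = tuple(rows)   # immutable committed snapshot
        self._lock = threading.Lock()   # serializes writers only

    def query(self):
        # Readers never take the lock; they get the snapshot committed so far.
        return self._committed

    def update(self, transform):
        # Build the new version off to the side, then publish it in one step.
        with self._lock:
            staged = tuple(transform(row) for row in self._committed)
            # If transform raised, this line is never reached and the old
            # snapshot stays in place, so a failed update changes nothing.
            self._committed = staged


def failing(row):
    raise ValueError("simulated failure")


table = VersionedTable([("a", 1), ("b", 2)])
before = table.query()

try:
    table.update(failing)
except ValueError:
    pass
after_failure = table.query()

table.update(lambda row: (row[0], row[1] * 10))
after_success = table.query()
```

A reader that obtained `before` keeps working on the old values even after the later update commits, which mirrors the behaviour described for queries fired during an update.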

FLOW SEQUENCE

Since the data in CarbonData files is immutable, updates and deletes are implemented by maintaining two delta files:

Insert Delta :

Stores newly added rows

CarbonData file format

Delete Delta :

Stores the RowId* of rows that are deleted

Bitmap file format
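
As a rough illustration of how the two delta files cooperate on the write side, the sketch below models an update as "mark the old row in the delete delta, append the new row to the insert delta". All names (`RowId`, `Table`, `update`, `delete`) are hypothetical, a Python set stands in for the bitmap file, and rows that already live in the insert delta are not handled.

```python
from typing import NamedTuple


class RowId(NamedTuple):
    # Mirrors the footnote: Segment -> block -> blocklet -> row
    segment: int
    block: int
    blocklet: int
    row: int


class Table:
    def __init__(self, base):
        self.base = dict(base)      # RowId -> row; never modified after load
        self.delete_delta = set()   # RowIds marked as deleted (bitmap stand-in)
        self.insert_delta = []      # newly added rows

    def delete(self, predicate):
        # A delete only records RowIds; the base rows stay where they are.
        for row_id, row in self.base.items():
            if predicate(row):
                self.delete_delta.add(row_id)

    def update(self, predicate, change):
        # An update is a delete of the old row plus an insert of the new one.
        for row_id, row in self.base.items():
            if row_id not in self.delete_delta and predicate(row):
                self.delete_delta.add(row_id)
                self.insert_delta.append(change(row))


table = Table({
    RowId(0, 0, 0, 0): {"name": "a", "value": 1},
    RowId(0, 0, 0, 1): {"name": "b", "value": 2},
    RowId(0, 0, 0, 2): {"name": "c", "value": 3},
})
table.update(lambda r: r["name"] == "b", lambda r: {**r, "value": 20})
table.delete(lambda r: r["name"] == "c")
```

After these two statements the base still holds all three original rows; only the delta structures have changed.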

I) Update Flow Sequence

Figure 1 : Flow Sequence for Update

Update flow：

II) Delete Flow Sequence

Figure 2 : Flow Sequence for Delete

Delete flow：

EXAMPLE

I) Data Update :

Figure 3 : Data Update Process

II) Data Delete :

Figure 4 : Data Deletion Process

READ FLOW SEQUENCE

Since values are not physically updated or deleted, the current values are reconstructed at read time as follows.

i) Update Scenario

Read the “Base” file

Read the “Delete Delta” file and exclude the RowIds it lists

Read the “Insert Delta” file and merge in the new rows

ii) Delete Scenario

Read the “Base” file

Read the “Delete Delta” file and exclude the RowIds it lists
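
The two read scenarios can be sketched as one small function. This is an illustration only: `read_table` and its parameters are hypothetical names, the base is assumed to be a list of `(row_id, row)` pairs, and the delete delta is assumed to be a set of RowIds expressed as `(segment, block, blocklet, row)` tuples.

```python
def read_table(base, delete_delta, insert_delta=()):
    """Return the current rows without ever modifying the base."""
    # Steps 1 and 2: read the base and skip every RowId in the delete delta.
    live = [row for row_id, row in base if row_id not in delete_delta]
    # Step 3 (update scenario only): merge in the newly written rows.
    live.extend(insert_delta)
    return live


base = [
    ((0, 0, 0, 0), ("a", 1)),
    ((0, 0, 0, 1), ("b", 2)),
    ((0, 0, 0, 2), ("c", 3)),
]

# Update scenario: row "b" was rewritten with a new value.
updated = read_table(base, {(0, 0, 0, 1)}, [("b", 20)])

# Delete scenario: row "c" was deleted; there is nothing to merge.
deleted = read_table(base, {(0, 0, 0, 2)})
```

In the delete scenario the third argument is simply omitted, which matches the shorter two-step sequence.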

*Row ID = Segment -> block -> blocklet -> row
