INTRODUCTION

This feature supports update and delete operations over Big Data stored in CarbonData.

DESCRIPTION

The feature targets batch updates, such as daily update scenarios for OLAP, and is built on a Base+Delta file design.

Because these systems are not OLTP systems, the data is updated offline.

Data in OLAP systems also changes infrequently, so updates are made in batches.

Updates are:

  • Periodic for dimension tables

  • Batched for fact tables

ACID properties are maintained while updating data:

  • An update is atomic.

  • An update is visible immediately once it is committed.

  • Queries can run concurrently with update operations.

  • Single-statement autocommit is supported; OLTP-style transactions are not.

  • If an update fails, the user is not affected.

  • If an update succeeds, the data is guaranteed to be correct.

  • A query fired while an update is in progress returns results from the previously stored values.

  • The query works on the existing data until the update is committed.
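The visibility rules above can be pictured with a small in-memory model. This is only a sketch under assumptions: the class `VersionedTable`, its methods, and the `during_update` hook are hypothetical names used for illustration and are not CarbonData APIs. It shows a reader seeing the previous values while an update is in progress, and a failed update leaving the committed data untouched.

```python
import threading


class VersionedTable:
    """Toy model (hypothetical): readers only ever see the committed snapshot."""

    def __init__(self, rows):
        self._committed = tuple(rows)        # immutable committed snapshot
        self._write_lock = threading.Lock()  # serialises writers, not readers

    def query(self):
        # Readers do not take the writer lock; they return whatever
        # snapshot is committed at this moment.
        return self._committed

    def update(self, transform, during_update=None):
        with self._write_lock:
            # Stage the new version on the side; if transform raises,
            # the committed snapshot is never touched.
            staged = tuple(transform(row) for row in self._committed)
            if during_update is not None:
                during_update()              # stands in for a concurrent query
            self._committed = staged         # one reference swap acts as the commit


table = VersionedTable([1, 2, 3])

# A query fired while the update is in progress sees the old values.
seen_mid_update = []
table.update(lambda v: v * 10,
             during_update=lambda: seen_mid_update.append(table.query()))


def broken(value):
    raise ValueError("simulated failure")


# A failed update leaves the committed data as it was.
try:
    table.update(broken)
except ValueError:
    pass
```

After this runs, `seen_mid_update` holds the pre-update values, while `table.query()` returns the updated values from the first, successful update.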

FLOW SEQUENCE

Because the data in CarbonData files is immutable, updates and deletes are implemented by maintaining two kinds of delta files:

Insert Delta :

  • Stores newly added rows

  • CarbonData file format

Delete Delta :

  • Stores the RowId* of rows that are deleted

  • Bitmap file format
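As a rough illustration of the two delta kinds, the sketch below models them as in-memory structures. The class and field names are hypothetical, and using a Python integer as the bitmap is an assumption made for brevity; the on-disk layout of either file is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class DeleteDelta:
    """Deleted row positions kept as a bitmap: bit i set means row i is deleted."""
    bits: int = 0  # assumption: a plain int stands in for the bitmap file

    def mark(self, row_pos: int) -> None:
        self.bits |= 1 << row_pos

    def is_deleted(self, row_pos: int) -> bool:
        return (self.bits >> row_pos) & 1 == 1


@dataclass
class InsertDelta:
    """Newly added rows, kept in the same shape as rows in the base file."""
    rows: list = field(default_factory=list)


delete_delta = DeleteDelta()
delete_delta.mark(0)
delete_delta.mark(5)

insert_delta = InsertDelta(rows=[{"id": 7, "city": "Pune"}])
```

A bitmap keeps the delete delta compact, since only one bit per row position is needed regardless of how wide the row is.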

I) Update Flow Sequence

Figure 1 : Flow Sequence for Update

Update flow:

  1. Find all rows that need to be updated, by executing the subquery.

  2. Write the “Delete Delta” file, which records the RowIds of the rows being replaced.

  3. Write the “Insert Delta” file, which holds the new versions of those rows.
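The three update steps can be sketched as follows. This is a simplified in-memory model, not the actual implementation: the function `update`, the use of list positions as row ids, and the sample rows are all assumptions for illustration.

```python
def update(base, predicate, assign):
    """Return (delete_delta, insert_delta) for an update; base is never modified."""
    # 1. Find all rows that need to be updated.
    matched = [row_id for row_id, row in enumerate(base) if predicate(row)]
    # 2. "Delete Delta": the row ids of the old versions.
    delete_delta = set(matched)
    # 3. "Insert Delta": the new versions, built from copies of the old rows.
    insert_delta = [assign(dict(base[row_id])) for row_id in matched]
    return delete_delta, insert_delta


base = [
    {"id": 1, "city": "Delhi"},
    {"id": 2, "city": "Pune"},
    {"id": 3, "city": "Delhi"},
]


def set_city(row):
    row["city"] = "New Delhi"
    return row


delete_delta, insert_delta = update(base, lambda r: r["city"] == "Delhi", set_city)
```

Because the new versions are built from copies, `base` still holds the original values after the call, which mirrors the immutability of the base file.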

II) Delete Flow Sequence

Figure 2 : Flow Sequence for Delete

Delete flow:

  1. Find all rows that need to be deleted, by executing the subquery.

  2. Write the “Delete Delta” file, which records the RowIds of those rows.
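A delete needs only the first two steps, as in this minimal sketch. The function name `delete`, the use of list positions as row ids, and the sample rows are hypothetical.

```python
def delete(base, predicate):
    """Return the delete delta for a delete; base is never modified."""
    # 1. Find all rows that need to be deleted.
    # 2. "Delete Delta": record only their row ids; no new rows are written.
    return {row_id for row_id, row in enumerate(base) if predicate(row)}


base = [
    {"id": 1, "city": "Delhi"},
    {"id": 2, "city": "Pune"},
    {"id": 3, "city": "Delhi"},
]

delete_delta = delete(base, lambda r: r["city"] == "Pune")
```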

EXAMPLE

I) Data Update :

Figure 3 : Data Update Process

II) Data Delete :

Figure 4 : Data Deletion Process

READ FLOW SEQUENCE

Because values are not physically deleted or updated, a read reconstructs the current data in the following manner.

i) Update Scenario

  1. Read the “Base” file.

  2. Read the “Delete Delta” file and exclude the RowIds listed in it.

  3. Read the “Insert Delta” file and merge in the new rows.

ii) Delete Scenario

  1. Read the “Base” file.

  2. Read the “Delete Delta” file and exclude the RowIds listed in it.
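Both read scenarios follow the same merge-on-read pattern, sketched below. The function `read`, the use of list positions as row ids, and the sample values are assumptions for illustration; the delete scenario is the same read with no delta of new rows.

```python
def read(base, delete_delta, insert_delta=()):
    """Reconstruct the current rows from the base data and its deltas."""
    # Read the base rows, skipping every row id listed in the delete delta.
    visible = [row for row_id, row in enumerate(base) if row_id not in delete_delta]
    # Merge in the new rows from the delta that holds them (update scenario only).
    visible.extend(insert_delta)
    return visible


base = ["a", "b", "c"]

# Update scenario: row 1 was replaced by a new version.
after_update = read(base, delete_delta={1}, insert_delta=["B"])

# Delete scenario: row 0 was deleted and nothing was added.
after_delete = read(base, delete_delta={0})
```

In this sketch the merged rows are appended after the surviving base rows, so row order is not preserved across an update.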

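  *RowId = Segment -> Block -> Blocklet -> Row, that is, a row is addressed by its position within a blocklet, which sits inside a block, which sits inside a segment.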
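One way to picture this hierarchical identifier is as a four-part tuple. The sketch below is an assumption for illustration: the class name `RowId`, the integer field types, and the sample values are hypothetical, and the real encoding may differ.

```python
from typing import NamedTuple


class RowId(NamedTuple):
    """Hypothetical row address: segment -> block -> blocklet -> row."""
    segment: int
    block: int
    blocklet: int
    row: int


# A delete delta can then be modelled as a set of such addresses.
deleted = {RowId(0, 1, 0, 42), RowId(0, 1, 0, 7), RowId(1, 0, 2, 3)}

# Tuples compare field by field, so sorting groups ids by segment,
# then block, then blocklet, then row.
ordered = sorted(deleted)
```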
