Upsert

Syntax
The syntax of the Upsert statement is similar to the syntax of the Insert statement and it is constructed as follows:

upsert into dataset datasetname(
query()
);
Semantics
An upsert operation couples a delete "if exists" with an insert based on the primary key of the record. The two operations are run atomically but the execution of the query doesn't happen under the same transaction context and a materialized result might be computed before upserting records into the dataset.

Implementation Details
-Compilation time
An upsert statement is compiled exactly as an insert statement with the differences only at the storage side of the produced plan. The logical operators used are the exact ones used in the case of the insert statement and the differences are only in their expected inputs and outputs. The difference is only in the expected input and output schema for the operators since an Upsert operator needs the existing values in addition to the new values.

-Runtime
A new flag has been added to AsterixLSMTreeInsertDeleteOperatorDescriptor to indicate when the operation is an upsert operation. If the flag is set, the created push runtime is a new NodePushable that extends LSMIndexInsertUpdateDeleteOperatorNodePushable and named AsterixLSMPrimaryUpsertOperatorNodePushable. The secondary upsert push runtime AsterixLSMSecondaryUpsertOperatorNodePushable extends LSMIndexInsertUpdateDeleteOperatorNodePushable and the upsert operation has a special commit operator "UpsertCommitRuntime" that extends the existing "CommitRuntime". With these changes, the upsert pipeline looks as follows:

The primary Upsert operator perform the following tasks:

Lock on key
Search using key
Delete if found
Insert new record
Produce tuples that consist of the existing record "or NULL" and the new record

The secondary Upsert operator perform the following tasks:

If the existing secondary key equals to the new secondary key, it is a NO OP.
If a previous secondary key exists, it deletes the existing secondary key-primary key pair from the secondary index.
If a new secondary key exists, it inserts the new secondary key-primary key pair from the secondary index.

Example:
data type: [int(pk),int].
existing data:
single record {1,5}

a) upsert {1,5}
In this case, the secondary index upsert the old secondary key == the new secondary key. Hence, it will be no op.
b) upsert{1,2}
In this case, the secondary keys are not equal. Hence, we check:
(previous value exists(not null) which was {5,1}, so we delete it)
(new value is {2,1} so we insert it).
c) upsert {2,2}
In this case, there is no old value, we will only insert the key {2,2} in secondary index.

The Upsert commit operator commits the transaction creating Upsert logs if an old value was deleted in the transaction and an Insert log otherwise.
When an Upsert log is flushed to disk, the operation counter is decremented twice since it performed two operations.

- Special considerations for Filters

If the dataset is filtered, some extra operations must be done specifically for upsert in the case that a record is being replaced with a new value. After the new record is inserted, the filter will be updated by the index itself with the filter value of that record. However the record being upserted may exist in an immutable component, whereas the new record will be in a memory component. Thus searches that use the old filter value may not go into the new component that contains the new values we have inserted, but only see the old value. Thus we must update the filter separately, after inserting the new one into the index, with the filter value of the previously found tuple. This way searches with the old filter value will see the new record value.

Page tree

Upsert