<wip>
Hudi
...
(pronounced
...
“Hoodie”)
...
is
...
implemented
...
as
...
a
...
Spark
...
library,
...
which
...
makes
...
it
...
easy
...
to
...
integrate
...
into
...
existing
...
data
...
pipelines
...
or
...
ingestion
libraries
...
(which
...
we
...
will
...
refer
...
to
...
as
...
`Hudi
...
clients`).
...
Hudi
...
Clients
...
prepare
...
an
...
`RDD[HoodieRecord]`
...
that
...
contains
...
the
...
data
...
to
...
be
...
upserted
...
and
Hudi
...
upsert/insert
...
is
...
merely
...
a
...
Spark
...
DAG,
...
that
...
can
...
be
...
broken
...
into
...
two
...
big
...
pieces.
...
-
...
**Indexing**
...
:
...
A
...
big
...
part
...
of
...
Hudi's
...
efficiency
...
comes
...
from
...
indexing
...
the
...
mapping
...
from
...
record
...
keys
...
to
...
the
...
file
...
ids,
...
to
...
which
...
they
...
belong
...
to.
...
This
...
index
...
also
...
helps
...
the
...
`HoodieWriteClient`
...
separate
...
upserted
...
records
...
into
...
inserts
...
and
...
updates,
...
so
...
they
...
can
...
be
...
treated
...
differently.
...
`HoodieReadClient`
...
supports
...
operations
...
such
...
as
...
`filterExists`
...
(used
...
for
...
de-duplication
...
of
...
table)
...
and
...
an
...
efficient
...
batch
...
`read(keys)`
...
api,
...
that
...
can
...
read
...
out
...
the
...
records
...
corresponding
...
to
...
the
...
keys
...
using
...
the
...
index
...
much
...
quickly,
...
than
...
a
...
typical
...
scan
...
via
...
a
...
query.
...
The
...
index
...
is
...
also
...
atomically
...
updated
...
each
...
commit,
...
and
...
is
...
also
...
rolled
...
back
...
when
...
commits
...
are
...
rolled
...
back.
...
-
...
**Storage**
...
:
...
The
...
storage
...
part
...
of
...
the
...
DAG
...
is
...
responsible
...
for
...
taking
...
an
...
`RDD[HoodieRecord]`,
...
that
...
has
...
been
...
tagged
...
as
...
an
...
insert
...
or
...
update
...
via
...
index
...
lookup,
...
and
...
writing
...
it
...
out
...
efficiently
...
onto
...
storage.
##
...
Index
Hudi
...
currently
...
provides
...
two
...
choices
...
for
...
indexes
...
:
...
`BloomIndex`
...
and
...
`HBaseIndex`
...
to
...
map
...
a
...
record
...
key
...
into
...
the
...
file
...
id
...
to
...
which
...
it
...
belongs
...
to.
...
This
...
enables
us
...
to
...
speed
...
up
...
upserts
...
significantly,
...
without
...
scanning
...
over
...
every
...
record
...
in
...
the
...
dataset.
...
Hudi
...
Indices
...
can
...
be
...
classified
...
based
...
on
their
...
ability
...
to
...
lookup
...
records
...
across
...
partition.
...
A
...
`global`
...
index
...
does
...
not
...
need
...
partition
...
information
...
for
...
finding
...
the
...
file-id
...
for
...
a
...
record
...
key
but
...
a
...
`non-global`
...
does.
####
...
HBase
...
Index
...
(global)
Here,
...
we
...
just
...
use
...
HBase
...
in
...
a
...
straightforward
...
way
...
to
...
store
...
the
...
mapping
...
above.
...
The
...
challenge
...
with
...
using
...
HBase
...
(or
...
any
...
external
...
key-value
...
store
...
for
...
that
...
matter)
...
is
...
performing
...
rollback
...
of
...
a
...
commit
...
and
...
handling
...
partial
...
index
...
updates.
...
Since
...
the
...
HBase
...
table
...
is
...
indexed
...
by
...
record
...
key
...
and
...
not
...
commit
...
Time,
...
we
...
would
...
have
...
to
...
scan
...
all
...
the
...
entries
...
which
...
will
...
be
...
prohibitively
...
expensive.
...
Insteead,
...
we
...
store
...
the
...
commit
...
time
...
with
...
the
...
value
...
and
...
discard
...
its
...
value
...
if
...
it
...
does
...
not
...
belong
...
to
...
a
...
valid
...
commit.
####
...
Bloom
...
Index
...
(non-global)
This
...
index
...
is
...
built
...
by
...
adding
...
bloom
...
filters
...
with
...
a
...
very
...
high
...
false
...
positive
...
tolerance
...
(e.g:
...
1/10^9),
...
to
...
the
...
parquet
...
file
...
footers.
The
...
advantage
...
of
...
this
...
index
...
over
...
HBase
...
is
...
the
...
obvious
...
removal
...
of
...
a
...
big
...
external
...
dependency,
...
and
...
also
...
nicer
...
handling
...
of
...
rollbacks
...
&
...
partial
...
updates
since
...
the
...
index
...
is
...
part
...
of
...
the
...
data
...
file
...
itself.
At
...
runtime,
...
checking
...
the
...
Bloom
...
Index
...
for
...
a
...
given
...
set
...
of
...
record
...
keys
...
effectively
...
amounts
...
to
...
checking
...
all
...
the
...
bloom
...
filters
...
within
...
a
...
given
partition,
...
against
...
the
...
incoming
...
records,
...
using
...
a
...
Spark
...
join.
...
Much
...
of
...
the
...
engineering
...
effort
...
towards
...
the
...
Bloom
...
index
...
has
...
gone
...
into
...
scaling
...
this
...
join
by
...
caching
...
the
...
incoming
...
RDD[HoodieRecord]
...
and
...
dynamically
...
tuning
...
join
...
parallelism,
...
to
...
avoid
...
hitting
...
Spark
...
limitations
...
like
...
2GB
...
maximum
for
...
partition
...
size.
...
As
...
a
...
result,
...
Bloom
...
Index
...
implementation
...
has
...
been
...
able
...
to
...
handle
...
single
...
upserts
...
upto
...
5TB,
...
in
...
a
...
reliable
...
manner.
DAG with Range Pruning:
draw.io Diagram | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
## Storage
The
...
implementation
...
specifics
...
of
...
the
...
two
...
storage
...
types,
...
introduced
...
in
...
[concepts](concepts.html)
...
section,
...
are
...
detailed
...
below.
####
...
Copy
...
On
...
Write
The
...
Spark
...
DAG
...
for
...
this
...
storage,
...
is
...
relatively
...
simpler.
...
The
...
key
...
goal
...
here
...
is
...
to
...
group
...
the
...
tagged
...
Hudi
...
record
...
RDD,
...
into
...
a
...
series
...
of
updates
...
and
...
inserts,
...
by
...
using
...
a
...
partitioner.
...
To
...
achieve
...
the
...
goals
...
of
...
maintaining
...
file
...
sizes,
...
we
...
first
...
sample
...
the
...
input
...
to
...
obtain
...
a
...
`workload
...
profile`
that
...
understands
...
the
...
spread
...
of
...
inserts
...
vs
...
updates,
...
their
...
distribution
...
among
...
the
...
partitions
...
etc.
...
With
...
this
...
information,
...
we
...
bin-pack
...
the
records
...
such
...
that
...
-
...
For
...
updates,
...
the
...
latest
...
version
...
of
...
the
...
that
...
file
...
id,
...
is
...
rewritten
...
once,
...
with
...
new
...
values
...
for
...
all
...
records
...
that
...
have
...
changed
...
-
...
For
...
inserts,
...
the
...
records
...
are
...
first
...
packed
...
onto
...
the
...
smallest
...
file
...
in
...
each
...
partition
...
path,
...
until
...
it
...
reaches
...
the
...
configured
...
maximum
...
size.
...
Any
...
remaining
...
records
...
after
...
that,
...
are
...
again
...
packed
...
into
...
new
...
file
...
id
...
groups,
...
again
...
meeting
...
the
...
size
...
requirements.
In
...
this
...
storage,
...
index
...
updation
...
is
...
a
...
no-op,
...
since
...
the
...
bloom
...
filters
...
are
...
already
...
written
...
as
...
a
...
part
...
of
...
committing
...
data.
In
...
the
...
case
...
of
...
Copy-On-Write,
...
a
...
single
...
parquet
...
file
...
constitutes
...
one
...
`file
...
slice`
...
which
...
contains
...
one
...
complete
...
version
...
of
the
...
file
{%
...
include
...
image.html
...
file="hudi_log_format_v2.png"
...
alt="hudi_log_format_v2.png"
...
max-width="1000"
...
%}
####
...
Merge
...
On
...
Read
In
...
the
...
Merge-On-Read
...
storage
...
model,
...
there
...
are
...
2
...
logical
...
components
...
-
...
one
...
for
...
ingesting
...
data
...
(both
...
inserts/updates)
...
into
...
the
...
dataset
...
and
...
another
...
for
...
creating
...
compacted
...
views.
...
The
...
former
...
is
...
hereby
...
referred
...
to
...
as
...
`Writer`
...
while
...
the
...
later
...
is
...
referred
...
as
...
`Compactor`.
#####
...
Merge
...
On
...
Read
...
Writer
...
At
...
a
...
high
...
level,
...
Merge-On-Read
...
Writer
...
goes
...
through
...
same
...
stages
...
as
...
Copy-On-Write
...
writer
...
in
...
ingesting
...
data.
...
The
...
key
...
difference
...
here
...
is
...
that
...
updates
...
are
...
appended
...
to
...
latest
...
log
...
(delta)
...
file
...
belonging
...
to
...
the
...
latest
...
file
...
slice
...
without
...
merging.
...
For
...
inserts,
...
Hudi
...
supports
...
2
...
modes:
...
1.
...
Inserts
...
to
...
Log
...
Files
...
-
...
This
...
is
...
done
...
for
...
datasets
...
that
...
have
...
an
...
indexable
...
log
...
files
...
(for
...
eg
...
global
...
index)
...
2.
...
Inserts
...
to
...
parquet
...
files
...
-
...
This
...
is
...
done
...
for
...
datasets
...
that
...
do
...
not
...
have
...
indexable
...
log
...
files,
...
for
...
eg
...
bloom
...
index
...
embedded
...
in
...
parquer
...
files.
...
Hudi
...
treats
...
writing
...
new
...
records
...
in
...
the
...
same
...
way
...
as
...
inserting
...
to
...
Copy-On-Write
...
files.
As
...
in
...
the
...
case
...
of
...
Copy-On-Write,
...
the
...
input
...
tagged
...
records
...
are
...
partitioned
...
such
...
that
...
all
...
upserts
...
destined
...
to
a
...
`file
...
id`
...
are
...
grouped
...
together.
...
This
...
upsert-batch
...
is
...
written
...
as
...
one
...
or
...
more
...
log-blocks
...
written
...
to
...
log-files.
Hudi
...
allows
...
clients
...
to
...
control
...
log
...
file
...
sizes
...
(See
...
[Storage
...
Configs](../configurations))
The
...
WriteClient
...
API
...
is
...
same
...
for
...
both
...
Copy-On-Write
...
and
...
Merge-On-Read
...
writers.
With
...
Merge-On-Read,
...
several
...
rounds
...
of
...
data-writes
...
would
...
have
...
resulted
...
in
...
accumulation
...
of
...
one
...
or
...
more
...
log-files.
All
...
these
...
log-files
...
along
...
with
...
base-parquet
...
(if
...
exists)
...
constitute
...
a
...
`file
...
slice`
...
which
...
represents
...
one
...
complete
...
version
of
...
the
...
file.
####
...
Compactor
Realtime
...
Readers
...
will
...
perform
...
in-situ
...
merge
...
of
...
these
...
delta
...
log-files
...
to
...
provide
...
the
...
most
...
recent
...
(committed)
...
view
...
of
the
...
dataset.
...
To
...
keep
...
the
...
query-performance
...
in
...
check
...
and
...
eventually
...
achieve
...
read-optimized
...
performance,
...
Hudi
...
supports
compacting
...
these
...
log-files
...
asynchronously
...
to
...
create
...
read-optimized
...
views.
Asynchronous
...
Compaction
...
involves
...
2
...
steps:
...
*
...
`Compaction
...
Schedule`
...
:
...
Hudi
...
Write
...
Client
...
exposes
...
API
...
to
...
create
...
Compaction
...
plans
...
which
...
contains
...
the
...
list
...
of
...
`file
...
slice`
...
to
...
be
...
compacted
...
atomically
...
in
...
a
...
single
...
compaction
...
commit.
...
Hudi
...
allows
...
pluggable
...
strategies
...
for
...
choosing
...
file
...
slices
...
for
...
each
...
compaction
...
runs.
...
This
...
step
...
is
...
typically
...
done
...
inline
...
by
...
Writer
...
process
...
as
...
Hudi
...
expects
...
only
...
one
...
schedule
...
is
...
being
...
generated
...
at
...
a
...
time
...
which
...
allows
...
Hudi
...
to
...
enforce
...
the
...
constraint
...
that
...
pending
...
compaction
...
plans
...
do
...
not
...
step
...
on
...
each
...
other
...
file-slices.
...
This
...
constraint
...
allows
...
for
...
multiple
...
concurrent
...
`Compactors`
...
to
...
run
...
at
...
the
...
same
...
time.
...
Some
...
of
...
the
...
common
...
strategies
...
used
...
for
...
choosing
...
`file
...
slice`
...
for
...
compaction
...
are:
...
*
...
BoundedIO
...
-
...
Limit
...
the
...
number
...
of
...
file
...
slices
...
chosen
...
for
...
a
...
compaction
...
plan
...
by
...
expected
...
total
...
IO
...
(read
...
+
...
write)
...
needed
...
to
...
complete
...
compaction
...
run
...
*
...
Log
...
File
...
Size
...
-
...
Prefer
...
file-slices
...
with
...
larger
...
amounts
...
of
...
delta
...
log
...
data
...
to
...
be
...
merged
...
*
...
Day
...
Based
...
-
...
Prefer
...
file
...
slice
...
belonging
...
to
...
latest
...
day
...
partitions
...
*
...
`Compactor`
...
:
...
Hudi
...
provides
...
a
...
separate
...
API
...
in
...
Write
...
Client
...
to
...
execute
...
a
...
compaction
...
plan.
...
The
...
compaction
...
plan
...
(just
...
like
...
a
...
commit)
...
is
...
identified
...
by
...
a
...
timestamp.
...
Most
...
of
...
the
...
design
...
and
...
implementation
...
complexities
...
for
...
Async
...
Compaction
...
is
...
for
...
guaranteeing
...
snapshot
...
isolation
...
to
...
readers
...
and
...
writer
...
when
...
multiple
...
concurrent
...
compactors
...
are
...
running.
...
Typical
...
compactor
...
deployment
...
involves
...
launching
...
a
...
separate
...
spark
...
application
...
which
...
executes
...
pending
...
compactions
...
when
...
they
...
become
...
available.
...
The
...
core
...
logic
...
of
...
compacting
...
file
...
slices
...
in
...
the
...
Compactor
...
is
...
very
...
similar
...
to
...
that
...
of
...
merging
...
updates
...
in
...
a
...
Copy-On-Write
...
table.
...
The
...
only
...
difference
...
being
...
in
...
the
...
case
...
of
...
compaction,
...
there
...
is
...
an
...
additional
...
step
...
of
...
merging
...
the
...
records
...
in
...
delta
...
log-files.
Here
...
are
...
the
...
main
...
API
...
to
...
lookup
...
and
...
execute
...
a
...
compaction
...
plan.
```
...
Main
...
API
...
in
...
HoodieWriteClient
...
for
...
running
...
Compaction:
...
/**
...
*
...
Performs
...
Compaction
...
corresponding
...
to
...
instant-time
...
*
...
@param
...
compactionInstantTime
...
Compaction
...
Instant
...
Time
...
*
...
@return
...
*
...
@throws
...
IOException
...
*/
...
public
...
JavaRDD<WriteStatus>
...
compact(String
...
compactionInstantTime)
...
throws
...
IOException;
...
To
...
lookup
...
all
...
pending
...
compactions,
...
use
...
the
...
API
...
defined
...
in
...
HoodieReadClient
...
/**
...
*
...
Return
...
all
...
pending
...
compactions
...
with
...
instant
...
time
...
for
...
clients
...
to
...
decide
...
what
...
to
...
compact
...
next.
...
*
...
@return
...
*/
...
public
...
List<Pair<String,
...
HoodieCompactionPlan>>
...
getPendingCompactions();
```
API
...
for
...
scheduling
...
compaction
```
...
/**
...
*
...
Schedules
...
a
...
new
...
compaction
...
instant
...
*
...
@param
...
extraMetadata
...
*
...
@return
...
Compaction
...
Instant
...
timestamp
...
if
...
a
...
new
...
compaction
...
plan
...
is
...
scheduled
...
*/
...
Optional<String>
...
scheduleCompaction(Optional<Map<String,
...
String>>
...
extraMetadata)
...
throws
...
IOException;
```
Refer
...
to
...
__hoodie-client/src/test/java/HoodieClientExample.java__
...
class
...
for
...
an
...
example
...
of
...
how
...
compaction
is
...
scheduled
...
and
...
executed.
#####
...
Deployment
...
Models
These
...
are
...
typical
...
Hudi
...
Writer
...
and
...
Compaction
...
deployment
...
models
...
*
...
`Inline
...
Compaction`
...
:
...
At
...
each
...
round,
...
a
...
single
...
spark
...
application
...
ingests
...
new
...
batch
...
to
...
dataset.
...
It
...
then
...
optionally
...
decides
...
to
...
schedule
...
a
...
compaction
...
run
...
and
...
executes
...
it
...
in
...
sequence.
...
*
...
`Single
...
Dedicated
...
Async
...
Compactor`
...
:
...
The
...
Spark
...
application
...
which
...
brings
...
in
...
new
...
changes
...
to
...
dataset
...
(writer)
...
periodically
...
schedules
...
compaction.
...
The
...
Writer
...
application
...
does
...
not
...
run
...
compaction
...
inline.
...
A
...
separate
...
spark
...
applications
...
periodically
...
probes
...
for
...
pending
...
compaction
...
and
...
executes
...
the
...
compaction.
...
*
...
`
...
Multi
...
Async
...
Compactors`
...
:
...
This
...
mode
...
is
...
similar
...
to
...
`Single
...
Dedicated
...
Async
...
Compactor`
...
mode.
...
The
...
main
...
difference
...
being
...
now
...
there
...
can
...
be
...
more
...
than
...
one
...
spark
...
application
...
picking
...
different
...
compactions
...
and
...
executing
...
them
...
in
...
parallel.
...
In
...
order
...
to
...
ensure
...
compactors
...
do
...
not
...
step
...
on
...
each
...
other,
...
they
...
use
...
coordination
...
service
...
like
...
zookeeper
...
to
...
pickup
...
unique
...
pending
...
compaction
...
instants
...
and
...
run
...
them.
The
...
Compaction
...
process
...
requires
...
one
...
executor
...
per
...
file-slice
...
in
...
the
...
compaction
...
plan.
...
So,
...
the
...
best
...
resource
...
allocation
strategy
...
(both
...
in
...
terms
...
of
...
speed
...
and
...
resource
...
usage)
...
for
...
clusters
...
supporting
...
dynamic
...
allocation
...
is
...
to
...
lookup
...
the
...
compaction
plan
...
to
...
be
...
run
...
to
...
figure
...
out
...
the
...
number
...
of
...
file
...
slices
...
being
...
compacted
...
and
...
choose
...
that
...
many
...
number
...
of
...
executors.
##
...
Async
...
Compaction
...
Design
...
Deep-Dive
...
(Optional)
For
...
the
...
purpose
...
of
...
this
...
section,
...
it
...
is
...
important
...
to
...
distinguish
...
between
...
2
...
types
...
of
...
commits
...
as
...
pertaining
...
to
...
the
...
file-group:
A
...
commit
...
which
...
generates
...
a
...
merged
...
and
...
read-optimized
...
file-slice
...
is
...
called
...
`snapshot
...
commit`
...
(SC)
...
with
...
respect
...
to
...
that
...
file-group.
A
...
commit
...
which
...
merely
...
appended
...
the
...
new/updated
...
records
...
assigned
...
to
...
the
...
file-group
...
into
...
a
...
new
...
log
...
block
...
is
...
called
...
`delta
...
commit`
...
(DC)
with
...
respect
...
to
...
that
...
file-group.
###
...
Algorithm
The
...
algorithm
...
is
...
described
...
with
...
an
...
illustration.
...
Let
...
us
...
assume
...
a
...
scenario
...
where
...
there
...
are
...
commits
...
SC1,
...
DC2,
...
DC3
...
that
...
have
already
...
completed
...
on
...
a
...
data-set.
...
Commit
...
DC4
...
is
...
currently
...
ongoing
...
with
...
the
...
writer
...
(ingestion)
...
process
...
using
...
it
...
to
...
upsert
...
data.
Let
...
us
...
also
...
imagine
...
there
...
are
...
a
...
set
...
of
...
file-groups
...
(FG1
...
…
...
FGn)
...
in
...
the
...
data-set
...
whose
...
latest
...
version
...
(`File-Slice`)
contains
...
the
...
base
...
file
...
created
...
by
...
commit
...
SC1
...
(snapshot-commit
...
in
...
columnar
...
format)
...
and
...
a
...
log
...
file
...
containing
...
row-based
log
...
blocks
...
of
...
2
...
delta-commits
...
(DC2
...
and
...
DC3).
{%
...
include
...
image.html
...
file="async_compac_1.png"
...
alt="async_compac_1.png"
...
max-width="1000"
...
%}
...
*
...
Writer
...
(Ingestion)
...
that
...
is
...
going
...
to
...
commit
...
"DC4"
...
starts.
...
The
...
record
...
updates
...
in
...
this
...
batch
...
are
...
grouped
...
by
...
file-groups
...
and
...
appended
...
in
...
row
...
formats
...
to
...
the
...
corresponding
...
log
...
file
...
as
...
delta
...
commit.
...
Let
...
us
...
imagine
...
a
...
subset
...
of
...
file-groups
...
has
...
this
...
new
...
log
...
block
...
(delta
...
commit)
...
DC4
...
added.
...
*
...
Before
...
the
...
writer
...
job
...
completes,
...
it
...
runs
...
the
...
compaction
...
strategy
...
to
...
decide
...
which
...
file-group
...
to
...
compact
...
by
...
compactor
...
and
...
creates
...
a
...
new
...
compaction-request
...
commit
...
SC5.
...
This
...
commit
...
file
...
is
...
marked
...
as
...
“requested”
...
with
...
metadata
...
denoting
...
which
...
fileIds
...
to
...
compact
...
(based
...
on
...
selection
...
policy).
...
Writer
...
completes
...
without
...
running
...
compaction
...
(will
...
be
...
run
...
async).
...
{%
...
include
...
image.html
...
file="async_compac_2.png"
...
alt="async_compac_2.png"
...
max-width="1000"
...
%}
...
*
...
Writer
...
job
...
runs
...
again
...
ingesting
...
next
...
batch.
...
It
...
starts
...
with
...
commit
...
DC6.
...
It
...
reads
...
the
...
earliest
...
inflight
...
compaction
...
request
...
marker
...
commit
...
in
...
timeline
...
order
...
and
...
collects
...
the
...
(fileId,
...
Compaction
...
Commit
...
Id
...
“CcId”
...
)
...
pairs
...
from
...
meta-data.
...
Ingestion
...
DC6
...
ensures
...
a
...
new
...
file-slice
...
with
...
base-commit
...
“CcId”
...
gets
...
allocated
...
for
...
the
...
file-group.
...
The
...
Writer
...
will
...
simply
...
append
...
records
...
in
...
row-format
...
to
...
the
...
first
...
log-file
...
(as
...
delta-commit)
...
assuming
...
the
...
base-file
...
(“Phantom-Base-File”)
...
will
...
be
...
created
...
eventually
...
by
...
the
...
compactor.
...
{%
...
include
...
image.html
...
file="async_compac_3.png"
...
alt="async_compac_3.png"
...
max-width="1000"
...
%}
...
*
...
Compactor
...
runs
...
at
...
some
...
time
...
and
...
commits
...
at
...
“Tc”
...
(concurrently
...
or
...
before/after
...
Ingestion
...
DC6).
...
It
...
reads
...
the
...
commit-timeline
...
and
...
finds
...
the
...
first
...
unprocessed
...
compaction
...
request
...
marker
...
commit.
...
Compactor
...
reads
...
the
...
commit’s
...
metadata
...
finding
...
the
...
file-slices
...
to
...
be
...
compacted.
...
It
...
compacts
...
the
...
file-slice
...
and
...
creates
...
the
...
missing
...
base-file
...
(“Phantom-Base-File”)
...
with
...
“CCId”
...
as
...
the
...
commit-timestamp.
...
Compactor
...
then
...
marks
...
the
...
compaction
...
commit
...
timestamp
...
as
...
completed.
...
It
...
is
...
important
...
to
...
realize
...
that
...
at
...
data-set
...
level,
...
there
...
could
...
be
...
different
...
file-groups
...
requesting
...
compaction
...
at
...
different
...
commit
...
timestamps.
...
{%
...
include
...
image.html
...
file="async_compac_4.png"
...
alt="async_compac_4.png"
...
max-width="1000"
...
%}
...
*
...
Near
...
Real-time
...
reader
...
interested
...
in
...
getting
...
the
...
latest
...
snapshot
...
will
...
have
...
2
...
cases.
...
Let
...
us
...
assume
...
that
...
the
...
incremental
...
ingestion
...
(writer
...
at
...
DC6)
...
happened
...
before
...
the
...
compaction
...
(some
...
time
...
“Tc”’).
...
...
The
...
below
...
description
...
is
...
with
...
regards
...
to
...
compaction
...
from
...
file-group
...
perspective.
...
*
...
`Reader
...
querying
...
at
...
time
...
between
...
ingestion
...
completion
...
time
...
for
...
DC6
...
and
...
compaction
...
finish
...
“Tc”`:
...
Hudi’s
...
implementation
...
will
...
be
...
changed
...
to
...
become
...
aware
...
of
...
file-groups
...
currently
...
waiting
...
for
...
compaction
...
and
...
merge
...
log-files
...
corresponding
...
to
...
DC2-DC6
...
with
...
the
...
base-file
...
corresponding
...
to
...
SC1.
...
In
...
essence,
...
Hudi
...
will
...
create
...
a
...
pseudo
...
file-slice
...
by
...
combining
...
the
...
2
...
file-slices
...
starting
...
at
...
base-commits
...
SC1
...
and
...
SC5
...
to
...
one.
...
For
...
file-groups
...
not
...
waiting
...
for
...
compaction,
...
the
...
reader
...
behavior
...
is
...
essentially
...
the
...
same
...
-
...
read
...
latest
...
file-slice
...
and
...
merge
...
on
...
the
...
fly.
...
*
...
`Reader
...
querying
...
at
...
time
...
after
...
compaction
...
finished
...
(>
...
“Tc”)`
...
:
...
In
...
this
...
case,
...
reader
...
will
...
not
...
find
...
any
...
pending
...
compactions
...
in
...
the
...
timeline
...
and
...
will
...
simply
...
have
...
the
...
current
...
behavior
...
of
...
reading
...
the
...
latest
...
file-slice
...
and
...
merging
...
on-the-fly.
...
*
...
Read-Optimized
...
View
...
readers
...
will
...
query
...
against
...
the
...
latest
...
columnar
...
base-file
...
for
...
each
...
file-groups.
The
...
above
...
algorithm
...
explains
...
Async
...
compaction
...
w.r.t
...
a
...
single
...
compaction
...
run
...
on
...
a
...
single
...
file-group.
...
It
...
is
...
important
to
...
note
...
that
...
multiple
...
compaction
...
plans
...
can
...
be
...
run
...
concurrently
...
as
...
they
...
are
...
essentially
...
operating
...
on
...
different
file-groups.