
persist time to tablet in bulk update #4072

Merged
keith-turner merged 3 commits into apache:elasticity from keith-turner:bulk-time
Feb 21, 2024

@keith-turner (Contributor)

When a bulk import operation set time on a hosted tablet, the time was not persisted. The bulk import FATE operation now persists time in the tablet metadata. The tablet code assumed it was the only thing updating a tablet's time field, so it was modified to accommodate the bulk import code running in the manager, which now also updates the tablet's time column in the metadata table.
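To illustrate why the tablet's old "only writer" assumption matters, here is a hypothetical model of a time column with two writers. `TimeColumnSketch`, `blindWrite`, and `advanceTo` are illustrative names, not Accumulo APIs, and an `AtomicLong` stands in for the metadata table entry; the PR itself may coordinate the two writers differently.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model (not Accumulo code) of a tablet's time column in the metadata
// table, written by two parties: the hosted tablet and the bulk import code in the manager.
final class TimeColumnSketch {
  private final AtomicLong persistedTime; // stand-in for the persisted time column

  TimeColumnSketch(long initial) {
    persistedTime = new AtomicLong(initial);
  }

  long get() {
    return persistedTime.get();
  }

  /** Old assumption: the tablet is the only writer, so it overwrites the column blindly. */
  void blindWrite(long tabletTime) {
    persistedTime.set(tabletTime);
  }

  /** With a second writer, keep whichever value is larger so time never moves backwards. */
  void advanceTo(long candidateTime) {
    persistedTime.accumulateAndGet(candidateTime, Math::max);
  }
}
```

If the manager advances the column to 20 and a tablet holding a stale in-memory time of 15 then calls `blindWrite(15)`, the persisted time moves backwards to 15; calling `advanceTo(15)` instead leaves it at 20.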

@dlmarion left a comment

I'm not quite sure how tablet time works in general. It might be useful to have a discussion about it so that I/we can understand the changes here.

Comment thread on server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java (outdated)
@keith-turner (Contributor, Author) commented Dec 15, 2023

> I'm not quite sure how tablet time works in general.

Here is some info. Each tablet has a concept of time that only moves forward. This time is persisted in the tablet's metadata table entry. As mutations arrive at a tablet, if their time was not explicitly set, then it is set on the mutation and the tablet's time is incremented. For bulk import, one can optionally set the tablet's current time on an entire file. This works by allocating a timestamp from the tablet and then persisting it with the bulk file entry. When the bulk file is read, if a timestamp is present for the file, then it is applied to everything in the file.
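The mechanism described above can be sketched as a small Java model. This is illustrative only: `TabletTimeSketch`, `Cell`, `BulkFileEntry`, and their methods are hypothetical names, not Accumulo classes, and the sketch assumes a simple logical counter. As a simplification, a mutation with an explicit timestamp leaves the tablet time untouched here.

```java
import java.util.List;
import java.util.OptionalLong;

// Hypothetical sketch of a tablet's logical time; none of these names are Accumulo APIs.
final class TabletTimeSketch {

  /** A key/value cell with its timestamp. */
  record Cell(String row, String value, long timestamp) {}

  /** A bulk file as referenced from tablet metadata, with an optional assigned time. */
  record BulkFileEntry(String fileName, OptionalLong assignedTime) {}

  private long time; // last timestamp handed out; only moves forward

  TabletTimeSketch(long persistedTime) {
    this.time = persistedTime; // resume from the value stored in tablet metadata
  }

  /** Time for an incoming mutation: keep an explicit timestamp, else take the next tablet time. */
  synchronized long timeForMutation(OptionalLong explicitTimestamp) {
    if (explicitTimestamp.isPresent()) {
      return explicitTimestamp.getAsLong(); // simplification: tablet time is left unchanged
    }
    return ++time;
  }

  /** Allocates a single timestamp for a whole bulk file when the import asks to set time. */
  synchronized BulkFileEntry allocateBulkTime(String fileName) {
    return new BulkFileEntry(fileName, OptionalLong.of(++time));
  }

  synchronized long currentTime() {
    return time;
  }

  /** When reading a bulk file, an assigned time overrides every timestamp in the file. */
  static List<Cell> read(BulkFileEntry entry, List<Cell> cellsInFile) {
    if (entry.assignedTime().isEmpty()) {
      return cellsInFile; // no time was set: cells keep the timestamps written in the file
    }
    long t = entry.assignedTime().getAsLong();
    return cellsInFile.stream().map(c -> new Cell(c.row(), c.value(), t)).toList();
  }
}
```

For example, a tablet resuming from a persisted time of 10 assigns 11 to the next mutation without a timestamp, and a bulk import that sets time then gets 12 for every cell in its file.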

  1. Batch write row=a value=1
  2. Bulk import row=a value=2 and set time.

The ability to set time on bulk imports allows them to be properly ordered with respect to writes. For example, the sequence above would give the second write (the bulk import) a timestamp higher than the timestamp on the first write.
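The two-step sequence can be walked through in a few lines of Java. In this hypothetical sketch (`BulkOrderingExample` is not an Accumulo class), a reader returns the version with the highest timestamp, tablet time is assumed to start at 0, and `timestampInFile` stands for whatever timestamp the bulk file carries when time is not set. Equal timestamps are not modeled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative walk-through of the two steps; all names are hypothetical.
final class BulkOrderingExample {

  record Cell(String row, String value, long timestamp) {}

  /** Returns the value a reader sees for a row: the version with the highest timestamp. */
  static String latestValue(List<Cell> cells, String row) {
    return cells.stream()
        .filter(c -> c.row().equals(row))
        .max(Comparator.comparingLong(Cell::timestamp))
        .map(Cell::value)
        .orElse(null);
  }

  /** Runs both steps; timestampInFile is used only when the bulk import does not set time. */
  static String run(boolean setTime, long timestampInFile) {
    long tabletTime = 0;
    List<Cell> cells = new ArrayList<>();

    // 1. Batch write row=a value=1: no explicit timestamp, so the tablet assigns one.
    cells.add(new Cell("a", "1", ++tabletTime));

    // 2. Bulk import row=a value=2, optionally taking its timestamp from the tablet.
    long bulkTs = setTime ? ++tabletTime : timestampInFile;
    cells.add(new Cell("a", "2", bulkTs));

    return latestValue(cells, "a");
  }
}
```

`run(true, 0)` returns `"2"` because the bulk import is assigned timestamp 2, above the batch write's 1; `run(false, 0)` returns `"1"` because the file's own timestamp of 0 is lower than the batch write's.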

This PR adds coordination between a hosted tablet and the bulk import code running in the manager to ensure that the timestamp is set correctly.
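One piece of that coordination, a hosted tablet catching up with a time the manager persisted, might look like the following. This is a hedged sketch rather than the PR's implementation: `HostedTabletTime` and `refreshFrom` are hypothetical names, an `AtomicLong` stands in for the metadata time column, and how the manager chooses the bulk time is not shown.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch (not the PR's code): a hosted tablet reconciling its in-memory time
// with a metadata time column that the manager's bulk import code may also advance.
final class HostedTabletTime {
  private long time; // in-memory tablet time; only moves forward

  HostedTabletTime(long initialTime) {
    this.time = initialTime;
  }

  /** Called when the tablet learns its metadata changed (e.g. after a bulk import). */
  synchronized void refreshFrom(AtomicLong metadataTimeColumn) {
    // Adopt the persisted value only if it is ahead; never move time backwards.
    time = Math.max(time, metadataTimeColumn.get());
  }

  /** Timestamp for the next mutation that has no explicit timestamp. */
  synchronized long nextTime() {
    return ++time;
  }
}
```

With the column and the tablet both at 5, a write gets 6; if the manager then persists 9 for a bulk file, `refreshFrom` moves the tablet to 9 and the next write gets 10, so later writes sort after the bulk-imported data. A refresh that reads a lower value is ignored.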

@dlmarion (Contributor)

Does this close #3354?

@dlmarion dlmarion linked an issue Jan 23, 2024 that may be closed by this pull request
@keith-turner keith-turner merged commit 4378e02 into apache:elasticity Feb 21, 2024
@keith-turner keith-turner deleted the bulk-time branch February 21, 2024 21:12
keith-turner added a commit to keith-turner/accumulo that referenced this pull request Apr 11, 2024
This todo was already done in apache#4072
keith-turner added a commit that referenced this pull request Apr 11, 2024
This todo was already done in #4072
@ctubbsii ctubbsii added this to the 4.0.0 milestone Jul 12, 2024

Labels: None yet

Projects: Status: ✅ Done

Development: successfully merging this pull request may close these issues:

  - Handle setting time in bulk import FATE operation

3 participants