Avoids reading all fate ids into memory. by keith-turner · Pull Request #4129 · apache/accumulo

keith-turner · 2024-01-04T23:16:07Z

As we change Accumulo to use FATE for per-tablet operations, it's important to avoid reading all of FATE's persisted data into memory. This commit modifies FATE to use Streams internally instead of Collections. For the Accumulo implementation of FATE storage, this makes it possible to have a Java stream backed by a scanner, which avoids reading all of the FATE ids into memory. The ZooKeeper storage implementation will still read everything into memory.
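To illustrate why a stream backed by a scanner avoids materializing every id, here is a minimal sketch using only the JDK. `CountingSource` is a hypothetical stand-in for a scanner (it is not an Accumulo class), and `streamIds` is an illustrative helper, not a method from this PR. The point is that a consumer that needs only a few ids pulls only a few ids, and that closing the stream closes the underlying source.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class LazyFateIdsSketch {

  /** Hypothetical stand-in for a scanner: hands out ids one at a time and counts reads. */
  static final class CountingSource implements Iterator<Long>, AutoCloseable {
    final AtomicInteger reads = new AtomicInteger();
    final AtomicBoolean closed = new AtomicBoolean();
    private final long total;
    private long nextId = 0;

    CountingSource(long total) {
      this.total = total;
    }

    @Override
    public boolean hasNext() {
      return nextId < total;
    }

    @Override
    public Long next() {
      if (!hasNext()) {
        throw new NoSuchElementException();
      }
      reads.incrementAndGet();
      return nextId++;
    }

    @Override
    public void close() {
      closed.set(true);
    }
  }

  /** Wraps the source in a lazy, sequential stream; closing the stream closes the source. */
  static Stream<Long> streamIds(CountingSource source) {
    Spliterator<Long> split = Spliterators.spliteratorUnknownSize(source, Spliterator.ORDERED);
    return StreamSupport.stream(split, false).onClose(source::close);
  }

  public static void main(String[] args) {
    CountingSource source = new CountingSource(1_000_000);
    List<Long> firstThree;
    // try-with-resources triggers the onClose handler registered in streamIds
    try (Stream<Long> ids = streamIds(source)) {
      firstThree = ids.limit(3).collect(Collectors.toList());
    }
    System.out.println(firstThree + " reads=" + source.reads.get()
        + " closed=" + source.closed.get());
  }
}
```

A method that returned a `List` would have to read all one million ids before the caller could look at the first one. With the stream, the caller decides how much is consumed, at the cost of having to close the stream when done.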

Another change made in this PR optimizes the Accumulo storage layer to read the status while reading the id. Before this change, ids were read from a scanner, and then a scanner was created for each id to read its status. Now the status and id are read in a stream from the same scanner, which should be much faster. This change was not possible for ZooKeeper; it will still make an RPC to get each status. It's OK that the ZooKeeper store is less efficient, as the Accumulo store will likely store orders of magnitude more data. It's probably not possible to make the same optimizations for speed and memory in the ZooKeeper store.
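The difference between the two access patterns can be sketched with a toy in-memory table. Everything here is hypothetical: `FakeTable`, `scanAll`, and `scanOne` are illustrative names and not Accumulo APIs, and the `Status` enum is a reduced stand-in for the real status type. The sketch only counts how many scans each approach opens.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StatusScanSketch {

  enum Status { NEW, IN_PROGRESS, SUCCESSFUL }

  /** Hypothetical in-memory table that counts how many scans are opened against it. */
  static final class FakeTable {
    final AtomicInteger scansOpened = new AtomicInteger();
    private final TreeMap<Long, String> rows = new TreeMap<>();

    void put(long id, Status status) {
      rows.put(id, status.name());
    }

    /** One scan over every row, yielding id and serialized status together. */
    Stream<Map.Entry<Long, String>> scanAll() {
      scansOpened.incrementAndGet();
      return rows.entrySet().stream();
    }

    /** A separate scan that fetches the status of a single id. */
    String scanOne(long id) {
      scansOpened.incrementAndGet();
      return rows.get(id);
    }
  }

  /** Old shape: collect the ids first, then open one more scan per id. */
  static List<Status> perIdLookup(FakeTable table) {
    List<Long> ids = table.scanAll().map(Map.Entry::getKey).collect(Collectors.toList());
    return ids.stream()
        .map(id -> Status.valueOf(table.scanOne(id)))
        .collect(Collectors.toList());
  }

  /** New shape: parse the status from the same entry that carried the id. */
  static Stream<Status> singlePass(FakeTable table) {
    return table.scanAll().map(e -> Status.valueOf(e.getValue()));
  }

  public static void main(String[] args) {
    FakeTable table = new FakeTable();
    table.put(1L, Status.NEW);
    table.put(2L, Status.IN_PROGRESS);
    table.put(3L, Status.SUCCESSFUL);

    perIdLookup(table);
    System.out.println("per-id lookup scans: " + table.scansOpened.getAndSet(0));

    singlePass(table).forEach(s -> {});
    System.out.println("single-pass scans: " + table.scansOpened.get());
  }
}
```

For N rows the per-id approach opens N + 1 scans while the single pass opens one, which is the shape of the saving described above. The real cost of a scan against a tablet server is not modeled here.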

A bug in the FATE integration test was fixed by using the UNKNOWN status, which represents the status of a transaction that does not exist in the persisted store. I ran into this bug while testing these changes.

cshannon

LGTM. The changes here are nice, especially the optimized status lookup.

cshannon · 2024-01-05T13:48:29Z

+      return scanner.stream().onClose(scanner::close).map(e -> {
+        return new FateIdStatus(parseTid(e.getKey().getRow().toString())) {
+          @Override
+          public TStatus getStatus() {
+            return TStatus.valueOf(e.getValue().toString());
+          }
+        };
+      });


Suggested change

-      return scanner.stream().onClose(scanner::close).map(e -> {
-        return new FateIdStatus(parseTid(e.getKey().getRow().toString())) {
-          @Override
-          public TStatus getStatus() {
-            return TStatus.valueOf(e.getValue().toString());
-          }
-        };
-      });
+      return scanner.stream().onClose(scanner::close)
+          .map(e -> new FateIdStatus(parseTid(e.getKey().getRow().toString())) {
+            @Override
+            public TStatus getStatus() {
+              return TStatus.valueOf(e.getValue().toString());
+            }
+          });

Just a small simplification which you can ignore

I tried removing the code block when writing this and thought there was just too much going on on a single line. I decided to add the code block back so that the lambda declaration and the anonymous class declaration are on separate lines.

cshannon · 2024-01-05T13:51:15Z

      assertEquals(1, callStarted.getCount());
      fate.delete(txid);
-      assertThrows(getNoTxExistsException(), () -> getTxStatus(sctx, txid));
+      assertEquals(UNKNOWN, getTxStatus(sctx, txid));


I'm glad you fixed this issue. I made the comment about the race condition because I encountered the same bug, and I forgot to open a follow-on issue to fix the test; it kept failing for me on occasion too.
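As a rough illustration of what the new assertion relies on, here is a minimal sketch of a store whose status lookup reports UNKNOWN for an id with no persisted entry instead of throwing. `Store`, its methods, and the reduced `Status` enum are hypothetical names for this example only and do not reproduce the FATE store API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class UnknownStatusSketch {

  enum Status { NEW, IN_PROGRESS, SUCCESSFUL, UNKNOWN }

  /** Hypothetical store: ids with no persisted entry report UNKNOWN instead of throwing. */
  static final class Store {
    private final Map<Long, Status> persisted = new ConcurrentHashMap<>();

    void create(long id) {
      persisted.put(id, Status.NEW);
    }

    void delete(long id) {
      persisted.remove(id);
    }

    Status getStatus(long id) {
      // A missing entry is a normal, representable state rather than an error.
      return persisted.getOrDefault(id, Status.UNKNOWN);
    }
  }

  public static void main(String[] args) {
    Store store = new Store();
    store.create(7L);
    store.delete(7L);
    System.out.println(store.getStatus(7L));
  }
}
```

With this shape, a test can assert on the returned status after a delete rather than on an exception being raised.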

keith-turner added 2 commits January 4, 2024 18:11

replace innner classes w/ anonymous classes

3170c58

keith-turner requested a review from cshannon January 4, 2024 23:30

fix javadoc link

138b782

keith-turner mentioned this pull request Jan 5, 2024

Make FATE age off memory efficient #4130

Closed

Merge branch 'elasticity' into fate-stream

6f18c5e

cshannon approved these changes Jan 5, 2024

keith-turner merged commit f46b09a into apache:elasticity Jan 5, 2024

keith-turner deleted the fate-stream branch January 5, 2024 14:04

ctubbsii added this to the 4.0.0 milestone Jul 12, 2024

keith-turner merged 4 commits into apache:elasticity from keith-turner:fate-stream