tests merge idempotence and fixes found bug #4643
Adds a way to make Fate in the manager always run steps multiple times to test idempotence. Created versions of the merge and delete rows ITs that use this test manager. Found and fixed a bug in the merge code when running these new tests. Changed mini accumulo to allow specifying a different manager class to use prior to startup. There was an existing method that would allow starting a manager with a different class, but using it would have meant letting the actual manager start, then killing it and starting another, which would have added time to the test. The change to mini accumulo was not made to its public API, only its implementation. fixes apache#4642
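To illustrate the idea of running every step more than once, here is a minimal sketch. It is not the FlakyFate implementation from this PR; `FlakyStepSketch`, `Step`, `runOnce`, `runFlaky`, and `isIdempotent` are hypothetical names, and the operation state is modeled as a plain set of strings. It only shows why a step that assumes it runs exactly once fails when it is repeated.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: these names are illustrative and are not Accumulo's Fate API.
final class FlakyStepSketch {

  /** One step of a multi-step operation; it should be safe to run more than once. */
  interface Step {
    void call(Set<String> state);
  }

  /** Runs each step exactly once, as a normal executor would. */
  static void runOnce(List<Step> steps, Set<String> state) {
    for (Step step : steps) {
      step.call(state);
    }
  }

  /** Runs each step twice, simulating a retry after the work was already done. */
  static void runFlaky(List<Step> steps, Set<String> state) {
    for (Step step : steps) {
      step.call(state);
      step.call(state);
    }
  }

  /** Returns true if the repeated run completes and ends in the same state as the single run. */
  static boolean isIdempotent(List<Step> steps, Set<String> initial) {
    Set<String> expected = new TreeSet<>(initial);
    runOnce(steps, expected);

    Set<String> actual = new TreeSet<>(initial);
    try {
      runFlaky(steps, actual);
    } catch (RuntimeException e) {
      // A step that throws on its second run is not idempotent.
      return false;
    }
    return expected.equals(actual);
  }
}
```

In this sketch, a step such as `s -> s.remove("b")` passes `isIdempotent`, while a step that throws when `"b"` is already gone does not.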
I manually verified, by looking at the logs, that Fate steps run twice when the test runs. Saw logs like the following when doing this. It would be good to verify this at run time during the test, but I have not been able to figure out a good way to do that.
Could expand the usage of this new testing technique to cover more of the Fate operations, like bulk import, tablet compaction, etc. Will look into that after this; it should be easy to do, just need to find the right existing test to extend.
    public class DeleteRowsFlakyFateIT extends DeleteRowsIT {
      @Override
      public void configureMiniCluster(MiniAccumuloConfigImpl cfg, Configuration hadoopCoreSite) {
        cfg.setServerClass(ServerType.MANAGER, FlakyFateManager.class);
I'd bet that there are some other tests, related to Scan Servers and/or External Compactions, that could use this new method instead of stopping the "normal" implementation and starting a different one for the test. It's probably worth a new ticket for this.
Would be good to look into and see if it could simplify those tests.
I'll open an issue
@keith-turner - I took a look over this fix this morning, since I worked on no-chop merge and added the merge marker to handle the idempotent case in #3975, and it looks good to me. This does not affect 2.1 since the merge code is entirely different, but I was curious about main because the merged marker and no-chop merge also exist there in #3957. I took a look and found the following that I had added, so I do not think this bug is currently an issue in main. In the merge code in main we only check whether the last tablet in the merge range has the merged marker, and return if it does, on the assumption that the range is already merged. We are not checking for a correct linked list. This makes sense, as I tested the idempotent case manually when working on this and didn't see an issue. I guess the only outstanding question is whether we should add a similar check to the merge code in main to verify the linked list, or not worry about it.
The expectation is that the tablets being merged form a linked list. Did not realize that was only checked in elasticity. I think it would be good to add an issue for that. If the persisted tablet metadata does not meet expectations, it's probably best to avoid proceeding with the merge.
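As a rough illustration of the linked-list expectation, the sketch below checks that each tablet's previous end row equals the end row of the tablet before it. It is not the check from the merge code; `TabletChainSketch`, `Tablet`, and `formsLinkedList` are hypothetical names, rows are plain strings, and `null` is assumed to stand for an unbounded start or end of the table.

```java
import java.util.List;

// Hypothetical sketch: a simplified stand-in for tablet metadata, not Accumulo's classes.
final class TabletChainSketch {

  /** A tablet covering rows in (prevEndRow, endRow]; null means unbounded on that side. */
  static final class Tablet {
    final String prevEndRow;
    final String endRow;

    Tablet(String prevEndRow, String endRow) {
      this.prevEndRow = prevEndRow;
      this.endRow = endRow;
    }
  }

  /**
   * Returns true if the tablets, in the given order, chain together with no gaps or
   * overlaps: every tablet must start exactly where the one before it ends.
   */
  static boolean formsLinkedList(List<Tablet> tablets) {
    for (int i = 1; i < tablets.size(); i++) {
      Tablet prev = tablets.get(i - 1);
      Tablet cur = tablets.get(i);
      // A null endRow marks the last tablet of a table, so nothing may follow it.
      if (prev.endRow == null || !prev.endRow.equals(cur.prevEndRow)) {
        return false;
      }
    }
    return true;
  }
}
```

A caller could refuse to proceed with a merge when `formsLinkedList` returns false for the tablets in the merge range.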
Thanks for looking into that. Made a comment on #4646 that while this exact bug does not exist in earlier branches, other bugs related to idempotence may exist. Your suggestion there about backporting FlakyFate would help test that.
I created a new issue #4651
This backports the FlakyFate and FlakyFateManager impl from elasticity that was added in #4643 so that fate operations can be easily tested to check if they are idempotent. DeleteRowsFlakyFateIT and MergeFlakyFateIT were also backported and pass verifying the operations are idempotent.
Update a few ITs that were stopping a normal server instance and starting up a test impl by taking advantage of changes in #4643 to set the server instance type before start up. This simplifies the tests and also prevents weird behavior by having servers start up and be shut down quickly. This closes #4644