Skip to content

Add TSERV_GROUP_NAME, ability to balance within tserver group#3496

Merged
dlmarion merged 24 commits into
apache:elasticityfrom
dlmarion:3459-tserver-groups-attempt2
Jul 11, 2023
Merged

Add TSERV_GROUP_NAME, ability to balance within tserver group#3496
dlmarion merged 24 commits into
apache:elasticityfrom
dlmarion:3459-tserver-groups-attempt2

Conversation

@dlmarion
Copy link
Copy Markdown
Contributor

Added TSERV_GROUP_NAME property and refactored how groups are determined within the server processes. Modified LiveTServerSet to pull the group from the ServiceLockData and add it to the TServerInstance. Modified the TabletGroupWatcher to use the group information in the TServerInstance to determine if the tablet is currently hosted on a TServer that is not in the correct group

Added TSERV_GROUP_NAME property and refactored how groups
are determined within the server processes. Modified LiveTServerSet
to pull the group from the ServiceLockData and add it to the
TServerInstance. Modified the TabletGroupWatcher to use the
group information in the TServerInstance to determine if the
tablet is currently hosted on a TServer that is not in the
correct group
@dlmarion
Copy link
Copy Markdown
Contributor Author

This replaces PR #3482 . It does not put the tserver group name into the metadata. There is likely more work to be done here w/r/t the balancers and tests. Putting up for early feedback on approach.

@dlmarion
Copy link
Copy Markdown
Contributor Author

dlmarion commented Jun 16, 2023

Follow-on tasks include:

Rename compactor.queue to compactor.group
Update scripts for compactor.queue name change, addition of tserver.group

@dlmarion dlmarion marked this pull request as ready for review June 16, 2023 17:05
@dlmarion
Copy link
Copy Markdown
Contributor Author

Follow-on tasks completed in https://github.com/dlmarion/accumulo/tree/3496-followon-rename-compactor-queue. Will create PR for elasticity branch once this merged.

Copy link
Copy Markdown
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really neat. I made comments, those could be addressed as follow on issues.

+ "also consider configuring the `" + NoDeleteConstraint.class.getName() + "` "
+ "constraint.",
"2.0.0"),
TABLE_ASSIGNMENT_GROUP("table.assignment.group", Constants.DEFAULT_RESOURCE_GROUP_NAME,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is configuration for the TableLoadBalancer plugin, other configured balancer plugins may ignore this. Instead of defining it as a table property could define it as a custom property like table.custom.assignment.group and describe this prop in the documentation for the TableLoadBalancer plugin. This scopes the property to the plugin.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's also used in TabletGroupWatcher.getAssignmentsFromBalancer to print a warning if a tablet is assigned to a tserver outside of its resource group.

Copy link
Copy Markdown
Contributor

@keith-turner keith-turner Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's also used in TabletGroupWatcher.getAssignmentsFromBalancer to print a warning if a tablet is assigned to a tserver outside of its resource group.

I don't think TGW should try to validate if a particular plugin is behaving properly.

Copy link
Copy Markdown
Contributor

@keith-turner keith-turner Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we scope the config to plugin, it could allow TableLoadBalancer to have more complex configuration that is tightly coupled to it. For example TableLoadBalancer could support the property table.custom.assignment.group or table.custom.assignment.rangedGroups where rangedGroups allows a user to set a json value that specifies something like the folllowing.

  • For tablets in range (<row1>,<row2>] use tablet server group X
  • For tablets in range (<row3>,<row4>] use tablet server group Y
  • For all other tablet use tablet server group Z

So one property or the other could be set, if both are set it throws an error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it wasn't just the warning in TGW.getAssignmentsFromBalancer. There is also this:

https://github.com/apache/accumulo/pull/3496/files#diff-dd97054ac43737177293a8f3a9231f05f0e4bc5eb0a3b042cad3fb8ce31b6caaR303-R325

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this check also need to run in the tablet mgmt iterator?

I don't think it can unless we pass the group information to the iterator. The group is not in the metadata anymore, only in ZK. The LiveTServerSet object in the Manager is the only thing that knows about the TServer groups

Copy link
Copy Markdown
Contributor

@keith-turner keith-turner Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If needed we could pass the tserver group information to the tablet mgmt iterator, we already pass the live tserver set I think so this would be a similar amount of data. We could call the balancer plugins in the tablet mgmt iterator if needed also. The compaction code is calling compaction plugins in CompactionJobGenerator inside tablet mgtm iterator to make decisions.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be good to have an IT that does something like the following.

  • Configures table A to be assigned to tserver group Z
  • Verifies assignments are as expected
  • Changes config such that table A is assigned to tserver group X
  • Starts a tserver in group X
  • Waits for tablets to be assigned to group X

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the TABLE_ASSIGNMENT_GROUP property from Property and kept it in TableLoadBalancer in the commits today. There is a lot in this thread. I need to re-read it and make sure I got to everything.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a test that you suggested above. It uncovered some other issues with tables being deleted and how the TabletManagementIterator handles that. I think I have everything in this thread done.

Comment thread core/src/main/java/org/apache/accumulo/core/spi/balancer/TableLoadBalancer.java Outdated
// get the group of tservers for this table
SortedMap<TabletServerId,TServerStatus> groupedTServers = getCurrentSetForTable(
params.currentStatus(), params.currentResourceGroups(),
environment.getConfiguration(e.getKey()).get(Property.TABLE_ASSIGNMENT_GROUP.getKey()));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It using the custom table prop, could do the following.

Suggested change
environment.getConfiguration(e.getKey()).get(Property.TABLE_ASSIGNMENT_GROUP.getKey()));
environment.getConfiguration(e.getKey()).getTableCustom("assignment.group");

}

public synchronized Map<String,Set<TServerInstance>> getCurrentServersGroups() {
Map<String,Set<TServerInstance>> copy = new HashMap<>();
Copy link
Copy Markdown
Contributor

@keith-turner keith-turner Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its possible that depending on the usage patterns that it may be more efficient to compute an immutable map of this info when it changes. The tserver does this for online tablets in https://github.com/apache/accumulo/blob/main/server/tserver/src/main/java/org/apache/accumulo/tserver/OnlineTablets.java

I don't know if this would be useful w/o analyzing the larger way in which this is used and created.

* @return map of resource group name to set of TServerInstance objects
* @since 4.0.0
*/
Map<String,Set<TabletServerId>> currentResourceGroups();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a follow on, we may want to introduce a type for the group names instead of using string. This would be something like TableId.

Suggested change
Map<String,Set<TabletServerId>> currentResourceGroups();
Map<String,Set<TabletServerId>> currentResourceGroups();

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suggestion looks the same to me, but I think I understand what you are saying.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I meant to put some type name in the suggestion. Must have gotten distracted by something while writing the comment.

Comment thread core/src/main/java/org/apache/accumulo/core/spi/balancer/TableLoadBalancer.java Outdated
@dlmarion dlmarion requested a review from keith-turner July 6, 2023 20:42
log.trace("Table {} is set to use resource group: {}", e.getKey(), tableResourceGroup);
final Map<TabletId,TabletServerId> newAssignments = new HashMap<>();
// get the group of tservers for this table
final SortedMap<TabletServerId,TServerStatus> groupedTServers = getCurrentSetForTable(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a follow on. Need to determine if we need to do anything to avoid recomputing this map on every call to getAssignments. Not sure if it matters, depends on how frequently the manager calls this.


import com.google.common.net.HostAndPort;

public class TabletServerResourceGroupBalanceIT extends SharedMiniClusterBase {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more test that could be useful.

  • Test balancing the root and metadata table to diff tserver groups.
  • Test two user tables that balance to diff groups to make sure that works as expected.
  • Test tablets within in a single table that balance to different groups.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test tablets within in a single table that balance to different groups.

This was one of your earlier suggestions, but is not implemented at this time. This currently supports a table being assigned to a single resource group. I wonder if we should let users use this first and provide feedback on how they want it to operate. Also, they could extend TableLoadBalancer and do this for themselves.

Copy link
Copy Markdown
Contributor Author

@dlmarion dlmarion Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test two user tables that balance to diff groups to make sure that works as expected.

testBalancerWithResourceGroups does this. It creates one table in the default group and one table in the GROUP1 group. Then it validates that all of the first tables tablets are located in the tserver in the default group and that all of the second tables tablets are located in the tserver in the GROUP1 group.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test balancing the root and metadata table to diff tserver groups.

Is this any different than the testBalancerWithResourceGroups test? Are these tables handled differently?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root tablet is handled differently, metadata table is more likely to be like everything else. When I added an IT for compaction of root and metadata, I found the root tablet did not work. Had to add this :

CompactionJobGenerator cjg = new CompactionJobGenerator(new ServiceEnvironmentImpl(ctx));
var jobs = cjg.generateJobs(tm,
EnumSet.of(CompactionKind.SYSTEM, CompactionKind.USER, CompactionKind.SELECTOR));
if (!jobs.isEmpty()) {
actions.add(ManagementAction.NEEDS_COMPACTING);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #3587

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@keith-turner - I added balance tests for root and metadata table in 6ca1a88, and they work.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the SPI supports the functionality, then it would be good to ensure it actually works.

So, the TabletBalancer.getResourceGroup api method works at the tablet level now. The TabletMetadata.getTabletState method returns TabletState.ASSIGNED_TO_WRONG_GROUP when a tablet is not hosted on a tablet server that is in the same group that TabletBalancer.getResourceGroup returns. TableLoadBalancer passes the set of TabletServers that correspond to the resource group to the tables load balancer for assignment and balance purposes. This allows, for example, the SimpleLoadBalancer to assign and balance within the subset of tablet servers.

I think to do what you are asking (assign and balance at the tablet level based on resource group assignment), the table level balancers would need to be modified to make decisions on a per-tablet basis.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess one way to resolve this is to remove TabletBalancer.getResourceGroup such that only TableLoadBalancer has this functionality. TabletMetadata.getTabletState would only check the resource group if the balancer is TableLoadBalancer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed the above with @keith-turner , I'm going to merge as-is and create a follow-on issue.

Comment thread core/src/main/java/org/apache/accumulo/core/metadata/schema/TabletMetadata.java Outdated
@dlmarion dlmarion merged commit cfd7c80 into apache:elasticity Jul 11, 2023
@dlmarion dlmarion deleted the 3459-tserver-groups-attempt2 branch July 11, 2023 21:13
@ctubbsii ctubbsii added this to the 4.0.0 milestone Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: ✅ Done

Development

Successfully merging this pull request may close these issues.

Resource Groups: Create TabletServer property like sserver.group that can be used for assignment / balancing

3 participants