Add TSERV_GROUP_NAME, ability to balance within tserver group#3496
Conversation
Added TSERV_GROUP_NAME property and refactored how groups are determined within the server processes. Modified LiveTServerSet to pull the group from the ServiceLockData and add it to the TServerInstance. Modified the TabletGroupWatcher to use the group information in the TServerInstance to determine if the tablet is currently hosted on a TServer that is not in the correct group
|
This replaces PR #3482 . It does not put the tserver group name into the metadata. There is likely more work to be done here w/r/t the balancers and tests. Putting up for early feedback on approach. |
|
Follow-on tasks include: Rename |
|
Follow-on tasks completed in https://github.com/dlmarion/accumulo/tree/3496-followon-rename-compactor-queue. Will create PR for elasticity branch once this merged. |
keith-turner
left a comment
There was a problem hiding this comment.
This is really neat. I made comments, those could be addressed as follow on issues.
| + "also consider configuring the `" + NoDeleteConstraint.class.getName() + "` " | ||
| + "constraint.", | ||
| "2.0.0"), | ||
| TABLE_ASSIGNMENT_GROUP("table.assignment.group", Constants.DEFAULT_RESOURCE_GROUP_NAME, |
There was a problem hiding this comment.
This is configuration for the TableLoadBalancer plugin, other configured balancer plugins may ignore this. Instead of defining it as a table property could define it as a custom property like table.custom.assignment.group and describe this prop in the documentation for the TableLoadBalancer plugin. This scopes the property to the plugin.
There was a problem hiding this comment.
it's also used in TabletGroupWatcher.getAssignmentsFromBalancer to print a warning if a tablet is assigned to a tserver outside of its resource group.
There was a problem hiding this comment.
it's also used in TabletGroupWatcher.getAssignmentsFromBalancer to print a warning if a tablet is assigned to a tserver outside of its resource group.
I don't think TGW should try to validate if a particular plugin is behaving properly.
There was a problem hiding this comment.
If we scope the config to plugin, it could allow TableLoadBalancer to have more complex configuration that is tightly coupled to it. For example TableLoadBalancer could support the property table.custom.assignment.group or table.custom.assignment.rangedGroups where rangedGroups allows a user to set a json value that specifies something like the folllowing.
- For tablets in range
(<row1>,<row2>]use tablet server group X - For tablets in range
(<row3>,<row4>]use tablet server group Y - For all other tablet use tablet server group Z
So one property or the other could be set, if both are set it throws an error.
There was a problem hiding this comment.
So, it wasn't just the warning in TGW.getAssignmentsFromBalancer. There is also this:
There was a problem hiding this comment.
Does this check also need to run in the tablet mgmt iterator?
I don't think it can unless we pass the group information to the iterator. The group is not in the metadata anymore, only in ZK. The LiveTServerSet object in the Manager is the only thing that knows about the TServer groups
There was a problem hiding this comment.
If needed we could pass the tserver group information to the tablet mgmt iterator, we already pass the live tserver set I think so this would be a similar amount of data. We could call the balancer plugins in the tablet mgmt iterator if needed also. The compaction code is calling compaction plugins in CompactionJobGenerator inside tablet mgtm iterator to make decisions.
There was a problem hiding this comment.
It may be good to have an IT that does something like the following.
- Configures table A to be assigned to tserver group Z
- Verifies assignments are as expected
- Changes config such that table A is assigned to tserver group X
- Starts a tserver in group X
- Waits for tablets to be assigned to group X
There was a problem hiding this comment.
I removed the TABLE_ASSIGNMENT_GROUP property from Property and kept it in TableLoadBalancer in the commits today. There is a lot in this thread. I need to re-read it and make sure I got to everything.
There was a problem hiding this comment.
I added a test that you suggested above. It uncovered some other issues with tables being deleted and how the TabletManagementIterator handles that. I think I have everything in this thread done.
| // get the group of tservers for this table | ||
| SortedMap<TabletServerId,TServerStatus> groupedTServers = getCurrentSetForTable( | ||
| params.currentStatus(), params.currentResourceGroups(), | ||
| environment.getConfiguration(e.getKey()).get(Property.TABLE_ASSIGNMENT_GROUP.getKey())); |
There was a problem hiding this comment.
It using the custom table prop, could do the following.
| environment.getConfiguration(e.getKey()).get(Property.TABLE_ASSIGNMENT_GROUP.getKey())); | |
| environment.getConfiguration(e.getKey()).getTableCustom("assignment.group"); |
| } | ||
|
|
||
| public synchronized Map<String,Set<TServerInstance>> getCurrentServersGroups() { | ||
| Map<String,Set<TServerInstance>> copy = new HashMap<>(); |
There was a problem hiding this comment.
Its possible that depending on the usage patterns that it may be more efficient to compute an immutable map of this info when it changes. The tserver does this for online tablets in https://github.com/apache/accumulo/blob/main/server/tserver/src/main/java/org/apache/accumulo/tserver/OnlineTablets.java
I don't know if this would be useful w/o analyzing the larger way in which this is used and created.
| * @return map of resource group name to set of TServerInstance objects | ||
| * @since 4.0.0 | ||
| */ | ||
| Map<String,Set<TabletServerId>> currentResourceGroups(); |
There was a problem hiding this comment.
This would be a follow on, we may want to introduce a type for the group names instead of using string. This would be something like TableId.
| Map<String,Set<TabletServerId>> currentResourceGroups(); | |
| Map<String,Set<TabletServerId>> currentResourceGroups(); |
There was a problem hiding this comment.
The suggestion looks the same to me, but I think I understand what you are saying.
There was a problem hiding this comment.
I think I meant to put some type name in the suggestion. Must have gotten distracted by something while writing the comment.
| log.trace("Table {} is set to use resource group: {}", e.getKey(), tableResourceGroup); | ||
| final Map<TabletId,TabletServerId> newAssignments = new HashMap<>(); | ||
| // get the group of tservers for this table | ||
| final SortedMap<TabletServerId,TServerStatus> groupedTServers = getCurrentSetForTable( |
There was a problem hiding this comment.
This could be a follow on. Need to determine if we need to do anything to avoid recomputing this map on every call to getAssignments. Not sure if it matters, depends on how frequently the manager calls this.
|
|
||
| import com.google.common.net.HostAndPort; | ||
|
|
||
| public class TabletServerResourceGroupBalanceIT extends SharedMiniClusterBase { |
There was a problem hiding this comment.
Some more test that could be useful.
- Test balancing the root and metadata table to diff tserver groups.
- Test two user tables that balance to diff groups to make sure that works as expected.
- Test tablets within in a single table that balance to different groups.
There was a problem hiding this comment.
Test tablets within in a single table that balance to different groups.
This was one of your earlier suggestions, but is not implemented at this time. This currently supports a table being assigned to a single resource group. I wonder if we should let users use this first and provide feedback on how they want it to operate. Also, they could extend TableLoadBalancer and do this for themselves.
There was a problem hiding this comment.
Test two user tables that balance to diff groups to make sure that works as expected.
testBalancerWithResourceGroups does this. It creates one table in the default group and one table in the GROUP1 group. Then it validates that all of the first tables tablets are located in the tserver in the default group and that all of the second tables tablets are located in the tserver in the GROUP1 group.
There was a problem hiding this comment.
Test balancing the root and metadata table to diff tserver groups.
Is this any different than the testBalancerWithResourceGroups test? Are these tables handled differently?
There was a problem hiding this comment.
The root tablet is handled differently, metadata table is more likely to be like everything else. When I added an IT for compaction of root and metadata, I found the root tablet did not work. Had to add this :
There was a problem hiding this comment.
@keith-turner - I added balance tests for root and metadata table in 6ca1a88, and they work.
There was a problem hiding this comment.
If the SPI supports the functionality, then it would be good to ensure it actually works.
So, the TabletBalancer.getResourceGroup api method works at the tablet level now. The TabletMetadata.getTabletState method returns TabletState.ASSIGNED_TO_WRONG_GROUP when a tablet is not hosted on a tablet server that is in the same group that TabletBalancer.getResourceGroup returns. TableLoadBalancer passes the set of TabletServers that correspond to the resource group to the tables load balancer for assignment and balance purposes. This allows, for example, the SimpleLoadBalancer to assign and balance within the subset of tablet servers.
I think to do what you are asking (assign and balance at the tablet level based on resource group assignment), the table level balancers would need to be modified to make decisions on a per-tablet basis.
There was a problem hiding this comment.
I guess one way to resolve this is to remove TabletBalancer.getResourceGroup such that only TableLoadBalancer has this functionality. TabletMetadata.getTabletState would only check the resource group if the balancer is TableLoadBalancer.
There was a problem hiding this comment.
Discussed the above with @keith-turner , I'm going to merge as-is and create a follow-on issue.
Added TSERV_GROUP_NAME property and refactored how groups are determined within the server processes. Modified LiveTServerSet to pull the group from the ServiceLockData and add it to the TServerInstance. Modified the TabletGroupWatcher to use the group information in the TServerInstance to determine if the tablet is currently hosted on a TServer that is not in the correct group