Description
Is your feature request related to a problem?
In xarray-ms, a domain-specific package for radio astronomy, we construct a DataTree representing the partitions of a legacy data format, where each partition contains regular data cubes. As currently implemented, the custom backend supports a partition_chunks kwarg in its BackendEntrypoint.open_datatree method, so that a different chunking schema can be specified for each partition:
This chunking specification is specific to a legacy radio astronomy format, but the ability to specify chunking per DataTree node may be more generally useful.
Describe the solution you'd like
Currently, BackendEntrypoint.open_datatree passes its chunks kwarg to each Dataset constructor in the DataTree. This is coarse-grained, because the same chunking schema is applied to every Dataset in the DataTree.
I propose that the chunks kwarg in BackendEntrypoint.open_datatree also accept a dictionary mapping each path (i.e. DataTree node) to its own chunking schema. For example:
```python
import xarray

xdt = xarray.open_datatree(..., chunks={
    "/path/to/node1": {"time": 20, "frequency": 16},
    "/path/to/a/node2": {"time": 10, "frequency": 4},
})
```

Then, when constructing the Datasets in the DataTree, the chunking schema appropriate to each node can be applied.
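To make the per-node lookup concrete, here is a minimal sketch of how a backend could resolve the chunking schema for a single node path. The helper name resolve_node_chunks, the "/..." suffix handling and the precedence rules are all hypothetical and not an existing xarray API: an exact path key wins, otherwise the deepest matching "/..." subtree key (the explicit form suggested below) applies, and if nothing matches the helper returns None.

```python
from collections.abc import Mapping
from typing import Optional

# A per-dimension chunk spec, e.g. {"time": 20, "frequency": 16}.
ChunkSpec = Mapping[str, int]

# Hypothetical suffix meaning "this node and every node below it".
SUBTREE_SUFFIX = "/..."


def resolve_node_chunks(
    node_path: str, chunks: Mapping[str, ChunkSpec]
) -> Optional[ChunkSpec]:
    """Pick the chunk spec for one DataTree node path (hypothetical helper).

    1. An exact key such as "/path/to/a/node2" applies only to that node.
    2. Otherwise the deepest "<prefix>/..." key whose prefix is the node
       itself or one of its ancestors applies.
    3. If nothing matches, return None so the caller can use a default.
    """
    if node_path in chunks:
        return chunks[node_path]

    best_key: Optional[str] = None
    best_len = -1
    for key in chunks:
        if not key.endswith(SUBTREE_SUFFIX):
            continue
        # "/..." yields the empty prefix, which matches every absolute path.
        prefix = key[: -len(SUBTREE_SUFFIX)]
        # Match the prefix node itself or any descendant, but not siblings
        # that merely share a name prefix (e.g. "/path/to/node10").
        if node_path == prefix or node_path.startswith(prefix + "/"):
            if len(prefix) > best_len:
                best_key, best_len = key, len(prefix)

    return None if best_key is None else chunks[best_key]


chunks = {
    "/path/to/node1/...": {"time": 20, "frequency": 16},
    "/path/to/a/node2": {"time": 10, "frequency": 4},
}

print(resolve_node_chunks("/path/to/node1/child", chunks))  # {'time': 20, 'frequency': 16}
print(resolve_node_chunks("/path/to/a/node2", chunks))      # {'time': 10, 'frequency': 4}
print(resolve_node_chunks("/path/to/a/node2/x", chunks))    # None
```

Inside a backend's open_datatree, the resolved spec for each group could then be passed to Dataset.chunk, with a None result falling back to a single global chunks value; that fallback is one possible design choice, not existing behaviour.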
An entry in the dictionary need not apply only to a single node; it could also apply its chunking schema to the entire subtree below that node. However, it may be better to make this explicit:
```python
xdt = xarray.open_datatree(..., chunks={
    # Apply to node1 and any node below it
    "/path/to/node1/...": {"time": 20, "frequency": 16},
})
```

Describe alternatives you've considered
We have implemented a custom partition_chunks kwarg in the BackendEntrypoint.open_datatree method for our legacy data format.
Additional context
No response