fixing behaviour for group parameter in `open_datatree` by aladinor · Pull Request #9666 · pydata/xarray

aladinor · 2024-10-23T19:53:25Z

Hi all.

This might be more complex than pruning the path in the open_groups_as_dict function. It is tricky because _iter_zarr_groups also yields the root group. I am still working on it.

Closes open_datatree(group='some_subgroup') returning parent nodes #9665
Tests added

for more information, see https://pre-commit.ci

aladinor · 2024-10-23T20:46:28Z

@TomNicholas, I think this solution will work, but the DataTree.from_dict(groups_dict) function will create the root as an empty node. I think we might need to check it out. Let me know your thoughts.

keewis

I think the reason why you don't get the desired result is that you compute paths relative to the immediate parent of the group, not the global parent. I don't have a deeply nested tree ready for testing (so I can't be sure this actually works), but with the suggestion below I don't get the empty root node anymore.
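
To make the difference concrete, here is a minimal sketch of computing group paths relative to the requested group rather than to each group's immediate parent. It is plain Python and does not touch xarray or zarr; the helper name `reroot_group_paths` and the list of store paths are hypothetical, chosen only for illustration, and this is not the code that ended up in xarray/backends/zarr.py.

```python
from pathlib import PurePosixPath


def reroot_group_paths(store_paths, group):
    """Return {absolute store path: path relative to `group`}.

    Hypothetical helper for illustration only; groups outside `group` are dropped.
    """
    root = PurePosixPath("/") / group.strip("/")
    rerooted = {}
    for path in store_paths:
        node = PurePosixPath(path)
        # Keep the requested group itself and anything nested below it.
        if node == root or root in node.parents:
            relative = node.relative_to(root)
            # relative_to() yields "." for the group itself, which becomes the new root.
            rerooted[path] = "/" if relative == PurePosixPath(".") else f"/{relative}"
    return rerooted


# Assumed layout of a store, loosely following the example discussed in this thread.
store_paths = [
    "/",
    "/group1",
    "/group2",
    "/group2/subg1",
    "/group2/subg1/subsub1",
    "/group2/subg1/subsub2",
    "/group2/subg10",
]
rerooted = reroot_group_paths(store_paths, "/group2/subg1")
```

Comparing path components (via `parents`) instead of string prefixes keeps a sibling such as /group2/subg10 from being mistaken for a child of /group2/subg1.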

xarray/backends/zarr.py

aladinor · 2024-10-24T15:37:17Z

I think we are getting close. However, we are still having some discrepancies when comparing both datatrees using the group parameter and when directly selecting it via path. Example:

print(dtree["/group2/subg1"])
Group: /group2/subg1 <----- different root paths here
│   Dimensions:  (x: 2, y: 3)
│   Inherited coordinates:
│     * x        (x) int64 16B -1 -2
│     * y        (y) int64 24B 0 1 2
│   Data variables:
│       blah     (x) int64 16B 2 3
├── Group: /group2/subg1/subsub1 <----- different paths here
│       Dimensions:  (y: 3)
│       Data variables:
│           var      (y) int64 24B 4 5 6
└── Group: /group2/subg1/subsub2 <----- different  paths here

dt2 = xr.open_datatree("test.zarr", group="/group2/subg1")

print(dt2)
Group: /  <----- different root paths here
│   Dimensions:  (x: 2)
│   Dimensions without coordinates: x
│   Data variables:
│       blah     (x) int64 16B ...
├── Group: /subsub1 <----- different  paths here
│       Dimensions:  (y: 3)
│       Dimensions without coordinates: y
│       Data variables:
│           var      (y) int64 24B ...
└── Group: /subsub2 <----- different  paths here

Any comments on this @TomNicholas @keewis?

keewis · 2024-10-24T15:45:41Z

This is by design, I think? I'd interpret group="/group2/subgroup1" as saying "give me that group as the new root group". Then tree.encoding["source_group"] can contain the full path of the new root.

(If you know Unix commands, this is similar to chroot.)

xarray/backends/zarr.py

TomNicholas · 2024-10-24T16:46:37Z

> discrepancies when comparing both datatrees using the group parameter and when directly selecting it via path

The reprs are different because the objects are different: dtree["/group2/subg1"] has a parent, whereas dt2 does not. So this is intended.
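
A toy model may help here. The sketch below is not xarray's DataTree; `Node`, `add_child` and `detached_copy` are hypothetical names. It only illustrates why a node reached through its parent reports a full path, while a tree whose root is that same group reports paths starting at /.

```python
class Node:
    """Toy tree node whose path is derived from its chain of parents."""

    def __init__(self, name=None):
        self.name = name
        self.parent = None
        self.children = {}

    def add_child(self, name):
        child = Node(name)
        child.parent = self
        self.children[name] = child
        return child

    @property
    def path(self):
        # A node without a parent is, by definition, the root of its tree.
        if self.parent is None:
            return "/"
        return f"{self.parent.path.rstrip('/')}/{self.name}"

    def detached_copy(self):
        """Copy this subtree into a new tree that has no parent."""
        new = Node(self.name)
        for name, child in self.children.items():
            copied = child.detached_copy()
            copied.parent = new
            new.children[name] = copied
        return new


root = Node()
subg1 = root.add_child("group2").add_child("subg1")
subg1.add_child("subsub1")

selected = subg1                  # analogous to dtree["/group2/subg1"]
reopened = subg1.detached_copy()  # analogous to opening with group="/group2/subg1"
```

Here `selected.path` is "/group2/subg1" because the node still knows its parents, while `reopened.path` is "/" and its child sits at "/subsub1".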

TomNicholas

We just need a test, to remove the encoding bit, and then this is good to go!

xarray/backends/h5netcdf_.py

xarray/backends/netCDF4_.py

TomNicholas · 2024-10-24T16:52:58Z

The test should look very much like the ones in #9669 - create a tiny nested tree, save it to zarr/netcdf, open it with the group parameter, and check the structure is as expected.

xarray/tests/test_backends_datatree.py

TomNicholas

Thank you so much @aladinor !

keewis

I've got two comments: one on our strategy on DataTree whats-new entries, and one on the way we compare node datasets.

doc/whats-new.rst

xarray/tests/test_backends_datatree.py

Co-authored-by: Justus Magin <keewis@users.noreply.github.com>

TomNicholas · 2024-10-24T19:24:34Z

Sorry @aladinor - in fact could we just do this? #9666 (comment)

TomNicholas · 2024-10-24T20:16:15Z

This looks good! @keewis's comments are addressed, so I'm going to merge it.

keewis · 2024-10-24T20:20:15Z

(You can still add yourself to the list of contributors to the DataTree entry in the whats-new, @aladinor. Oh, and me too, apparently 😅)

Edit: see

xarray/doc/whats-new.rst

Lines 24 to 32 in 5b2e6f1

    
    - ``DataTree`` related functionality is now exposed in the main ``xarray`` public
      API. This includes: ``xarray.DataTree``, ``xarray.open_datatree``, ``xarray.open_groups``,
      ``xarray.map_over_datasets``, ``xarray.group_subtrees``,
      ``xarray.register_datatree_accessor`` and ``xarray.testing.assert_isomorphic``.
      By `Owen Littlejohns <https://github.com/owenlittlejohns>`_,
      `Eni Awowale <https://github.com/eni-awowale>`_,
      `Matt Savoie <https://github.com/flamingbear>`_,
      `Stephan Hoyer <https://github.com/shoyer>`_ and
      `Tom Nicholas <https://github.com/TomNicholas>`_.

keewis · 2024-10-24T20:22:29Z

Otherwise @TomNicholas can do that while preparing for the release.

TomNicholas · 2024-10-24T20:28:37Z

Amazing thank you! And thanks for pointing that out so everyone involved can get credit for these great contributions!

aladinor · 2024-10-24T20:57:36Z

Thanks @TomNicholas and @keewis for your guidance!

aladinor and others added 5 commits October 23, 2024 14:39

adding draft for fixing behaviour for group parameter

3429b2c

[pre-commit.ci] auto fixes from pre-commit.com hooks

1507f4d

new trial

e24e88b

new trial

34e74db

new trial

b6fac5b

TomNicholas added the topic-DataTree label (Related to the implementation of a DataTree class) Oct 23, 2024

keewis reviewed Oct 23, 2024

xarray/backends/zarr.py (outdated)

xarray/backends/zarr.py (outdated)

keewis mentioned this pull request Oct 24, 2024

support chunks in open_groups and open_datatree #9660

Merged

2 tasks

fixing duplicate pahts and path in the root group

ce83c89

aladinor commented Oct 24, 2024

xarray/backends/zarr.py (outdated)

aladinor added 2 commits October 24, 2024 11:13

removing yield str(gpath)

72fcee6

implementing the proposed solution to hdf5 and netcdf backends

bd853c8

aladinor marked this pull request as ready for review October 24, 2024 16:47

TomNicholas reviewed Oct 24, 2024

xarray/backends/h5netcdf_.py (outdated)

xarray/backends/netCDF4_.py (outdated)

aladinor added 2 commits October 24, 2024 11:55

adding changes to whats-new.rst

d6e5422

removing encoding['source_group'] line to avoid conflicts with PR pydata#9660

12005e2

aladinor changed the title ~~adding draft for fixing behaviour for group parameter~~ fixing behaviour for group parameter in open_datatree Oct 24, 2024

TomNicholas added the topic-backends label Oct 24, 2024

TomNicholas mentioned this pull request Oct 24, 2024

Track merging datatree into xarray #8572

Closed

27 tasks

adding test

e4384d6

TomNicholas reviewed Oct 24, 2024

xarray/tests/test_backends_datatree.py (outdated)

TomNicholas and others added 4 commits October 24, 2024 12:37

Merge branch 'main' into fix-group-param

0fab3c7

adding test

e935e4e

Merge branch 'fix-group-param' of https://github.com/aladinor/xarray into fix-group-param

2803f9f

[pre-commit.ci] auto fixes from pre-commit.com hooks

9a41b68

aladinor and others added 4 commits October 24, 2024 13:40

adding assert subgroup_tree.root.parent is None

f5d3073

Merge branch 'fix-group-param' of https://github.com/aladinor/xarray into fix-group-param

a473778

modifying tests

bb6d413

[pre-commit.ci] auto fixes from pre-commit.com hooks

fcf3dc6

TomNicholas approved these changes Oct 24, 2024

TomNicholas requested a review from keewis October 24, 2024 18:56

keewis reviewed Oct 24, 2024

doc/whats-new.rst (outdated)

xarray/tests/test_backends_datatree.py (outdated)

xarray/tests/test_backends_datatree.py (outdated)

xarray/tests/test_backends_datatree.py (outdated)

aladinor and others added 2 commits October 24, 2024 14:18

Update xarray/tests/test_backends_datatree.py

195e036

Co-authored-by: Justus Magin <keewis@users.noreply.github.com>

applying suggested changes

5ef3a56

aladinor and others added 2 commits October 24, 2024 14:49

updating test

90c5b4d

Merge branch 'main' into fix-group-param

38548b0

aladinor added 2 commits October 24, 2024 15:26

adding Justus and Alfonso to the list of contributors to the DataTree entry

e78d576

adding Justus and Alfonso to the list of contributors to the DataTree entry

762587b

TomNicholas enabled auto-merge (squash) October 24, 2024 20:28

Merge branch 'main' into fix-group-param

0cd22c5

TomNicholas mentioned this pull request Oct 24, 2024

Should DataTree.orphan act in-place? #9674

Open

TomNicholas disabled auto-merge October 24, 2024 21:00

TomNicholas merged commit f24cae3 into pydata:main Oct 24, 2024

aladinor deleted the fix-group-param branch November 20, 2024 16:19
