So far I've just added a single test which fails. I don't think the test should fail although I'm not sure what the |
What problem are you trying to solve here? This vlen string stuff is an internal API that isn't really intended for use outside Xarray. |
Somehow I ended up with
import numpy as np
import pandas as pd
# I don't know how the strings ended up being np.str_....
scenarios = [np.str_(v) for v in ["scenario_a", "scenario_b", "scenario_c"]]
years = range(2015, 2100 + 1)
tdf = pd.DataFrame(
data=np.random.random((len(scenarios), len(years))),
columns=years,
index=scenarios,
)
tdf.index.name = "scenario"
tdf.columns.name = "year"
tdf = tdf.stack()
tdf.name = "tas"
txr = tdf.to_xarray()
# raises error shown below
txr.to_netcdf("test.nc")
# error
Traceback (most recent call last):
File "scratch.py", line 20, in <module>
txr.to_netcdf("test.nc")
File ".../lib/python3.7/site-packages/xarray/core/dataarray.py", line 2741, in to_netcdf
return dataset.to_netcdf(*args, **kwargs)
File ".../lib/python3.7/site-packages/xarray/core/dataset.py", line 1699, in to_netcdf
invalid_netcdf=invalid_netcdf,
File ".../lib/python3.7/site-packages/xarray/backends/api.py", line 1108, in to_netcdf
dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
File ".../lib/python3.7/site-packages/xarray/backends/api.py", line 1154, in dump_to_store
store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
File ".../lib/python3.7/site-packages/xarray/backends/common.py", line 256, in store
variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
File ".../lib/python3.7/site-packages/xarray/backends/common.py", line 294, in set_variables
name, v, check, unlimited_dims=unlimited_dims
File ".../lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 464, in prepare_variable
variable, self.format, raise_on_invalid_encoding=check_encoding
File ".../lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 131, in _get_datatype
datatype = _nc4_dtype(var)
File ".../lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 154, in _nc4_dtype
raise ValueError(f"unsupported dtype for netCDF4 variable: {var.dtype}")
ValueError: unsupported dtype for netCDF4 variable: object |
|
@shoyer any further thoughts on this now that the scope is clearer? |
|
I agree, this should totally work. It's not obvious to me how to best fix it, though. |
I assume it's not as trivial as just changing e.g. xarray/xarray/coding/strings.py Line 32 in f9a535c np.str_?
|
|
I think the issue must be somewhere around this line, where xarray attempts to infer a dtype for object arrays: Line 215 in f9a535c |
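To illustrate why dtype inference can go wrong here, the following is a minimal sketch using only NumPy, not xarray's actual inference code. It assumes the inferred element type is taken from `type(element)` of the first array element, which is how the discussion above reads; the variable names are illustrative.

```python
import numpy as np

# An object array whose elements are np.str_ rather than built-in str,
# similar to the index values in the reproduction above.
values = np.array(
    [np.str_("scenario_a"), np.str_("scenario_b")], dtype=object
)
element = values[0]
element_type = type(element)

# np.str_ is a subclass of str, so isinstance checks pass...
passes_isinstance = isinstance(element, str)

# ...but the type itself is not str, so an exact comparison fails.
is_exactly_str = element_type is str
equals_str = element_type == str

print(element_type, passes_isinstance, is_exactly_str, equals_str)
```

If a later step compares the recorded element type against `str` exactly, an array like this would presumably not be recognised as a string array, even though every element behaves like a string.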
Force-pushed from 33f0e09 to ca6abdb
|
I tried pushing a fix. It's unclear to me whether the change should be in how the dtypes are inferred (given that the inference code seems to do what it is meant to...) or whether |
|
My suggestion is that either the code at Lines 160 to 161 in f9a535c or the underlying create_vlen_dtype should be updated, so that it never puts np.str_ inside a custom vlen dtype. Instead, we should normalize element_type to always be str or bytes inside the vlen dtype.
|
|
To add a bit more clarification: the vlen dtype should correspond to an HDF5/netCDF4 compatible data-type, like a variable length string or bytes. |
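The normalization suggested above could look roughly like the sketch below. This is an assumption-laden illustration, not the change that was merged: `normalize_element_type` and `make_vlen_dtype` are hypothetical names, and storing the element type under an `"element_type"` key in the dtype metadata is assumed from the wording of the discussion.

```python
import numpy as np


def normalize_element_type(element_type):
    """Map str/bytes subclasses (e.g. np.str_, np.bytes_) to plain str/bytes.

    Hypothetical helper, for illustration only.
    """
    if issubclass(element_type, str):
        return str
    if issubclass(element_type, bytes):
        return bytes
    raise TypeError(f"unsupported element type: {element_type!r}")


def make_vlen_dtype(element_type):
    """Build an object dtype tagged with a normalized element type.

    Hypothetical stand-in for a vlen dtype constructor.
    """
    return np.dtype(
        "O", metadata={"element_type": normalize_element_type(element_type)}
    )


str_dtype = make_vlen_dtype(np.str_)
bytes_dtype = make_vlen_dtype(np.bytes_)

print(str_dtype.metadata["element_type"])
print(bytes_dtype.metadata["element_type"])
```

With this approach the metadata only ever holds `str` or `bytes`, so downstream checks that compare against those two types do not need to know about NumPy's scalar string types.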
|
Something like 59ed7d5? (Obviously missing proper tests but just to get a sense of whether the idea is plausible) |
|
|
Force-pushed from f2e1550 to 5149cc7
|
Ignoring failing CI due to fsspec (see #5615 (comment)) |
Force-pushed from 5149cc7 to 5e15269
|
@shoyer can I bother you again now that CI is passing please? |
|
@lewisjarednz fyi |
shoyer left a comment
Looks great, thanks! Please move the test, then we can merge this.
|
Looks good to me, nice work! |
|
Thanks @znicholls! |
Fixes handling of numpy string types in coding
pre-commit run --all-files
whats-new.rst