
Xarray integration #705

@cosmicBboy


Is your feature request related to a problem? Please describe.

xarray is a project that provides a dict-like data container abstraction for n-dimensional arrays. It shares some commonalities with pandas, but there are many key differences (e.g. coords and attrs).

After chatting with @jhamman about this approach, we decided it would be appropriate to park xarray-schema within the pandera codebase. This issue tracks the planned integration of xarray-schema into the pandera codebase.

Describe the solution you'd like

A good start for this integration is to add a pandera.xarray module exposing the schema and schema component classes specific to xarray:

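import numpy as np
import xarray as xr
# Proposed module: this is the interface the issue asks for
from pandera.xarray import DataArraySchema, DatasetSchema

# A 1-D int32 array named "foo" with a single dimension "x"
da = xr.DataArray(np.ones(4, dtype='i4'), dims=['x'], name='foo')

# Schema describing the expected dtype, name, shape, and dimension names
schema = DataArraySchema(dtype=np.integer, name='foo', shape=(4,), dims=['x'])

# Validate the array against the schema
schema.validate(da)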
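
To make the intended checks concrete, here is a rough sketch of what validating a DataArray against such a schema could involve. This is not the xarray-schema or pandera implementation: the class name SketchDataArraySchema and its error handling (collecting failures into a single ValueError) are assumptions made for illustration only. It relies solely on attributes that xr.DataArray exposes (dtype, name, shape, dims) and on np.issubdtype.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

import numpy as np
import xarray as xr


@dataclass
class SketchDataArraySchema:
    """Hypothetical stand-in for the proposed DataArraySchema (not pandera's API)."""

    dtype: Optional[type] = None
    name: Optional[str] = None
    shape: Optional[Tuple[int, ...]] = None
    dims: Optional[Sequence[str]] = None

    def validate(self, da: xr.DataArray) -> xr.DataArray:
        # Collect every failed check so the caller sees all problems at once.
        errors = []
        if self.dtype is not None and not np.issubdtype(da.dtype, self.dtype):
            errors.append(f"dtype {da.dtype} is not a subtype of {self.dtype.__name__}")
        if self.name is not None and da.name != self.name:
            errors.append(f"name {da.name!r} != {self.name!r}")
        if self.shape is not None and tuple(da.shape) != tuple(self.shape):
            errors.append(f"shape {da.shape} != {tuple(self.shape)}")
        if self.dims is not None and tuple(da.dims) != tuple(self.dims):
            errors.append(f"dims {da.dims} != {tuple(self.dims)}")
        if errors:
            # Assumption: a plain ValueError; the real integration may use its own error type.
            raise ValueError("; ".join(errors))
        return da


# Same array and schema arguments as in the example above.
da = xr.DataArray(np.ones(4, dtype="i4"), dims=["x"], name="foo")
schema = SketchDataArraySchema(dtype=np.integer, name="foo", shape=(4,), dims=["x"])
validated = schema.validate(da)

# An array that violates every field of the schema.
bad = xr.DataArray(np.ones(3, dtype="f8"), dims=["y"], name="bar")
try:
    schema.validate(bad)
except ValueError as exc:
    message = str(exc)
```

Returning the validated object mirrors how pandera's pandas schemas behave, but whether the xarray schemas should do the same is an open design question for this issue.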

TODO

Describe alternatives you've considered

The main alternative to this integration is to keep xarray-schema as a separate project that's interoperable with pandera. However, given that pandera plans on expanding its scope to validate data containers beyond pandas, it would benefit this project to maintain schema interfaces for multiple (not just pandas-like) data container libraries.

Additional context

Labels: enhancement (New feature or request)
