
refactor: Import fewer private modules #1069

Merged
timtreis merged 10 commits into scverse:main from selmanozleyen:refactor/fix-private-imports
Dec 2, 2025

@selmanozleyen
Member

@selmanozleyen selmanozleyen commented Nov 28, 2025

Partially fixes #1061 with a workaround. I have moved everything we import from private modules into _compat.py, and any version mismatch is handled there. This approach is like an unsafe block in Rust: at least it encapsulates the potentially fragile code in one file.
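
To make the idea concrete, here is a minimal sketch of the kind of version-gated lookup that a compatibility module can centralize. It is not the code in _compat.py: `parse_release`, `at_least`, and `resolve_by_version` are hypothetical names, and the usage line below uses the standard-library `json` module as a stand-in for a third-party package.

```python
import re
from importlib import import_module
from typing import Any


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading numeric part of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def at_least(installed: str, minimum: str) -> bool:
    """Compare release tuples after padding both to the same length with zeros."""
    have, need = parse_release(installed), parse_release(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def resolve_by_version(installed: str, candidates: list[tuple[str, str, str]]) -> Any:
    """Return the first attribute whose minimum version is satisfied.

    ``candidates`` holds (minimum version, module path, attribute name) triples,
    ordered from newest to oldest. Only the matching module is imported.
    """
    for minimum, module_path, attribute in candidates:
        if at_least(installed, minimum):
            return getattr(import_module(module_path), attribute)
    raise ImportError(f"no candidate matches installed version {installed!r}")
```

For example, `resolve_by_version("2.0", [("3.0", "json", "missing"), ("1.0", "json", "dumps")])` skips the first candidate without importing it and returns `json.dumps`. Keeping all such lookups in one file means a dependency upgrade only requires touching that file.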

Another function that we should reconsider is _create_sparse_df. It uses many protected members of pandas and might break with a future pandas version.

In fact, only ligrec uses this function, and ligrec had reproducibility errors in the past. They weren't directly caused by this function, but safer code might have kept the bug from surfacing, or at least produced a warning before the pandas behavior changed.

# modified from pandas' source code
def _create_sparse_df(
    data: NDArrayA | spmatrix,
    index: pd.Index | None = None,
    columns: Sequence[Any] | None = None,
    fill_value: float = 0,
) -> pd.DataFrame:
    """
    Create a new DataFrame from a scipy sparse matrix or numpy array.

    This is the original :mod:`pandas` implementation with 2 differences:

        - allow creation also from :class:`numpy.ndarray`
        - expose ``fill_value``

    Parameters
    ----------
    data
        Must be convertible to CSC format.
    index
        Row labels to use.
    columns
        Column labels to use.

    Returns
    -------
    Each column of the DataFrame is stored as a :class:`arrays.SparseArray`.
    """
    from pandas._libs.sparse import IntIndex
    from pandas.core.arrays.sparse.accessor import (
        SparseArray,
        SparseDtype,
        SparseFrameAccessor,
    )

    if not issparse(data):
        pred = (lambda col: ~np.isnan(col)) if np.isnan(fill_value) else (lambda col: ~np.isclose(col, fill_value))
        dtype = SparseDtype(data.dtype, fill_value=fill_value)
        n_rows, n_cols = data.shape
        arrays = []

        for i in range(n_cols):
            mask = pred(data[:, i])
            idx = IntIndex(n_rows, np.where(mask)[0], check_integrity=False)
            arr = SparseArray._simple_new(data[mask, i], idx, dtype)
            arrays.append(arr)

        return pd.DataFrame._from_arrays(arrays, columns=columns, index=index, verify_integrity=False)

    if TYPE_CHECKING:
        assert isinstance(data, spmatrix)
    data = data.tocsc()
    sort_indices = True

    index, columns = SparseFrameAccessor._prep_index(data, index, columns)
    n_rows, n_columns = data.shape
    # We need to make sure indices are sorted, as we create
    # IntIndex with no input validation (i.e. check_integrity=False ).
    # Indices may already be sorted in scipy in which case this adds
    # a small overhead.
    if sort_indices:
        data.sort_indices()

    indices = data.indices
    indptr = data.indptr
    array_data = data.data
    dtype = SparseDtype(array_data.dtype, fill_value=fill_value)
    arrays = []

    for i in range(n_columns):
        sl = slice(indptr[i], indptr[i + 1])
        idx = IntIndex(n_rows, indices[sl], check_integrity=False)
        arr = SparseArray._simple_new(array_data[sl], idx, dtype)
        arrays.append(arr)

    return pd.DataFrame._from_arrays(arrays, columns=columns, index=index, verify_integrity=False)
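
For comparison, here is a rough sketch of what a version restricted to public pandas API could look like. It is an illustration under stated assumptions, not a drop-in replacement: `sparse_df_public` is a hypothetical name, the sparse path only supports `fill_value=0` because `DataFrame.sparse.from_spmatrix` does not expose a fill value, and the dense path lets `SparseArray` decide which entries equal `fill_value` instead of using `np.isclose`.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csc_matrix, issparse


def sparse_df_public(data, index=None, columns=None, fill_value=0.0):
    """Build a sparse DataFrame using only public pandas API (hypothetical helper)."""
    if issparse(data):
        # Assumption of this sketch: from_spmatrix always uses 0 as the fill value,
        # so any other fill value (including NaN) is rejected here.
        if fill_value != 0:
            raise NotImplementedError("sparse input supports fill_value=0 only")
        return pd.DataFrame.sparse.from_spmatrix(data, index=index, columns=columns)

    data = np.asarray(data)
    # Build one SparseArray per column; integer keys avoid clashes between
    # duplicate column labels, which are assigned afterwards.
    arrays = {
        i: pd.arrays.SparseArray(data[:, i], fill_value=fill_value)
        for i in range(data.shape[1])
    }
    df = pd.DataFrame(arrays, index=index)
    if columns is not None:
        df.columns = list(columns)
    return df


dense = np.array([[0.0, 1.0], [2.0, 0.0]])
from_dense = sparse_df_public(dense, columns=["a", "b"])
from_sparse = sparse_df_public(csc_matrix(dense), index=["r0", "r1"])
```

The trade-off is that the public constructors validate their input, so this is likely slower than the private `_simple_new` path, but it should not break silently when pandas internals change.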

@codecov

codecov bot commented Nov 28, 2025

Codecov Report

❌ Patch coverage is 83.33333% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.90%. Comparing base (277b192) to head (1dac201).
⚠️ Report is 24 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/squidpy/_compat.py | 76.47% | 2 Missing and 2 partials ⚠️ |
| src/squidpy/pl/_interactive/_utils.py | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1069      +/-   ##
==========================================
+ Coverage   65.88%   65.90%   +0.02%     
==========================================
  Files          44       45       +1     
  Lines        6762     6772      +10     
  Branches     1137     1138       +1     
==========================================
+ Hits         4455     4463       +8     
- Misses       1874     1875       +1     
- Partials      433      434       +1     

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/squidpy/datasets/_10x_datasets.py | 97.87% <100.00%> (-0.05%) ⬇️ |
| src/squidpy/datasets/_utils.py | 75.83% <100.00%> (+0.40%) ⬆️ |
| src/squidpy/gr/_utils.py | 60.35% <100.00%> (-0.45%) ⬇️ |
| src/squidpy/pl/_color_utils.py | 82.50% <100.00%> (ø) |
| src/squidpy/pl/_spatial_utils.py | 78.32% <100.00%> (ø) |
| src/squidpy/pl/_utils.py | 48.99% <100.00%> (ø) |
| src/squidpy/pl/_var_by_distance.py | 66.66% <100.00%> (-0.37%) ⬇️ |
| src/squidpy/pl/_interactive/_utils.py | 0.00% <0.00%> (ø) |
| src/squidpy/_compat.py | 76.47% <76.47%> (ø) |

Copilot AI left a comment


Pull request overview

This pull request refactors import statements to consolidate version-dependent compatibility code into a new _compat.py module, addressing issue #1061. The changes improve maintainability by encapsulating fragile version-specific imports in a single location.

  • Moved version-dependent scanpy and anndata imports to squidpy/_compat.py
  • Replaced private pandas imports (pandas._libs.lib.infer_dtype) with public API (pandas.api.types.infer_dtype)
  • Replaced deprecated scanpy._utils.check_presence_download with a new download_file function using pooch
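
As an illustration of the second bullet, the public `pandas.api.types.infer_dtype` can be called directly on list-likes and pandas arrays. The snippet below is a small sketch; `describe` is a hypothetical wrapper used only for this example and is not part of squidpy.

```python
import pandas as pd
from pandas.api.types import infer_dtype  # public location of infer_dtype


def describe(values) -> str:
    """Return the inferred type label for a list-like, ignoring missing values."""
    return infer_dtype(values, skipna=True)


labels = {
    "strings": describe(["a", "b"]),
    "integers": describe([1, 2, 3]),
    "floats": describe([1.0, 2.5]),
    "categorical": describe(pd.Categorical(["x", "y", "x"])),
}
print(labels)
```

Importing from `pandas.api.types` keeps the call on the documented API surface, so it is covered by pandas' deprecation policy rather than being free to move between releases.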

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/squidpy/_compat.py | New compatibility module that handles version-dependent imports from scanpy and anndata with appropriate version checks |
| src/squidpy/pl/_var_by_distance.py | Updated to import scanpy plotting functions from _compat module instead of directly from scanpy private APIs |
| src/squidpy/pl/_utils.py | Replaced private pandas import with public API (pandas.api.types.infer_dtype) |
| src/squidpy/pl/_spatial_utils.py | Updated to import add_categorical_legend from _compat module |
| src/squidpy/pl/_interactive/_utils.py | Replaced private pandas import with public API and imports scanpy function from _compat |
| src/squidpy/pl/_color_utils.py | Updated to import scanpy function from _compat module |
| src/squidpy/gr/_utils.py | Removed local version checking logic and imports anndata views from _compat |
| src/squidpy/datasets/_utils.py | Added new download_file function using pooch as replacement for scanpy's deprecated download utility |
| src/squidpy/datasets/_10x_datasets.py | Updated to use new download_file function instead of scanpy's check_presence_download |


@Zethson changed the title from "Refactoring our imports" to "refactor: Import fewer private modules" on Nov 29, 2025
Member

@flying-sheep flying-sheep left a comment


Looks great! Idk if _SET_DEFAULT_COLORS_FOR_CATEGORICAL_OBS_CHANGED has value over just try:/except ImportError, since this is a private API. Otherwise no objections!

Some thoughts about pooch (all optional improvements, but since you’re switching to it anyway):

  • If the data you download tends to be big, consider setting progressbar=... in pooch.download, maybe even using tqdm.auto for notebook support.
  • Don’t you want to use caching? The way you use it now re-downloads stuff every time, no?

@selmanozleyen
Member Author

I wouldn't want to use a try block, since the ImportError could theoretically come from another module.
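
A small sketch of that concern, using only the standard library: a bare `except ImportError` also swallows failures raised by modules that the target imports internally. One way to narrow it is to inspect `ModuleNotFoundError.name`. `optional_import` is a hypothetical helper for illustration, not something in this PR.

```python
from importlib import import_module
from types import ModuleType
from typing import Optional


def optional_import(module_path: str) -> Optional[ModuleType]:
    """Import a module, returning None only if the module itself is absent.

    If the target exists but one of *its* imports is missing, the error is
    re-raised instead of being mistaken for "target not available".
    """
    try:
        return import_module(module_path)
    except ModuleNotFoundError as exc:
        missing = exc.name or ""
        # The missing module is the target or one of its parent packages.
        if missing and (module_path == missing or module_path.startswith(missing + ".")):
            return None
        raise
```

This narrows the try block but does not remove the fragility. An explicit version check avoids relying on exception details altogether, at the cost of tracking which release changed the private API.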

You are right about your points with pooch. I will have a second look

@selmanozleyen
Member Author

@flying-sheep do you think it's fine to merge this and then create a new issue for adding the hashes?

@selmanozleyen selmanozleyen self-assigned this Dec 2, 2025
@selmanozleyen selmanozleyen force-pushed the refactor/fix-private-imports branch from 7eb2588 to 1dac201 Compare December 2, 2025 11:05
@flying-sheep
Member

I think that’s fine!

@timtreis
Member

timtreis commented Dec 2, 2025

@selmanozleyen Does this touch the zarr.zip download function as well? Or will it automatically also use the new behaviour?

@selmanozleyen
Member Author

selmanozleyen commented Dec 2, 2025

@timtreis until I add the hashes it will redownload every time. But I think this is better: in the old implementation, if the download was interrupted, you'd have to remove the existing file yourself, since it only checked for the existence of the file. I can add the hashes very quickly, though.
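
The difference between an existence check and a hash check can be sketched with the standard library alone. This is not the squidpy or pooch implementation: `sha256_of` and `fetch_verified` are hypothetical names, and the `download` callable stands in for whatever actually fetches the file.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Callable


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_verified(target: Path, expected_sha256: str, download: Callable[[Path], None]) -> Path:
    """Return ``target``, downloading only if it is missing or fails the hash check."""
    if target.exists() and sha256_of(target) == expected_sha256:
        return target  # cache hit: file is present and complete

    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, suffix=".part")
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        download(tmp)  # write to a temporary file first
        if sha256_of(tmp) != expected_sha256:
            raise ValueError(f"hash mismatch for {target.name}")
        os.replace(tmp, target)  # atomic move: a partial file is never left at target
    finally:
        tmp.unlink(missing_ok=True)
    return target
```

With a known hash, a truncated file from an interrupted download fails the check and is fetched again, while a complete file is reused without any network access.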

Member

@timtreis timtreis left a comment


Fine for me, you can add that in a subsequent PR.

@selmanozleyen selmanozleyen enabled auto-merge (squash) December 2, 2025 12:35
@selmanozleyen selmanozleyen removed the request for review from Zethson December 2, 2025 12:44
@timtreis timtreis disabled auto-merge December 2, 2025 12:57
@timtreis timtreis merged commit 91dd574 into scverse:main Dec 2, 2025
13 checks passed
@timtreis
Member

timtreis commented Dec 2, 2025

merged so we don't have to wait for that zombie action


Development

Successfully merging this pull request may close these issues.

We shouldn't import private module or functions

4 participants