Time partition #37
Merged
Field test passes on first run: xr.open_mfdataset + isel(t=-1) + compute already produces a dask graph that reads bulk from exactly the indexed file (verified: 1 read of 'jeh' from pfd.000000010.bp). The investigation methodology in the spec turned out to be belt-and-suspenders; nothing to fix in the field stack. Particle test is marked xfail and confirms the diagnosed shape: 77 bulk reads (7 columns x 11 files) instead of 7 (1 file). Fix deferred per the spec. Co-Authored-By: Claude <noreply@anthropic.com>
New optional ListMetadata fields describing the partition layout of the underlying dask DataFrame. Used by Idx (next commit) to do partition pruning instead of a predicate filter when iseling along the partition dim. None defaults preserve existing behavior. Co-Authored-By: Claude <noreply@anthropic.com>
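As a rough illustration of how optional layout fields with None defaults can leave existing callers untouched, here is a minimal sketch. The class below is a hypothetical stand-in, not the project's ListMetadata: the field types, the `(start, stop)` range shape, and the `can_prune` helper are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ListMetadata:
    """Hypothetical stand-in; the real class carries other fields not shown here."""

    # Name of the dim the dask partitions are laid out along, e.g. "t".
    partition_dim: Optional[str] = None
    # Assumed shape: one (start, stop) partition-index range per position
    # along partition_dim.
    partition_ranges: Optional[Tuple[Tuple[int, int], ...]] = None


def can_prune(meta: ListMetadata, dim: str) -> bool:
    """Hypothetical check: prune only when the layout is known for this dim."""
    return meta.partition_dim == dim and meta.partition_ranges is not None


# Metadata built without the new fields falls back to the old code path.
legacy = ListMetadata()
tracked = ListMetadata(partition_dim="t", partition_ranges=((0, 2), (2, 5)))
```

With this shape, a consumer would take the pruning path only when `can_prune` is true and otherwise keep filtering as before.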
Track per-step partition layout in metadata so Idx (next commit) can do dask-native partition pruning. Subfile chunking is preserved — each step still has CONFIG.dask_chunk_size-bounded partitions; we just record the ranges. Co-Authored-By: Claude <noreply@anthropic.com>
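One way the per-step bookkeeping could look, as a sketch only: each step is split into chunk-size-bounded partitions, and the half-open range of partition indices it occupies is recorded. The function name, the `chunk_size` argument (standing in for CONFIG.dask_chunk_size), the range format, and the choice to give an empty step one partition are assumptions, not the loader's actual code.

```python
import math


def record_partition_ranges(rows_per_step, chunk_size):
    """Return one half-open (start, stop) partition-index range per step.

    Hypothetical helper: chunk_size plays the role of CONFIG.dask_chunk_size.
    """
    ranges = []
    start = 0
    for n_rows in rows_per_step:
        # Assumption: an empty step still contributes one (empty) partition.
        n_parts = max(1, math.ceil(n_rows / chunk_size))
        ranges.append((start, start + n_parts))
        start += n_parts
    return ranges


# Three steps with 250, 90, and 100 rows, at most 100 rows per partition.
ranges = record_partition_ranges([250, 90, 100], chunk_size=100)
# ranges == [(0, 3), (3, 4), (4, 5)]
```

Recording ranges rather than a single index per step is what lets subfile chunking stay in place while still identifying which partitions belong to a given step.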
Mirror of the change in particle_bp. Same shape, same intent. Co-Authored-By: Claude <noreply@anthropic.com>
When iseling along the partition_dim of a list, use df.partitions[...] to let dask prune the graph instead of df[df[dim] == pos], which forces every partition to be read to evaluate the predicate. For the prt-bin-time idx case on test-2d: bulk reads drop from 77 (7 columns x 11 files) to 7 (7 columns x 1 file). The test_idx_efficient particle case is no longer xfail. LazyList.compute() now clears partition_dim/partition_ranges since they describe the dask layout and are meaningless after materialization to a pandas frame. Co-Authored-By: Claude <noreply@anthropic.com>
Document the new ListMetadata fields and the loader invariant that keeps them in sync with the dd.DataFrame layout. Without this, a future loader implementer could silently lose Idx's partition-pruning optimization by forgetting to set them. Co-Authored-By: Claude <noreply@anthropic.com>
Mirror the existing --idx t=-1 tests with --pos t=999 (nearest resolves to the last file). Field passes; particle xfails for the same structural reason Idx did, fix in next commit. Co-Authored-By: Claude <noreply@anthropic.com>
Pos translates each coord-valued sel into an integer-index isel against the dim's coords and hands the dict to Idx. That picks up Idx's new partition-pruning behavior for free: --pos t=<value> on particles now reads bulk from exactly the nearest file's partitions, not all of them. Non-coord dims (e.g. filtering particle columns like px by value range) keep the predicate-filter path; Idx can't handle those since it needs coords for the isel translation. Idx is lazy-imported inside apply_list to dodge a circular import via lib.plotting.animated_plot -> idx. Co-Authored-By: Claude <noreply@anthropic.com>
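The translation step can be pictured roughly as follows. This is a sketch under assumptions: `nearest_index` and `pos_to_isel` are hypothetical names, coords are plain lists here, and the real Pos may resolve "nearest" differently (for example on ties).

```python
def nearest_index(coords, value):
    """Index of the coord closest to value (first match wins on ties)."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - value))


def pos_to_isel(pos, coords_by_dim):
    """Split a pos dict into an integer isel dict and leftover predicates.

    Dims with coords become integer indices (suitable for handing to an
    index-based selector); dims without coords are returned unchanged so
    they can stay on the predicate-filter path.
    """
    isel, leftover = {}, {}
    for dim, value in pos.items():
        if dim in coords_by_dim:
            isel[dim] = nearest_index(coords_by_dim[dim], value)
        else:
            leftover[dim] = value
    return isel, leftover


coords_by_dim = {"t": [0.0, 1.5, 3.0, 4.5]}
isel, leftover = pos_to_isel({"t": 999, "px": (-1.0, 1.0)}, coords_by_dim)
# isel == {"t": 3}; leftover == {"px": (-1.0, 1.0)}
```

As in the tests mentioned above, an out-of-range value such as t=999 resolves to the last index, so the selection lands on the last file.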
A somewhat brute-force method for lazily indexing time in list data. It emerged from the following constraints: