Speed up cost_distance iterative tile Dijkstra 2-4x #1023

Merged
brendancol merged 1 commit into master from refactor/cost-distance-iterative-perf
Mar 18, 2026

brendancol commented Mar 18, 2026

Summary

  • Batch-compute all dask tiles in a single dask.compute() call and cache them, replacing per-tile .compute() calls that re-executed the graph on every iteration
  • Assemble the final result eagerly from cached tiles instead of a second pass through da.map_blocks
  • Store friction boundary strips as float64 to skip repeated dtype conversion in _compute_seeds
  • Pass precomputed f_min from the dask+cupy fallback path to _cost_distance_dask, avoiding a redundant da.nanmin().compute()
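The first two bullets can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the array, the chunking, and the names `tile_cache` and `result` are assumptions chosen for the example. It shows all tiles being handed to one `dask.compute()` call, cached by tile index, and then stitched together eagerly with NumPy.

```python
import dask
import dask.array as da
import numpy as np

# Hypothetical 4x6 input split into 2x3 chunks, giving a 2x2 grid of tiles.
source = np.arange(24, dtype=np.float32).reshape(4, 6)
arr = da.from_array(source, chunks=(2, 3))
n_rows, n_cols = arr.numblocks

# One scheduler pass: every tile goes into a single dask.compute() call,
# instead of calling .compute() on each tile separately.
keys = [(i, j) for i in range(n_rows) for j in range(n_cols)]
tiles = dask.compute(*[arr.blocks[i, j] for i, j in keys])

# Cache the materialized tiles so later iterations read plain NumPy arrays
# and never re-run the dask graph.
tile_cache = dict(zip(keys, tiles))

# Eager assembly: stitch the cached tiles into one NumPy array,
# rather than routing the result through da.map_blocks again.
result = np.block(
    [[tile_cache[(i, j)] for j in range(n_cols)] for i in range(n_rows)]
)
```

The trade-off in this pattern is that every tile is held in memory at once, which presumably suits the grid sizes benchmarked below.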

Benchmarked improvement on the iterative (unbounded max_cost) dask path:

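| Config  | Before (s) | After (s) | Speedup | Before (MB) | After (MB) | Mem saved |
|---------|------------|-----------|---------|-------------|------------|-----------|
| 200x100 | 0.206      | 0.050     | 4.1x    | 1.90        | 0.80       | 58%       |
| 300x150 | 0.229      | 0.075     | 3.0x    | 2.91        | 1.38       | 53%       |
| 400x200 | 0.263      | 0.114     | 2.3x    | 4.02        | 2.19       | 46%       |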

numpy and dask-bounded (map_overlap) paths are unchanged.

Test plan

  • All 44 existing test_cost_distance.py tests pass
  • Verify dask iterative results still match numpy reference on larger grids
  • Spot-check that dask+cupy fallback path passes _f_min correctly (requires GPU)
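The second item in the test plan could be checked with a helper along these lines. It is only a sketch: `assert_matches_reference` is a hypothetical name, the tolerance is an assumption, and the small arrays stand in for real `cost_distance` outputs. NaN cells are treated as equal to each other so that matching gaps in both results do not fail the comparison.

```python
import dask.array as da
import numpy as np


def assert_matches_reference(candidate, reference, rtol=1e-5):
    """Compare a (possibly dask-backed) result against a NumPy reference."""
    got = np.asarray(candidate)  # materializes a dask array into NumPy
    # equal_nan=True treats NaN cells at the same position as matching.
    np.testing.assert_allclose(got, reference, rtol=rtol, equal_nan=True)


# Stand-in data: a NumPy "reference" and the same values held in dask chunks.
reference = np.array([[0.0, 1.0, np.nan], [1.0, 1.5, 2.5]])
candidate = da.from_array(reference, chunks=(1, 3))
assert_matches_reference(candidate, reference)
```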

Batch-compute all dask tiles in a single scheduler pass and cache them
for reuse across iterations, replacing per-tile .compute() calls that
re-executed the dask graph each time. Store friction boundaries as
float64 to skip repeated dtype conversion. Assemble the final result
eagerly from cached tiles instead of through da.map_blocks. Pass
precomputed f_min from dask+cupy fallback to avoid a redundant
da.nanmin().compute().
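The float64 boundary storage and the precomputed minimum can be illustrated with a small sketch. The helper names `boundary_strips` and `friction_min` are hypothetical, and the internals of `_compute_seeds` and `_cost_distance_dask` are not reproduced here. The idea shown is to convert edge strips once when they are stored, and to let a caller pass in a minimum it has already computed.

```python
import numpy as np


def boundary_strips(tile):
    """Return the four edge strips of a tile, converted to float64 once."""
    t = np.asarray(tile)
    return {
        "top": t[0, :].astype(np.float64),
        "bottom": t[-1, :].astype(np.float64),
        "left": t[:, 0].astype(np.float64),
        "right": t[:, -1].astype(np.float64),
    }


def friction_min(friction, f_min=None):
    """Use a caller-supplied minimum if given; otherwise compute it."""
    if f_min is not None:
        return float(f_min)  # skip the extra reduction entirely
    return float(np.nanmin(friction))


# Assumed example tile stored as float32, as a friction raster might be.
tile = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
strips = boundary_strips(tile)
```

Because the strips are already float64, code that reads them on every iteration does not need to cast them again.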

Benchmarked improvement on the iterative (unbounded max_cost) path:
  200x100: 0.206s -> 0.050s (4.1x), 1.90MB -> 0.80MB (-58%)
  300x150: 0.229s -> 0.075s (3.0x), 2.91MB -> 1.38MB (-53%)
  400x200: 0.263s -> 0.114s (2.3x), 4.02MB -> 2.19MB (-46%)

numpy and dask-bounded (map_overlap) paths are unchanged.
github-actions bot added the "performance" label (PR touches performance-sensitive code) on Mar 18, 2026
brendancol merged commit 1670b01 into master on Mar 18, 2026
11 checks passed