Release v22.10.00 by marcinz · Pull Request #652 · nv-legate/cupynumeric

marcinz · 2022-10-11T20:00:46Z

No description provided.

@bryevdv

…nced indexing (#486)
* fixing logic for some advanced_indexing test cases
* Reformatting of new testcases by @bryevdv
* Add new required test packages to conda env files

Co-authored-by: Manolis Papadakis <manopapad@gmail.com>

* Refactor test driver for cpu/gpu sharding
* fix -cunumeric:test
* Add system info to top-level banner
* make some methods functions for easier testing
* add --debug to CPU jobs
* don't special case verbose mode
* add debug output to all jobs

Co-authored-by: Manolis Papadakis <manopapad@gmail.com>

)
* report times in summary lines
* fix typo
* Add an overall test suite summary
* defer test output until test completion
* remove -j 1 argument to test.sh
* try bloat factor = 1.25
* fix default fbsize and bloat factor
* specify fbmem in MB

* Unify the template for device reduction tree and do some cleanup
* Fix performance bugs in scalar reduction kernels:
  * Use unsigned 64-bit integers instead of signed integers wherever possible; CUDA hasn't added an atomic intrinsic for the latter yet.
  * Move reduction buffers from zero-copy memory to framebuffer. This makes the slow atomic update code path in reduction operators run much more efficiently.
* Use the new scalar reduction buffer in binary reductions as well
* Use only the RHS type in the reduction buffer as we never call apply
* Minor clean up per review
* Rename the buffer class and method to make the intent explicit
* Flip the polarity of reduce's template parameter

* Enhance two integration tests
  Enhance test_append and test_array_creation
  1. add negative tests
  2. add more test cases
  3. refactor test code
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Address comments
  1. Create test class for negative testing
  2. Refactor out test functions
  3. Use parameterize
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Address comments - part2
  1. update run_test name to check_array_method
  2. use parameterize for step zero cases of arange
* Address comments - Part 3
  1. add pytest.mark.xfail for cases with expected failure
  2. Small Fix: replace Assert with raising ValueError in deferred.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Address comments - fix a typo

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
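The "replace Assert with raising ValueError" item can be illustrated with a small sketch. The helper names below (`check_shape`, `raises_value_error`) are hypothetical and are not taken from deferred.py; the point is only that an explicit `raise` still runs under `python -O`, where `assert` statements are stripped, and gives negative tests a stable exception type to expect.

```python
def check_shape(shape):
    """Validate a shape tuple (hypothetical helper, for illustration only)."""
    # An `assert` here would be removed under `python -O`; an explicit raise
    # always executes and gives negative tests a predictable exception type.
    if any(extent < 0 for extent in shape):
        raise ValueError(f"negative dimensions are not allowed: {tuple(shape)}")
    return tuple(shape)


def raises_value_error(func, *args):
    """Return True if calling func(*args) raises ValueError."""
    try:
        func(*args)
    except ValueError:
        return True
    return False


# Table-driven negative cases, in the spirit of a parametrized test.
NEGATIVE_CASES = [(-1,), (2, -3), (0, -1, 4)]
results = [raises_value_error(check_shape, shape) for shape in NEGATIVE_CASES]
```

In a real test suite the table would typically be fed to a parametrized test and checked with `pytest.raises(ValueError)`.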

* checkpoint array
* Clean up cunumeric.tile
* Disallow kind=None on cn.ndarray.argsort, to match cn.argsort
* Avoid a cast
* Update a docstring to match the inferred type signature
* Fix handling of out=cn.ndarray in clip
* add new missing types
* Values of type CastingKind should never be None
  Values of this type are eventually fed to np.can_cast, which doesn't accept None.
* Use np.ndarray.tobytes over the deprecated tostring
* Minor fixes
* Don't compare dtypes with `is`, but with ==
  Doing the former can result in unexpected behavior, in the common case where one value is a proper np.dtype object, while the other is something that is not itself a np.dtype, but something convertible to it:
      >>> np.dtype(np.int64) is np.int64
      False
      >>> np.dtype(np.int64) == np.int64
      True
  Some of the uses in the modified code were actually safe, because a pre-existing array's dtype is always wrapped in a np.dtype object, but I changed them too, for the sake of consistency.
* No need to call dtype.type if using ==
* Fix dtype= handling in _diag_helper
* Copy NumPy's type signature for an unimplemented function
  It doesn't matter now, but we might have forgotten to change it to the more general signature when we got around to implementing this.

Co-authored-by: Manolis Papadakis <manopapad@gmail.com>
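The dtype comparison point can be reproduced outside the interpreter prompt. The sketch below assumes only that NumPy is installed; `same_dtype` is a hypothetical helper name, not a function from cuNumeric.

```python
import numpy as np


def same_dtype(a, b):
    """Compare two dtype-like values by equality, never by identity."""
    # np.dtype(...) normalizes anything convertible to a dtype (a scalar
    # type such as np.int64, a string such as "int64", or a dtype object).
    return np.dtype(a) == np.dtype(b)


wrapped = np.dtype(np.int64)

# Identity compares object addresses, so a dtype object is never
# "the same object" as the scalar type it was built from.
identity_result = wrapped is np.int64

# Equality coerces the right-hand side to a dtype before comparing.
equality_result = wrapped == np.int64
```

Here `identity_result` is `False` while `equality_result` is `True`, which is the behavior the commit message describes.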

* Update test runner for osx
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* lint
* fix up tests, simplify manager creation

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

This is working around some recent changes to OpenBLAS. Previously we were using the internal names for functions, e.g. "spotrf_". OpenBLAS changed the definitions of these internal functions, so in a previous PR we switched to using the public functions, e.g. "LAPACK_spotrf". These used to be function symbols, but in the latest update OpenBLAS changed these to be macros.

* Make the validation condition for random distributions lenient
* Fix typo
* Catch too small standard variations against theoretical values as well
* Replace unnecessary NumPy calls with Python primitives
* Tighten the tolerance
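One way to picture a validation condition of this kind is a two-sided check of sample statistics against their theoretical values. This is a sketch under assumptions: the function name, the normal-distribution setting, and the `num_sigmas` tolerance are all hypothetical and not taken from the cuNumeric tests. It uses only Python primitives from the standard library.

```python
import math
import random
import statistics


def check_normal_samples(samples, mu, sigma, num_sigmas=5.0):
    """Return True if sample mean and spread are close to the theory."""
    n = len(samples)
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)
    # Standard error of the sample mean is sigma / sqrt(n).
    mean_ok = abs(mean - mu) <= num_sigmas * sigma / math.sqrt(n)
    # Approximate standard error of the sample standard deviation for
    # normal data is sigma / sqrt(2 * (n - 1)). The check is two-sided,
    # so a spread that is too small fails just like one that is too large.
    std_ok = abs(std - sigma) <= num_sigmas * sigma / math.sqrt(2 * (n - 1))
    return mean_ok and std_ok


rng = random.Random(0)  # fixed seed keeps the check deterministic
normal_samples = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
constant_samples = [0.0] * 1_000  # correct mean, but no spread at all
```

A generator that returns a constant would pass a mean-only check, which is why the spread is validated as well.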

* src/cunumeric: handle high number of bins in GPU bincount
  The existing bincount implementation on GPUs attempts to allocate a workspace for all bins within the shared memory available on each SM. This commit updates the implementation to fall back to a slower kernel that reduces to global memory when there are too many bins to fit into shared memory. Fixes #503.
* src/cunumeric/stat: fix launch parameters in bincount kernels
* tests/integration: refactor test_bincount.py for better readability
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci

Co-authored-by: Rohan Yadav <rohany@cs.stanford.edu>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

…esults can diverge from numpy (#528)
* Added note to prefix documentation for corner cases where cunumeric results can diverge from numpy. Also other minor fixes to prefix documentation.
* Minor changes to documentation phrasing.

…ckage installation (#514) * add initial CMake build * fix compile error * point to nv-legate repo * use realm_defines and legion_defines from the build dir if it's defined * update version * guard against RealmRuntime and LegionRuntime targets not existing * fix version number * fully support building without CUDA and OpenMP, detect support for both from legate_core target * use compiler cache to speed up tblis builds * toggle tblis openmp via CUNUMERIC_USE_OPENMP * print messages for CI * adjust -isystem flag to support clangd * Toggle CUDA, OpenMP, and bounds checking based on the found legate.core package's config * print message when legate_core is found * fix typo * use CMAKE_SHARED_LIBRARY_SUFFIX for tblis shared library * remove dot * handle case where build_shared_libs is off * Speed up FetchContent_Populate by downloading a tarball (if possible) instead of cloning * cleanup * make required CMake version match conda-forge's CMake * Use CPM to find or build OpenBLAS * only create alias targets if OpenBLAS was added * initial commit of CMake-based install.py * make install-2.py work with legate build dirs * place libraries in build/lib * add target to preprocess cunumeric_c.h for use with Python CFFI * ignore install dir * use the preprocessed cunumeric_c.h.i generated by CMake instead of doing it in Python * remove unused vars * fix gitlab archive URI for branches with slashes in the name * update rapids-cmake version * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add scikit-build * make install.py call pip install . 
* fix lint * remove debugging lines * clean up * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix default legate branch * assume legion_core is already installed in build-install.sh example script * export LIBRARY_PATH if not set * resolve relative path in build scripts * formatting * apply Bryan's fixes for tests * don't use defaults * fix lint * use Readline so tab completion works * set CMAKE_BUILD_PARALLEL_LEVEL * build tblis on cmake --build instead of cmake configure * fix get_libpath * fix separate tblis configure/build stages to correctly link to libtblis.so * use add_custom_command so tblis isn't always rebuilt * use my legate.core branch temporarily in CI * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * default branch and url in install.py temporarily * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * set optimization level -O2 * ensure CUDA architectures are detected correctly * add searchsorted sources * fix typos * install tblis if we built it * clean out tblis lib and include dirs * use --upgrade instead of --force-install * remove todo * find exact legate_core and cunumeric package versions * do pip install --upgrade if not editable * set REQUIRED if legate_core_ROOT is defined * Update conda recipe to use CMake * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add new source files * fix bad merge * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * test legate_core_DIR/ROOT for truthy-ness * fix lint * add back in optional --legate argument to test.py * fix legate_path to be str instead of Path * fix lint * move make/cmake/ninja to build requirements * add build and runtime dependencies to dev conda envs * fix lint * remove legion_helpers.cmake * Enable using tblis_ROOT to find 
external tblis installations * add build and install export sets to rapids_cpm_find * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add separate build scripts to build with/without prebuilt legate.core * update cunumeric_cpp.cmake for new files * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix bad merge * export tblis_BINARY_DIR to PARENT_SCOPE * do not reference undefined env "_" in tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix tblis flag in install.py * fix flake8 issues on test_patch * mypy fixes * update conda-build/build.sh * add initial build directions * formatting fixes * more build information * add conda directions * exclude legate files from mypy again * update default legate core branch and repos * ensure sccache is used in conda build * allow sccache envvars from external environment * ensure SETUPTOOLS_ENABLE_FEATURES is set to "legacy-editable" * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * translate gpu name to cuda architectures * remove unnecessary cmake define * link to curand * fix install.py --with-core arg * Apply suggestions from code review * fix gitlab tgz urls * Apply suggestions from code review * don't link curand * use if(POLICY) * enable cmake policy 0135 * add extra target to update build.ninja mtime so rebuilding doesn't re-run CMake * remove easy-install.pth * ensure libcunumeric.so is found if installed into a non-standard install location * better handle --prefix flag, remove --python-only flag * infer legate_dir from an existing legate.core python install (including editable installs) when the user omits the --with-core flag * don't remove easy-install.pth * mirror flags in legate.core example build scripts * add argwhere sources * update mypy paths to ignore new location of install_info * 
ensure build dir is cleaned if the value of --build-isolation is different from last time we built * cmake cleanup * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add --max_dim --max_fields --spy --openmp --llvm --hdf --gasnet --gasnet_dir and --conduit flags in case cunumeric builds legate_core instead of finding it * define CUDAHOSTCXX envvar * define flags for debug and minsizerel build types * update package version * parse BUILD_MARCH and/or BUILD_MCPU configuration flags * add openmpi to conda envs * use correct dynamic library extension for other OS's * fix typo * add py.typed for mypy, fix typings * add wrap to sources list * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * remove unused # type: ignore comment * Update get_legate_core.cmake * Update install.py * Update install.py Co-authored-by: ptaylor <paul.e.taylor@me.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bryan Van de Ven <bryan@bokeh.org>

* Enhance test_block.py and test_eye.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Fix case for list in check/compare methods.
* Fix typos.
* Fix another typo.
* Address comments
  only use pytest.raises to handle exceptions
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Address comments

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* add negative test case
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add negative test case
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add negative test case
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* correct bugs by upgrading the code path for the array_split function
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* correct bugs by upgrading the code path for the array_split function
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* update the code path for the array_split function
* add negative test case in test_array_split.py
* add negative test case in test_array_split.py
* add testcase for test_array_split.py
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* add test case for test_array_split.py
* add test cases for test_flip and test_indices
* fix Eager execution test error
* add test case for test_flip.py and test_indices.py
* add test cases for test_fill.py and test_ndim.py
* add test cases for test_fill.py and test_ndim.py
* add test cases for test_fill.py and test_ndim.py
* add test cases for test_fill.py and test_ndim.py

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Provenance tracking for cuNumeric operators
* Use decorators for provenance tracking
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Filter out legate frames in the logic finding the last user frame

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
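A decorator-based approach to provenance tracking could look roughly like the sketch below. Everything here is an assumption for illustration: the names `track_provenance`, `find_last_user_frame`, and `LIBRARY_MARKERS` are hypothetical, the substring match on file paths is deliberately crude, and the real implementation in cuNumeric and Legate may work quite differently.

```python
import functools
import traceback

# Hypothetical path fragments identifying library-internal source files.
LIBRARY_MARKERS = ("/legate/", "/cunumeric/")


def find_last_user_frame(stack, library_markers=LIBRARY_MARKERS):
    """Return "file:line" of the innermost frame outside the library."""
    # Walk from the innermost call outward and skip library frames.
    for frame in reversed(stack):
        if not any(marker in frame.filename for marker in library_markers):
            return f"{frame.filename}:{frame.lineno}"
    return None


def track_provenance(library_markers=LIBRARY_MARKERS):
    """Decorator factory that records where user code called the operator."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # The last stack entry is this wrapper itself, so drop it.
            stack = traceback.extract_stack()[:-1]
            wrapper.last_provenance = find_last_user_frame(stack, library_markers)
            return func(*args, **kwargs)

        wrapper.last_provenance = None
        return wrapper

    return decorator
```

Decorating an operator with `@track_provenance()` would then leave the calling location in `operator.last_provenance` after each call; a real runtime would more likely attach it to the launched task instead.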

* Invoke eye with read-write privilege, not write-discard
  We cannot create tight region requirements that include just the diagonal we are writing, so necessarily there will be elements in the regions we pass to the eye call whose values must remain. Write-discard privilege, then, is not appropriate for this call, as it essentially tells the runtime that it can throw away the previous contents of the entire region.
* Add a comment explaining the eye task privilege
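The reasoning about privileges can be pictured with a toy model. This is not the Legion API: `launch`, `eye_task`, and the use of `None` to mark undefined values are hypothetical stand-ins chosen only to show why a task that writes part of a region cannot request write-discard.

```python
def eye_task(region, k=0):
    """Write ones on the k-th diagonal; every other element is untouched."""
    for i, row in enumerate(region):
        j = i + k
        if 0 <= j < len(row):
            row[j] = 1


def launch(task, region, privilege):
    """Toy launcher modelling two privileges (not the Legion API)."""
    if privilege == "write-discard":
        # The runtime may drop the old contents, so the task can be handed
        # a fresh instance with arbitrary values; None marks "undefined".
        instance = [[None] * len(row) for row in region]
    elif privilege == "read-write":
        # Existing values must be visible to the task and preserved.
        instance = [list(row) for row in region]
    else:
        raise ValueError(f"unknown privilege: {privilege}")
    task(instance)
    return instance


zeros = [[0] * 3 for _ in range(3)]
good = launch(eye_task, zeros, "read-write")
bad = launch(eye_task, zeros, "write-discard")
```

In this model `good` is the identity matrix, while the off-diagonal entries of `bad` are undefined because the zero fill was discarded.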

* Fix tests utils to make --directory work correctly.
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Use relative path to compare against skipped tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Change self.root_dir to Path type.
* remove PurePath

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
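Comparing against a skip list by relative path might look like the following sketch. The function name `is_skipped` and its arguments are hypothetical; the idea is only that a skip list keyed on paths relative to the test root keeps working when the tests are run from a different checkout or directory.

```python
from pathlib import Path


def is_skipped(test_file, root_dir, skipped):
    """Return True if test_file, taken relative to root_dir, is in skipped."""
    root = Path(root_dir)
    path = Path(test_file)
    try:
        relative = path.relative_to(root)
    except ValueError:
        # The file is not under the root at all, so it cannot match an
        # entry that is expressed relative to that root.
        return False
    # as_posix() gives a separator-independent string for the comparison.
    return relative.as_posix() in skipped


SKIPPED = {"test_slow.py", "subdir/test_flaky.py"}  # hypothetical skip list
```

For example, `is_skipped("/repo/tests/integration/subdir/test_flaky.py", "/repo/tests/integration", SKIPPED)` is true regardless of where `/repo` lives.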

This file was included unnecessarily, and led to build issues on distributed machines. In particular, including coll.h pulls in mpi.h, which is an unresolved header to NVCC.

Signed-off-by: Rohan Yadav <rohany@alumni.cmu.edu>

ipdemes and others added 30 commits August 3, 2022 19:48

Add support for curand conda package build (#510)

ab5d0d2

Conda packages now build with support for curand both in the CPU and the GPU builds. Co-authored-by: Marcin Zalewski <mzalewski@nvidia.com>

[pre-commit.ci] pre-commit autoupdate (#492)

dc5d26e

Pre-commit update and the necessary fixes. Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Marcin Zalewski <mzalewski@nvidia.com>

Update version number in setup.py

007c422

[pre-commit.ci] pre-commit autoupdate (#517)

3e3209f

updates:
- [github.com/PyCQA/flake8: 5.0.2 → 5.0.4](PyCQA/flake8@5.0.2...5.0.4)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

Don't use internal LAPACK function names (#522)

5d16c35

Update test runner for osx

b02c91f

Revert "Update test runner for osx"

3644d13

This reverts commit b02c91f.

Bug fixes for advanced indexing: (#532)

04430a8

* Fix for an off-by-one bug
* Shared memory size had not been passed to the kernel launch

Ensure test.py --use flag fully overrides USE_* envvars (#524)

b4fbde3

* Ensure test.py --use flag fully overrides USE_* envvars
* Update a test-tools unit test

Co-authored-by: Manolis Papadakis <mpapadakis@nvidia.com>
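The override rule can be sketched with argparse. This is a minimal illustration under assumptions: the feature names in `FEATURES`, the `USE_<NAME>=1` convention, and the function `resolve_features` are hypothetical and do not reproduce the actual test.py code.

```python
import argparse

FEATURES = ("cpus", "cuda", "openmp")  # hypothetical feature names


def resolve_features(argv, environ):
    """Pick the enabled features from --use, falling back to USE_* vars."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--use", default=None, help="comma-separated feature list")
    args = parser.parse_args(argv)

    if args.use is not None:
        # An explicit --use wins outright: USE_* variables are ignored
        # rather than merged into the requested set.
        requested = {name.strip().lower() for name in args.use.split(",") if name.strip()}
        unknown = requested - set(FEATURES)
        if unknown:
            raise ValueError(f"unknown features: {sorted(unknown)}")
        return [name for name in FEATURES if name in requested]

    # No flag given: fall back to USE_<FEATURE>=1 environment variables.
    return [name for name in FEATURES if environ.get(f"USE_{name.upper()}") == "1"]
```

Passing the environment as an argument, instead of reading `os.environ` directly, keeps the function easy to unit test.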

Don't blindly trust user-supplied bincount.minlength (#523)

9cdb59c

* Don't blindly trust user-supplied bincount.minlength
* Change parameter name to match docstring
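The commit message does not say what went wrong, so the following is one plausible reading rather than a description of the actual fix: if the output were sized from `minlength` alone, any input value at or beyond `minlength` would index out of bounds. A pure-Python sketch of the NumPy-compatible rule, where the output length is the larger of `minlength` and the largest value plus one:

```python
def bincount(values, minlength=0):
    """Count occurrences of each non-negative integer in values."""
    if minlength < 0:
        raise ValueError("minlength must be non-negative")
    if any(value < 0 for value in values):
        raise ValueError("values must be non-negative integers")
    # minlength is only a lower bound: the output must still cover every
    # value that actually occurs, even if the caller asked for fewer bins.
    num_bins = max(minlength, max(values, default=-1) + 1)
    counts = [0] * num_bins
    for value in values:
        counts[value] += 1
    return counts
```

For instance, `bincount([0, 1, 1, 5], minlength=2)` still returns six bins, because the value 5 needs one.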

Make reduced-precision cuBLAS mode opt-in (#519)

61c974b

Fix reciprocal tests for zero values and improve test value customization (#467) (#537)

32831c1

* fix reciprocal tests and add unary test customization
* make tests deterministic and enforce root inputs are non-negative

Co-authored-by: Jeremy <jjwilke@users.noreply.github.com>

Minor python typing fix

fb09ab7

fix mypy issue w/ np methods (#542)

f365820

Refactor test runner to support more pinning options (#535)

86b7a62

* Refactor test runner to support more pinning options
* add --gpu-delay option

Remove dead code (#546)

4b9b857

Fix buggy complex-to-bool conversions and add correctness tests for `astype` (#549)

aeab190

* Fix buggy complex-to-bool conversions and add correctness tests for np.astype
* Typo
* Fix the bug in the eager implementation as well
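The commit does not describe the bug itself, so the sketch below only illustrates the conversion rule and one hypothetical way to get it wrong: a complex number is truthy when either its real or its imaginary part is nonzero, and a conversion that looks at the real part alone misclassifies purely imaginary values. Both function names are made up for the example.

```python
def complex_to_bool(z):
    """Correct rule: nonzero if either component is nonzero."""
    return z.real != 0 or z.imag != 0


def buggy_complex_to_bool(z):
    """Hypothetical bug: the imaginary part is dropped before the test."""
    return z.real != 0


SAMPLES = [0j, 1 + 0j, 1j, -2.5j, 3 - 4j]

# Python's built-in bool() follows the correct rule, so it serves as the
# reference when checking the conversion.
matches_reference = all(complex_to_bool(z) == bool(z) for z in SAMPLES)
mismatches = [z for z in SAMPLES if buggy_complex_to_bool(z) != bool(z)]
```

The purely imaginary samples are exactly the ones the hypothetical buggy version gets wrong, which is the kind of case a correctness test for `astype` would want to include.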

fixing advanced indexing operation for empty arrays (#504)

ede9793

fixing advanced indexing operation for empty arrays

Construct NumPy arrays correctly from 0D deferred arrays backed by region fields (#551)

a12f070

* Handle inline allocations from 0D stores correctly (spoiler: they are not 0D)
* Add a test case for 0D region-backed stores

jjwilke and others added 24 commits September 21, 2022 09:43

Allow casting in cn.dot, to match numpy's behavior (#598)

8cb82fa

Add the linalg.solve implementation to the cmake build (#603)

c238c0f

Remove run dependency on curand (#520)

91cd161

Remove unneeded dependency on curand in conda build. Co-authored-by: Marcin Zalewski <mzalewski@nvidia.com>

Delay label checking on fresh PRs (#607)

e262c27

Label checking delay is set to 5 minutes. Co-authored-by: Marcin Zalewski <mzalewski@nvidia.com>

Use Legion Fills when possible (#604)

b04ca61

Co-authored-by: Manolis Papadakis <mpapadakis@nvidia.com>

Support building with GASNet-Ex and MPI backends (#610)

68963d3

updating documentation (#614)

ce953ff

Fix a bug in scalar reduction launching kernels with empty domains (#606)

b0c3dfd

Fix a compiler warning (#594)

cff746c

Enhance test_diag_indices.py and test_flatten.py. (#609)

ceed11a

* Enhance test_diag_indices.py and test_flatten.py.
* Address comments.
* Skip msg match.

cuNumeric doesn't need nested provenance tracking (#617)

af636e0

Add RuntimeError exception to legate.time (#618)

f200b7f

Stop instantiating min and max reduction ops for complex types (#621)

0db4b50

Mark temporary conversion outputs as linear for eager storage recycling (#608)

86658c4

Make the negative test on fill robust across Python versions (#619)

dcef5d1

Enhance mask_indices and move_axis (#622)

26ef5b9

* Enhance mask_indices and move_axis
* Address comments.

add missing docs symlink (#635)

187c0c0

marcinz added the category:task label (PR is a simple task and will not be included in release notes) Oct 11, 2022

marcinz and others added 2 commits October 11, 2022 15:05

Resolve conflicts with main

2236324

Resolve conflicts with main #653

b22329d

Resolve conflicts

marcinz merged commit 81ad156 into main Oct 12, 2022

manopapad pushed a commit that referenced this pull request Mar 17, 2025

Fix build has tests (#652)

0cb281a

mag1cp1n pushed a commit to mag1cp1n/cupynumeric that referenced this pull request Apr 11, 2025

Fix build has tests (nv-legate#652) (nv-legate#653)

4298c00

marcinz merged 82 commits into main from branch-22.10