Add types to thunk modules by bryevdv · Pull Request #438 · nv-legate/cupynumeric

bryevdv · 2022-06-30T20:53:13Z

This PR adds mypy typings to the "thunk" modules: thunk.py, eager.py, and deferred.py

I will preface by saying this PR is not perfect, and has considerably more # type: ignore, Any types, and cast calls than I would normally like or accept. However, some fundamental problems caused the number of type issues to balloon (probably over 500 total just in these few files) and eventually I needed to just plough through to get a baseline of some improvement that would make a basis for further work. (This is probably the third time I have attempted to start typing these files...)

The overriding complication in typing these modules is that, ostensibly, clients of NumPyThunk subclasses should not have to know or care what subclass they are dealing with. But the subclasses EagerArray and DeferredArray are very much not interchangeable. They have different attributes and methods, and different codepaths make assumptions regarding these differences.

Some of this may be improvable by renaming some properties e.g. Eager.array and Deferred.base to match on both classes. This turns out to be an involved change, and I did not want to add it to this already large and complicated PR. However I think more refactoring may be needed in order to reduce the number of Any types (that was the primary solution to this issue, in lots of places).
The string "dtype" for point types is very thorny. I chose to add a few ugly casts in a handful of places, rather than make the ~50 other updates that changing to Union[str, np.dtype[Any]] would have necessitated. (It's my understanding that a real proper dtype is planned for points 🤞)
Lots of places pass (dtype,) to task.add_scalar_arg. If this is correct, then changes are needed on the legate side. For now I have added (several) type: ignore comments.

Other comments inline. All in all I would suggest trying to merge this with minimal changes, and then making more incremental improvements in smaller PRs to follow.

cunumeric/random/random.py

bryevdv · 2022-06-30T20:56:03Z

cunumeric/sort.py

+        "DeferredArray",
+        output.runtime.create_empty_thunk(
+            flattened.shape, dtype=output.dtype, inputs=(flattened,)
+        ),


I think to improve this we would have to split create_empty_thunk somehow, to have more specific return types.
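A minimal sketch of what such a split could look like. The class names mirror the ones in this PR, but `create_empty_deferred_thunk`, the `deferred` flag, and the constructor arguments are hypothetical stand-ins, not cuNumeric's actual runtime API. The point is only that a factory with a precise return type removes the need for a cast at the call site.

```python
from typing import Tuple, cast


class NumPyThunk:
    def __init__(self, shape: Tuple[int, ...]) -> None:
        self.shape = shape


class EagerArray(NumPyThunk):
    pass


class DeferredArray(NumPyThunk):
    @property
    def base(self) -> str:
        # Stand-in for the underlying store; only DeferredArray has it.
        return f"store{self.shape}"


def create_empty_thunk(shape: Tuple[int, ...], deferred: bool) -> NumPyThunk:
    # Broad return type: callers that need .base must cast or ignore.
    return DeferredArray(shape) if deferred else EagerArray(shape)


def create_empty_deferred_thunk(shape: Tuple[int, ...]) -> DeferredArray:
    # Hypothetical narrower factory: the return type is already precise.
    return DeferredArray(shape)


# With the broad factory, a cast is needed before touching .base.
broad = cast(DeferredArray, create_empty_thunk((4,), deferred=True))

# With the narrow factory, mypy accepts .base directly.
narrow = create_empty_deferred_thunk((4,))
```

Whether a split like this fits the real runtime depends on how the eager/deferred decision is made there, which this sketch does not model.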

cunumeric/thunk.py

cunumeric/eager.py

bryevdv · 2022-06-30T21:02:29Z

cunumeric/eager.py

            if argpartition:
-                self.array = np.argpartition(rhs.array, kth, axis, kind, order)
+                self.array = np.argpartition(
+                    rhs.array, kth, axis, kind, order  # type:  ignore


Just need a KindType, I think, but this PR is large enough as it is.

bryevdv · 2022-06-30T21:04:01Z

cunumeric/eager.py

        if self.deferred is None:
            self.to_deferred_array()
-        return self.deferred.storage
+        return self.deferred.storage  # type: ignore


mypy does not know self.deferred is no longer None after the call with side effects. If self.to_deferred_array() could instead return a value that is assigned, that would make the type flow clearer.
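The suggestion above could be sketched roughly as follows. This is a simplified stand-in, not the real classes: the `storage` string and the constructor arguments are placeholders chosen for illustration.

```python
from typing import Optional


class DeferredArray:
    def __init__(self, storage: str) -> None:
        self.storage = storage


class EagerArray:
    def __init__(self) -> None:
        self.deferred: Optional[DeferredArray] = None

    def to_deferred_array(self) -> DeferredArray:
        # Create the deferred array on first use, then return it with a
        # non-Optional type so callers do not depend on the side effect.
        if self.deferred is None:
            self.deferred = DeferredArray(storage="region-0")
        return self.deferred

    @property
    def storage(self) -> str:
        # The local variable is typed DeferredArray, so no
        # `# type: ignore` is needed on the attribute access.
        deferred = self.to_deferred_array()
        return deferred.storage
```

The attribute is still cached on the instance, so behavior is unchanged; only the type flow becomes visible to the checker.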

cunumeric/deferred.py

bryevdv · 2022-06-30T21:05:39Z

cunumeric/deferred.py

                copy.add_input(store)
                copy.add_source_indirect(index_array.base)
-                copy.add_output(result.base)
+                copy.add_output(result.base)  # type: ignore


base does not exist on NumPyThunk; errors like this are pervasive. Splitting create_empty_thunk might help resolve them, or possibly the attributes can be brought into alignment between the concrete thunk classes.

cunumeric/deferred.py

bryevdv · 2022-07-08T17:56:03Z

merged (b202261) to include updates for #414 (cc @mfoerste4)

bryevdv · 2022-07-19T16:59:31Z

Merged again (eadfa98) to fix new conflicts; also includes updates for the bits ops work.

manopapad · 2022-07-20T17:52:02Z

It's my understanding that a real proper dtype is planned for points

@ipdemes @magnatelee is there a fundamental issue with using a structured data type to represent Point<N> arrays? I assume this is just to have a more meaningful identifier for this type (and for mypy's benefit), instead of using a string identifier (i.e. we're not talking about adding proper object-type array support to cunumeric).

I had previously toyed with using sub-array dtypes to represent Point<N>:

>>> import numpy as np
>>> t = np.dtype((np.int64, (4,)))
>>> t
dtype(('<i8', (4,)))

Note that you can't actually have an array of sub-array dtype, NumPy will combine them into a single shape:

>>> x = np.ones((3,), dtype=t)
>>> x.shape
(3, 4)
>>> x.dtype
dtype('int64')
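For contrast, a short sketch of the structured-dtype alternative asked about above: wrapping the sub-array in a single field keeps one element per point, so the array shape is not merged. The field name `coords` is an arbitrary choice for illustration, not something proposed in this thread.

```python
import numpy as np

# One field holding 4 int64 values per element.
point4 = np.dtype([("coords", np.int64, (4,))])

pts = np.zeros((3,), dtype=point4)

print(pts.shape)            # (3,)
print(pts.dtype.itemsize)   # 32
print(pts["coords"].shape)  # (3, 4)
```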

manopapad · 2022-07-20T17:58:43Z

lots of places pass (dtype,) to task.add_scalar_arg If this is correct, then changes are needed on the legate side

This is correct; add_scalar_arg can accept singleton tuples of dtype, e.g. (np.float32,), to mean "add a scalar argument that is a vector of float32s". The vector actually passed on any invocation of the task can be of any length; it doesn't have to be a singleton.
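To illustrate the convention described above, here is a hedged sketch of a type alias broad enough to cover both forms. `DTypeLike`, `ScalarArgType`, and `describe_scalar_arg` are hypothetical names made up for this example; they are not legate's actual definitions.

```python
from typing import Any, Tuple, Union

import numpy as np

# Hypothetical aliases: a dtype, or a singleton tuple of a dtype.
DTypeLike = Union[type, np.dtype]
ScalarArgType = Union[DTypeLike, Tuple[DTypeLike]]


def describe_scalar_arg(value: Any, dtype: ScalarArgType) -> str:
    # A singleton tuple marks "vector of this element type";
    # the vector itself may have any length.
    if isinstance(dtype, tuple):
        (elem,) = dtype
        return f"vector[{len(value)}] of {np.dtype(elem).name}"
    return f"scalar of {np.dtype(dtype).name}"


print(describe_scalar_arg((1.0, 2.0, 3.0), (np.float32,)))
print(describe_scalar_arg(7, np.int64))
```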

@bryevdv Would you like to fix this in this PR or a separate one? Also, if this "singleton tuple" business is causing trouble for mypy, we could consider changing the interface.

bryevdv · 2022-07-20T18:14:36Z

@manopapad It's only causing trouble for mypy in the sense that the legate type was initially defined too narrowly. It should be straightforward to update. I'll push a PR to legate today and then update this PR accordingly.

Edit: submitted nv-legate/legate#306; I have a small commit ready to push to this PR as soon as it is merged.

Edit2: this is done (dde3fab)

cunumeric/thunk.py

ipdemes · 2022-07-20T20:02:24Z

@ipdemes @magnatelee is there a fundamental issue with using a structured data type to represent Point arrays?

@manopapad : we have a related issue for this: #385. I was planning to work on it soon.

manopapad · 2022-07-21T06:31:59Z

cunumeric/eager.py

+        runtime: Runtime,
+        array: Any,
+        parent: Optional[Any] = None,
+        key: Optional[Any] = None,


This could be further refined as "None or a tuple with a string at position 0 and anything else at positions 1+", but I don't think mypy supports such a type. We could change the code to take the same information in a nested tuple, i.e. something of type tuple[str, tuple[Any, ...]], which is within the capabilities of mypy.

Sure but I'd prefer to keep this first PR as documentation of what currently exists (as best as possible) and leave actual code/implementation changes to dedicated PRs

Yes, I'm leaving this comment as documentation for future PRs
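For future reference, the nested-tuple form mentioned above might be typed like this. The alias name `Key` and the helper `describe_key` are hypothetical, and the sample key contents are invented for illustration.

```python
from typing import Any, Optional, Tuple

# Hypothetical alias: a name at position 0, everything else nested.
Key = Tuple[str, Tuple[Any, ...]]


def describe_key(key: Optional[Key]) -> str:
    if key is None:
        return "no key"
    name, args = key
    return f"{name} with {len(args)} argument(s)"


print(describe_key(None))
print(describe_key(("slice", (0, 2))))
```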

manopapad · 2022-07-21T06:45:15Z

cunumeric/deferred.py

    # Copy source array to the destination array
    @auto_convert([1])
-    def copy(self, rhs, deep=False):
+    def copy(self, rhs: Any, deep: bool = False) -> None:


The body of DeferredArray methods is written assuming that all NumPyThunk-type arguments are actually DeferredArrays. The @auto_convert decorator converts incoming EagerArray arguments to DeferredArray, thus allowing the method to handle any NumPyThunk. It would, therefore, be most accurate to write the function signature as:

copy(self, rhs: DeferredArray, deep: bool = False) -> None

meaning that mypy can analyze the method knowing that rhs will always be of type DeferredArray, as the code assumes.

However, we also need to somehow inform mypy that the presence of @auto_convert([1]) changes the type signature visible to the outside world to:

copy(self, rhs: NumPyThunk, deep: bool = False) -> None

If we just change the type signature to use DeferredArray, then mypy will not reason about what the decorator is doing, and will end up complaining that the method's signature doesn't match that on the base class (which claims to accept any NumPyThunk object).

Currently auto_convert specifies this:

R = TypeVar("R")
P = ParamSpec("P")

def auto_convert(
    indices: Collection[int], keys: Sequence[str] = []
) -> Callable[[Callable[P, R]], Callable[P, R]]:

ParamSpec is actually quite new; I will have to see if there is any way to take apart and re-assemble just certain pieces of P in the callable return. I'm pretty sure this kind of type surgery would be possible in TypeScript, but I am a bit skeptical that auto_convert can be improved much, unless it can also be simplified some.

Other approaches that would certainly work but are a bit more verbose:

change auto_convert to be a helper function rather than a decorator and use it inside the method explicitly

optionally split methods into _copy (which takes DeferredArray) and copy (which takes NumPyThunk); copy only calls the helper and then delegates to _copy.

This would be a little tedious but I think ultimately a little clearer, and there are only ~20 uses of auto_convert.
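The helper-plus-split approach listed above could look roughly like this. It is a sketch under simplifying assumptions: `to_deferred` is a hypothetical helper name, and the classes carry a plain list instead of real stores.

```python
from typing import List


class NumPyThunk:
    def copy(self, rhs: "NumPyThunk", deep: bool = False) -> None:
        raise NotImplementedError


class DeferredArray(NumPyThunk):
    def __init__(self, data: List[int]) -> None:
        self.data = data

    def copy(self, rhs: NumPyThunk, deep: bool = False) -> None:
        # Public method keeps the base-class signature and only converts.
        self._copy(to_deferred(rhs), deep)

    def _copy(self, rhs: "DeferredArray", deep: bool) -> None:
        # Implementation can assume a DeferredArray, as the real code does.
        self.data = list(rhs.data) if deep else rhs.data


class EagerArray(NumPyThunk):
    def __init__(self, data: List[int]) -> None:
        self.data = data

    def to_deferred_array(self) -> DeferredArray:
        return DeferredArray(list(self.data))


def to_deferred(thunk: NumPyThunk) -> DeferredArray:
    # Hypothetical helper standing in for the decorator's conversion step.
    if isinstance(thunk, DeferredArray):
        return thunk
    if isinstance(thunk, EagerArray):
        return thunk.to_deferred_array()
    raise TypeError(f"cannot convert {type(thunk).__name__}")


dst = DeferredArray([0])
dst.copy(EagerArray([4, 5]))
print(dst.data)
```

Because no decorator rewrites the signature, mypy can check both the override against the base class and the body against DeferredArray without special support.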

manopapad · 2022-07-21T16:39:19Z

cunumeric/deferred.py

+    def __init__(
+        self,
+        runtime: Runtime,
+        base: Any,


This is definitely of type Store, but setting that causes a bunch of warnings later.
(Leaving this here as a note for future work)

bryevdv · 2022-07-21T16:47:43Z

Fixed merge conflict with #465 in ee4f350. cc @manopapad, please verify the resolution is correct.

manopapad

LGTM. Left some discussions unresolved, as pointers for future work

* Add types to thunk modules
* updates for bits ops
* fix scalar arg and storage types
* remove array class attr on thunk
* remove superflous dot method on EagerArray
* remove mutable default

Add types to thunk modules

5b01706

bryevdv commented Jun 30, 2022

Merge branch 'branch-22.07' into bryanv/mypy_thunks

b202261

bryevdv added 3 commits July 13, 2022 17:03

Merge branch 'branch-22.07' into bryanv/mypy_thunks

6f779ce

Merge branch 'branch-22.07' into bryanv/mypy_thunks

eadfa98

updates for bits ops

86777d5

manopapad self-requested a review July 20, 2022 17:19

bryevdv mentioned this pull request Jul 20, 2022

Broaden add_scalar_arg type nv-legate/legate#306

Merged

manopapad reviewed Jul 20, 2022

cunumeric/thunk.py

manopapad reviewed Jul 20, 2022

cunumeric/thunk.py

bryevdv added 2 commits July 20, 2022 13:10

fix scalar arg and storage types

dde3fab

remove array class attr on thunk

51424b0

manopapad reviewed Jul 21, 2022

bryevdv added 3 commits July 21, 2022 09:42

remove superflous dot method on EagerArray

0d92336

Merge branch 'branch-22.07' into bryanv/mypy_thunks

e7bb185

Merge branch 'branch-22.07' into bryanv/mypy_thunks

ee4f350

remove mutable default

1ae8fc9

manopapad approved these changes Jul 21, 2022

bryevdv merged commit d2a1126 into nv-legate:branch-22.07 Jul 21, 2022

bryevdv deleted the bryanv/mypy_thunks branch July 21, 2022 18:41
