Skip to content

Add types to thunk modules#438

Merged
bryevdv merged 11 commits intonv-legate:branch-22.07from
bryevdv:bryanv/mypy_thunks
Jul 21, 2022
Merged

Add types to thunk modules#438
bryevdv merged 11 commits intonv-legate:branch-22.07from
bryevdv:bryanv/mypy_thunks

Conversation

@bryevdv
Copy link
Contributor

@bryevdv bryevdv commented Jun 30, 2022

This PR adds mypy typings to the "thunk" modules: thunk.py, eager.py, and deferred.py

I will preface by saying this PR is not perfect, and has considerably more # type: ignore, Any types, and cast calls than I would normally like or accept. However, some fundamental problems caused the number of type issues to balloon (probably over 500 total just in these few files) and eventually I needed to just plough through to get a baseline of some improvement that would make a basis for further work. (This is probably the third time I have attempted to start typing these files...)

  • The overriding complication in typing these modules is that, ostensibly, clients of NumpyThunk subclasses should not have to know or care what subclass they are dealing with. But the subclasses EagerArray and DeferredArray and very much not-interchangeable. They have different attributes and methods and different codepaths make assumptions regarding these differences.

    Some of this may be improvable by renaming some properties e.g. Eager.array and Deferred.base to match on both classes. This turns out to be an involved change, and I did not want to add it to this already large and complicated PR. However I think more refactoring may be needed in order to reduce the number of Any types (that was the primary solution to this issue, in lots of places).

  • The string "dtype" for point types is very thorny. I chose to add a few ugly casts in a handful of places, rather than make the ~50 other updates that changing to Union[str, np.dtype[Any]] would have necessitated. (It's my understanding that a real proper dtype is planned for points 🤞)

  • lots of places pass (dtype,) to task.add_scalar_arg If this is correct, then changes are needed on the legate side. For now I have added (several) type: ignore comments.

Other comments inline. All in all I would suggest trying to merge this with minimal changes, and then making more incremental improvements in smaller PRs to follow.

"DeferredArray",
output.runtime.create_empty_thunk(
flattened.shape, dtype=output.dtype, inputs=(flattened,)
),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think to improve this we would have to split create_empty_thunk somehow, to have more specific return types.

if argpartition:
self.array = np.argpartition(rhs.array, kth, axis, kind, order)
self.array = np.argpartition(
rhs.array, kth, axis, kind, order # type: ignore
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just need a KindType I think but this PR is large enough as it is

if self.deferred is None:
self.to_deferred_array()
return self.deferred.storage
return self.deferred.storage # type: ignore
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mypy does not know self.deferred is no longer None after the call with side effects. If self.to_deferred_array() could instead return a value that is assigned, that would make the type flow clearer

copy.add_input(store)
copy.add_source_indirect(index_array.base)
copy.add_output(result.base)
copy.add_output(result.base) # type: ignore
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

base does not exist on NumpyThunk, these are pervasive. Splitting create_empty_thunk might help resolve, or possibly the attributes can be brought into alignment between concrete thunk classes.

@bryevdv
Copy link
Contributor Author

bryevdv commented Jul 8, 2022

merged (b202261) to include updates for #414 (cc @mfoerste4)

@bryevdv
Copy link
Contributor Author

bryevdv commented Jul 19, 2022

Merge again (eadfa98) to fix new conlicts, also updates for bits ops work.

@manopapad manopapad self-requested a review July 20, 2022 17:19
@manopapad
Copy link
Contributor

It's my understanding that a real proper dtype is planned for points

@ipdemes @magnatelee is there a fundamental issue with using a structured data type to represent Point<N> arrays? I assume this is just to have a more meaningful identifier for this type (and for mypy's benefit), instead of using a string identifier (i.e. we're not talking about adding proper object-type array support to cunumeric).

I had previously toyed with using sub-array dtypes to represent Point<N>:

>>> import numpy as np
>>> t = np.dtype((np.int64, (4,)))
>>> t
dtype(('<i8', (4,)))

Note that you can't actually have an array of sub-array dtype, NumPy will combine them into a single shape:

>>> x = np.ones((3,), dtype=t)
>>> x.shape
(3, 4)
>>> x.dtype
dtype('int64')

@manopapad
Copy link
Contributor

lots of places pass (dtype,) to task.add_scalar_arg If this is correct, then changes are needed on the legate side

This is correct; add_scalar_arg can accept singleton tuples of dtype, e.g. (np.float32,), to mean "add a scalar argument that is a vector of float32s". The size of the vector actually passed on any invocation of the task can be any length, it doesn't have to be a singleton.

@bryevdv Would you like to fix this on this or a separate PR? Also, if this "singleton tuple" business is causing trouble for mypy, we could consider changing the interface.

@bryevdv
Copy link
Contributor Author

bryevdv commented Jul 20, 2022

@manopapad It's only causing trouble for mypy in the sense that the legate type was initially defined too narrowly. It should be straightforward to update. I'll push a PR to legate today and then update this PR accordingly.

Edit: submitted nv-legate/legate#306 have small commit ready to push to this PR as soon as it is merged.

Edit2: this is done dde3fab

@ipdemes
Copy link
Contributor

ipdemes commented Jul 20, 2022

@ipdemes @magnatelee is there a fundamental issue with using a structured data type to represent Point arrays?

@manopapad : we have a related issue for this: #385. I was planning to work on it soon.

runtime: Runtime,
array: Any,
parent: Optional[Any] = None,
key: Optional[Any] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be further refined as "None or a tuple with a string at position 0 and anything else on positions 1+", but I don't think mypy supports such a type. We could change the code to take the same information in a nested tuple, i.e. something of type tuple[str,tuple[Any, ...]], which is within the capabilities of mypy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure but I'd prefer to keep this first PR as documentation of what currently exists (as best as possible) and leave actual code/implementation changes to dedicated PRs

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm leaving this comment as documentation for future PRs

# Copy source array to the destination array
@auto_convert([1])
def copy(self, rhs, deep=False):
def copy(self, rhs: Any, deep: bool = False) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The body of DeferredArray methods is written assuming that all NumPyThunk-type arguments are actually DeferredArrays. The @auto_convert decorator converts incoming EagerArray arguments to DeferredArray, thus allowing the method to handle any NumPyThunk. It would, therefore, be most accurate to write the function signature as:

copy(self, rhs: DeferredArray, deep: bool = False) -> None

meaning that mypy can analyze the method knowing that rhs will always be of type DeferredArray, as the code assumes.

However, we also need to somehow inform mypy that the presence of @auto_convert([1]) changes the type signature visible to the outside world to:

copy(self, rhs: NumPyThunk, deep: bool = False) -> None

If we just change the type signature to use DeferredArray, then mypy will not reason about what the decorator is doing, and will end up complaining that the method's signature doesn't match that on the base class (which claims to accept any NumPyThunk object).

Copy link
Contributor Author

@bryevdv bryevdv Jul 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currenty auto_convert specifies this

R = TypeVar("R")
P = ParamSpec("P")

def auto_convert(
    indices: Collection[int], keys: Sequence[str] = []
) -> Callable[[Callable[P, R]], Callable[P, R]]:

ParamSpec is actually quite new, I will have to see there is any way to take apart and re-assemble just certain pieces of P in the callable return. I'm pretty sure this kind of type surgery would be possible in TypeScript, but I am a bit skeptical that auto_convert can be improved much, unless it can also be simplified some.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other approaches that would certainly work but are a bit more verbose:

  • change auto_convert to be a helper function rather than a decorator and use it inside the method explicitly
  • optionally split methods into _copy (takes Deferred) and copy (that takes NumpyThunk) that only calls the helper and then delegates to _copy.

This would be a little tedious but I think ultimately a little clearer,and there are only ~20 uses of auto_convert.

def __init__(
self,
runtime: Runtime,
base: Any,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely of type Store, but setting that causes a bunch of warnings later.
(Leaving this here as a note for future work)

@bryevdv
Copy link
Contributor Author

bryevdv commented Jul 21, 2022

Fixed merge conflict with #465 in ee4f350 cc @manopapad please verify correct resolution

Copy link
Contributor

@manopapad manopapad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Left some discussions unresolved, as pointers for future work

@bryevdv bryevdv merged commit d2a1126 into nv-legate:branch-22.07 Jul 21, 2022
@bryevdv bryevdv deleted the bryanv/mypy_thunks branch July 21, 2022 18:41
jjwilke pushed a commit to jjwilke/cunumeric that referenced this pull request Jul 29, 2022
* Add types to thunk modules

* updates for bits ops

* fix scalar arg and storage types

* remove array class attr on thunk

* remove superflous dot method on EagerArray

* remove mutable default
jjwilke pushed a commit to jjwilke/cunumeric that referenced this pull request Jul 29, 2022
* Add types to thunk modules

* updates for bits ops

* fix scalar arg and storage types

* remove array class attr on thunk

* remove superflous dot method on EagerArray

* remove mutable default
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants