Add types to thunk modules#438
Add types to thunk modules#438bryevdv merged 11 commits intonv-legate:branch-22.07from bryevdv:bryanv/mypy_thunks
Conversation
| "DeferredArray", | ||
| output.runtime.create_empty_thunk( | ||
| flattened.shape, dtype=output.dtype, inputs=(flattened,) | ||
| ), |
There was a problem hiding this comment.
I think to improve this we would have to split create_empty_thunk somehow, to have more specific return types.
| if argpartition: | ||
| self.array = np.argpartition(rhs.array, kth, axis, kind, order) | ||
| self.array = np.argpartition( | ||
| rhs.array, kth, axis, kind, order # type: ignore |
There was a problem hiding this comment.
Just need a KindType I think but this PR is large enough as it is
| if self.deferred is None: | ||
| self.to_deferred_array() | ||
| return self.deferred.storage | ||
| return self.deferred.storage # type: ignore |
There was a problem hiding this comment.
mypy does not know self.deferred is no longer None after the call with side effects. If self.to_deferred_array() could instead return a value that is assigned, that would make the type flow clearer
| copy.add_input(store) | ||
| copy.add_source_indirect(index_array.base) | ||
| copy.add_output(result.base) | ||
| copy.add_output(result.base) # type: ignore |
There was a problem hiding this comment.
base does not exist on NumpyThunk, these are pervasive. Splitting create_empty_thunk might help resolve, or possibly the attributes can be brought into alignment between concrete thunk classes.
|
merged (b202261) to include updates for #414 (cc @mfoerste4) |
|
Merge again (eadfa98) to fix new conlicts, also updates for bits ops work. |
@ipdemes @magnatelee is there a fundamental issue with using a structured data type to represent I had previously toyed with using sub-array dtypes to represent Note that you can't actually have an array of sub-array dtype, NumPy will combine them into a single shape: |
This is correct; @bryevdv Would you like to fix this on this or a separate PR? Also, if this "singleton tuple" business is causing trouble for mypy, we could consider changing the interface. |
|
@manopapad It's only causing trouble for mypy in the sense that the legate type was initially defined too narrowly. It should be straightforward to update. I'll push a PR to legate today and then update this PR accordingly. Edit: submitted nv-legate/legate#306 have small commit ready to push to this PR as soon as it is merged. Edit2: this is done dde3fab |
@manopapad : we have a related issue for this: #385. I was planning to work on it soon. |
| runtime: Runtime, | ||
| array: Any, | ||
| parent: Optional[Any] = None, | ||
| key: Optional[Any] = None, |
There was a problem hiding this comment.
This could be further refined as "None or a tuple with a string at position 0 and anything else on positions 1+", but I don't think mypy supports such a type. We could change the code to take the same information in a nested tuple, i.e. something of type tuple[str,tuple[Any, ...]], which is within the capabilities of mypy.
There was a problem hiding this comment.
Sure but I'd prefer to keep this first PR as documentation of what currently exists (as best as possible) and leave actual code/implementation changes to dedicated PRs
There was a problem hiding this comment.
Yes, I'm leaving this comment as documentation for future PRs
| # Copy source array to the destination array | ||
| @auto_convert([1]) | ||
| def copy(self, rhs, deep=False): | ||
| def copy(self, rhs: Any, deep: bool = False) -> None: |
There was a problem hiding this comment.
The body of DeferredArray methods is written assuming that all NumPyThunk-type arguments are actually DeferredArrays. The @auto_convert decorator converts incoming EagerArray arguments to DeferredArray, thus allowing the method to handle any NumPyThunk. It would, therefore, be most accurate to write the function signature as:
copy(self, rhs: DeferredArray, deep: bool = False) -> None
meaning that mypy can analyze the method knowing that rhs will always be of type DeferredArray, as the code assumes.
However, we also need to somehow inform mypy that the presence of @auto_convert([1]) changes the type signature visible to the outside world to:
copy(self, rhs: NumPyThunk, deep: bool = False) -> None
If we just change the type signature to use DeferredArray, then mypy will not reason about what the decorator is doing, and will end up complaining that the method's signature doesn't match that on the base class (which claims to accept any NumPyThunk object).
There was a problem hiding this comment.
Currenty auto_convert specifies this
R = TypeVar("R")
P = ParamSpec("P")
def auto_convert(
indices: Collection[int], keys: Sequence[str] = []
) -> Callable[[Callable[P, R]], Callable[P, R]]:
ParamSpec is actually quite new, I will have to see there is any way to take apart and re-assemble just certain pieces of P in the callable return. I'm pretty sure this kind of type surgery would be possible in TypeScript, but I am a bit skeptical that auto_convert can be improved much, unless it can also be simplified some.
There was a problem hiding this comment.
Other approaches that would certainly work but are a bit more verbose:
- change
auto_convertto be a helper function rather than a decorator and use it inside the method explicitly - optionally split methods into
_copy(takes Deferred) andcopy(that takes NumpyThunk) that only calls the helper and then delegates to_copy.
This would be a little tedious but I think ultimately a little clearer,and there are only ~20 uses of auto_convert.
| def __init__( | ||
| self, | ||
| runtime: Runtime, | ||
| base: Any, |
There was a problem hiding this comment.
This is definitely of type Store, but setting that causes a bunch of warnings later.
(Leaving this here as a note for future work)
|
Fixed merge conflict with #465 in ee4f350 cc @manopapad please verify correct resolution |
manopapad
left a comment
There was a problem hiding this comment.
LGTM. Left some discussions unresolved, as pointers for future work
* Add types to thunk modules * updates for bits ops * fix scalar arg and storage types * remove array class attr on thunk * remove superflous dot method on EagerArray * remove mutable default
* Add types to thunk modules * updates for bits ops * fix scalar arg and storage types * remove array class attr on thunk * remove superflous dot method on EagerArray * remove mutable default
This PR adds mypy typings to the "thunk" modules:
thunk.py,eager.py, anddeferred.pyI will preface by saying this PR is not perfect, and has considerably more
# type: ignore,Anytypes, andcastcalls than I would normally like or accept. However, some fundamental problems caused the number of type issues to balloon (probably over 500 total just in these few files) and eventually I needed to just plough through to get a baseline of some improvement that would make a basis for further work. (This is probably the third time I have attempted to start typing these files...)The overriding complication in typing these modules is that, ostensibly, clients of
NumpyThunksubclasses should not have to know or care what subclass they are dealing with. But the subclassesEagerArrayandDeferredArrayand very much not-interchangeable. They have different attributes and methods and different codepaths make assumptions regarding these differences.Some of this may be improvable by renaming some properties e.g.
Eager.arrayandDeferred.baseto match on both classes. This turns out to be an involved change, and I did not want to add it to this already large and complicated PR. However I think more refactoring may be needed in order to reduce the number ofAnytypes (that was the primary solution to this issue, in lots of places).The string "dtype" for point types is very thorny. I chose to add a few ugly casts in a handful of places, rather than make the ~50 other updates that changing to
Union[str, np.dtype[Any]]would have necessitated. (It's my understanding that a real proper dtype is planned for points 🤞)lots of places pass
(dtype,)totask.add_scalar_argIf this is correct, then changes are needed on thelegateside. For now I have added (several)type: ignorecomments.Other comments inline. All in all I would suggest trying to merge this with minimal changes, and then making more incremental improvements in smaller PRs to follow.