First implementation of single-GPU FFT using cuFFT #238
magnatelee merged 27 commits into nv-legate:branch-22.05
Conversation
cunumeric/array.py
Outdated
```python
raise ValueError(
    "Axis is out of bounds for array of size {}".format(self.ndim)
)
fft_axes = [x % self.ndim for x in fft_axes]
```
I guess if you did this first, then you could have avoided the np.abs call at line 2128.
Not really. axes = [-5, 10] is not a valid input and it should error out before sanitizing. Unless there is a compelling reason to deviate from numpy behavior?
I'm not suggesting we deviate from NumPy. I believe your checking code at line 2128 is incorrect because of the asymmetry between positive and negative indices in Python (which made me find it somewhat odd). Index n is invalid for a sequence of size n, but -n is valid, as it points to the first element. Line 2128 doesn't take this into account and will reject the latter. (Let me know if you think otherwise.) A usual way of handling negative indices in Python would be to add the size of the sequence to any negative indices and then check if they are still negative or greater than or equal to the size.
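The normalization described above can be sketched like this (an illustrative example, not the PR's code):

```python
def sanitize_axes(axes, ndim):
    # Shift negative indices by ndim first, then bounds-check,
    # mirroring NumPy's axis normalization: -ndim is valid (first
    # axis), while ndim itself is out of bounds.
    sanitized = []
    for ax in axes:
        if ax < 0:
            ax += ndim
        if not 0 <= ax < ndim:
            raise ValueError(
                "Axis is out of bounds for array of dimension {}".format(ndim)
            )
        sanitized.append(ax)
    return sanitized

sanitize_axes([-3, 1], 3)  # [0, 1]: -n is a valid axis for an n-dim array
```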
cunumeric/array.py
Outdated
```python
if user_sizes:
    # Zero padding if any of the user sizes is larger than input
    zeropad_input = self
    if np.any(np.greater(fft_s, fft_input.shape)):
```
If you're repeatedly passing these to NumPy calls, why not make NumPy ndarrays from them up front? The conversion between a Python list and a NumPy ndarray may not be as cheap as you'd expect.
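For example (with hypothetical stand-in values for `s` and `fft_input.shape`), converting once up front avoids repeating the list-to-ndarray conversion inside every NumPy call:

```python
import numpy as np

s = [4, 8]            # user-requested FFT sizes (example values)
input_shape = (2, 8)  # stand-in for fft_input.shape

# Convert the Python containers to ndarrays once...
fft_s = np.asarray(s)
in_shape = np.asarray(input_shape)

# ...so later checks operate on ndarrays directly instead of
# re-converting a list on each call
needs_padding = bool(np.any(fft_s > in_shape))  # True: 4 > 2
```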
cunumeric/array.py
Outdated
```python
fft_input_shape = list(fft_input.shape)
for idx, ax in enumerate(fft_axes):
    fft_input_shape[ax] = s[idx]
fft_input_shape = tuple(fft_input_shape)
```
cunumeric/array.py
Outdated
```python
fft_input = ndarray(
    shape=fft_input_shape,
    thunk=zeropad_input._thunk.get_item(slices),
).copy()
```
What's the reason for this copy?
From your comments on Slack
(Feb 16):
what I’d do for a functional implementation is to make a padded array, fill it with zeros, slice it to a sub-array whose shape matches the original array’s, and do a copy between the two
(Feb 24):
for zero padding, I think you can do something like this:
1. create an empty deferred array A of the size
2. call fill to zero it out
3. slice A to match the shape of s and call copy
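That recipe, sketched with plain NumPy arrays (the PR works on deferred arrays; `zero_pad` is a hypothetical helper, not the actual implementation):

```python
import numpy as np

def zero_pad(arr, padded_shape):
    # 1. create an empty array of the padded size
    padded = np.empty(padded_shape, dtype=arr.dtype)
    # 2. fill it with zeros
    padded.fill(0)
    # 3. slice it to the input's shape and copy the input in
    slices = tuple(slice(0, n) for n in arr.shape)
    padded[slices] = arr
    return padded

y = zero_pad(np.ones((2, 3)), (4, 5))  # 4x5 array; ones in the top-left 2x3
```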
Tests fail without the copy. But if this copy is redundant, I'd be happy to amend it with your feedback. Maybe the thunk is not being assigned correctly?
I think I got confused by the fact that lines 2180-2183 already make a copy of the input. I think the copy at line 2193 should be moved to that if statement to make sure we make only one copy of the input.
src/cunumeric/fft/fft.cu
Outdated
```cpp
  fftDirection direction)
{
  const Point<DIM> zero = Point<DIM>::ZEROES();
  CHECK_CUFFT(cufftXtExec(plan, (void*)in.ptr(zero), (void*)out.ptr(zero), (int)direction));
```
I don't think you need these casts to void*; that conversion would happen implicitly. If you insist on casting, please use static_cast. (I saw other places like this, so I'd like you to fix them all.)
src/cunumeric/fft/fft.cu
Outdated
```cpp
}

// Copy input to temporary buffer to perform FFTs one by one
DeferredBuffer<INPUT_TYPE, DIM> input_buffer(
```
Again, let's use create_buffer.
```cpp
for (auto& ax : axes) {
  // Create the plan
  cufftHandle plan;
  CHECK_CUFFT(cufftCreate(&plan));
```
Again, let me know how the existing plan cache can replace this code.
src/cunumeric/fft/fft.cu
Outdated
```cpp
  num_elements_out * sizeof(OUTPUT_TYPE),
  cudaMemcpyDefault,
  stream));
CHECK_CUDA(cudaStreamSynchronize(stream));
```
You don't need this synchronization, as the runtime will do it for you.
```cpp
// Perform the FFT operation as multiple 1D FFTs along the specified axes, single R2C/C2R operation.
template <int DIM, typename OUTPUT_TYPE, typename INPUT_TYPE>
__host__ static inline void cufft_over_axis_r2c_c2r(AccessorWO<OUTPUT_TYPE, DIM> out,
```
This function looks quite similar to cufft_over_axis_c2c. Can you think of a way to factor out the common parts?
* Added sizes to fft input
* Added initial support for per-axis FFTs
* Added axes sanitization
* Added working implementation of zero padding and size truncation
* Added normalization
* Added hermitian transform functions
* Clean-up of code
* Fixed runtime issues
* Fixed repeating axes
* Work over axes (#3)
* First working version for single axes
* R2C working for 3D + axes
* Added fixes for R2C + axes
* Fixed C2R for 3D
* Clean-up part I
* Removed axes boolean in C++
* Some minor renaming and refactoring
* Fixed several issues, moved FFT to its own module within cuNumeric
* Refactored and expanded test lists
* Added default values to public API
* Added conversions to R2C/C2R
* Added docstrings and odd type tests
* Fixed issue when running C2C/Z2Z with real values
* Further refactoring, removing unnecessary code
* Addressed PR feedback, refactor / code cleaning
* Added host synchronization on FFT with internal data copies
* Fixed an issue that caused C2R to run over axes unnecessarily
* Fixed issues after rebase
* Final fixes from MR
* Replaced manual stream creation with cached streams
* Minor fixes from MR feedback
* Minor fixes from last MR feedback
for more information, see https://pre-commit.ci
Force-pushed 447b990 to 3c8d3c8
cunumeric/array.py
Outdated
```python
# Shape
fft_input = self
fft_input_shape = np.asarray(list(self.shape))
```
why do you create a list here? can't this be np.asarray(self.shape)?
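For reference, `np.asarray` accepts a tuple directly, so the intermediate `list()` round-trip is redundant:

```python
import numpy as np

shape = (4, 8, 3)            # self.shape is already a tuple
a = np.asarray(list(shape))  # current code: extra list() conversion
b = np.asarray(shape)        # equivalent, without the conversion
assert (a == b).all() and a.dtype == b.dtype
```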
cunumeric/array.py
Outdated
```python
# Shape
fft_input = self
fft_input_shape = np.asarray(list(self.shape))
fft_output_shape = np.asarray(list(self.shape))
```
cunumeric/array.py
Outdated
```python
# Normalization
fft_norm = FFTNormalization.from_string(norm)
do_normalization = any(
    [
```
cunumeric/config.py
Outdated
```python
@staticmethod
def real_to_complex_code(dtype):
    if dtype == np.float64:
        return FFT_D2Z()
```
I guess we could cache these objects in a dictionary. we don't want to create a fresh instance.
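One way to cache them (a sketch with stand-in classes, not the PR's actual config types):

```python
import numpy as np

class FFT_D2Z:  # stand-ins for the real code classes in config.py
    pass

class FFT_R2C:
    pass

# Build the instances once at module load; look them up by dtype thereafter
_REAL_TO_COMPLEX = {
    np.dtype(np.float64): FFT_D2Z(),
    np.dtype(np.float32): FFT_R2C(),
}

def real_to_complex_code(dtype):
    try:
        return _REAL_TO_COMPLEX[np.dtype(dtype)]
    except KeyError:
        raise TypeError("Unsupported dtype: {}".format(dtype))
```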
cunumeric/config.py
Outdated
```python
if dtype == np.float64:
    return FFT_D2Z()
elif dtype == np.float32:
    return FFT_R2C()
```
cunumeric/config.py
Outdated
```python
@staticmethod
def complex_to_real_code(dtype):
    if dtype == np.complex128:
        return FFT_Z2D()
```
cunumeric/config.py
Outdated
```python
if dtype == np.complex128:
    return FFT_Z2D()
elif dtype == np.complex64:
    return FFT_C2R()
```
```python
# Match these to fftType in fft_util.h
class FFT_R2C:
```
I'd probably refactor these classes into one template that changes its properties based on the constructor arguments. you don't have to do that refactoring, but you're welcome to try.
What constructor arguments would you suggest?
I'll probably leave this to you, as you seem to have a clear idea of how you'd like these classes to look, and any changes on my end might steer this again into C++ territory.
Ok. I'll approve this PR and make some follow-up changes if you don't mind me doing it.
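For what it's worth, one possible shape for that refactoring (hypothetical names and attributes; the actual follow-up change may differ):

```python
import numpy as np

class FFTCode:
    """One parameterized class replacing the per-transform FFT_* classes."""

    def __init__(self, name, input_dtype, output_dtype):
        self.name = name                  # e.g. "R2C"
        self.input_dtype = input_dtype    # dtype the transform consumes
        self.output_dtype = output_dtype  # dtype the transform produces

# Instances configured via constructor arguments instead of distinct classes
FFT_R2C = FFTCode("R2C", np.float32, np.complex64)
FFT_D2Z = FFTCode("D2Z", np.float64, np.complex128)
FFT_C2R = FFTCode("C2R", np.complex64, np.float32)
FFT_Z2D = FFTCode("Z2D", np.complex128, np.float64)
```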
@mferreravila Like I said in the other comment, I'll take this over from you and polish it up, unless you're interested in finishing it up yourself. Just let me know.
* duplicate conda envs from cunumeric
* remove old env file
* update README
Add support for single-GPU FFTs using cuFFT as the back-end.