Make reduced-precision cuBLAS mode opt-in #519

Merged

manopapad merged 1 commit into nv-legate:branch-22.10 from manopapad:fast-math-opt-in on Aug 16, 2022
Conversation

@manopapad (Contributor)

No description provided.

@manopapad manopapad requested a review from magnatelee August 9, 2022 22:12
if (nullptr == disable_tensor_cores) {
// No request to disable tensor cores so turn them on
cublasStatus_t status = cublasSetMathMode(cublas_, CUBLAS_TENSOR_OP_MATH);
const char* fast_math = getenv("CUNUMERIC_FAST_MATH");
manopapad (author) commented:

I chose this over CUNUMERIC_DISABLE_TENSOR_CORES, because if we don't set anything then cuBLAS will still use tensor cores, but only when that wouldn't result in precision loss.

const char* fast_math = getenv("CUNUMERIC_FAST_MATH");
if (fast_math != nullptr && atoi(fast_math) > 0) {
// Enable acceleration of single precision routines using TF32 tensor cores.
cublasStatus_t status = cublasSetMathMode(cublas_, CUBLAS_TF32_TENSOR_OP_MATH);
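With this change, reduced precision becomes opt-in. A minimal usage sketch (assuming, as in the snippet above, that any value greater than 0 enables the flag):

```shell
# Opt in to reduced-precision TF32 tensor cores for single-precision
# cuBLAS routines; leaving this unset keeps the safer default behavior.
export CUNUMERIC_FAST_MATH=1
```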
manopapad (author) commented:

CUBLAS_TENSOR_OP_MATH is deprecated.

Reviewer commented:

Are TF32 tensor cores not used when this math mode isn't set? This table says that CUBLAS_DEFAULT_MATH can use tensor cores whenever possible, which might include TF32 tensor cores:

https://docs.nvidia.com/cuda/cublas/index.html#cublasmath_t

If we want to be absolutely sure about precision, I guess we should set CUBLAS_PEDANTIC_MATH in the else branch.

manopapad (author) replied:

AFAIU in CUBLAS_PEDANTIC_MATH mode tensor cores are disabled altogether, and (I assume) other mostly-safe optimizations like operation reordering.

It sounds like CUBLAS_DEFAULT_MATH is guaranteed to respect the (minimum) level of precision prescribed by the operation (e.g. cublasSgemmEx prescribes the use of fp32-wide intermediates), but can adjust upwards if it makes sense, and will use tensor cores where safe. In contrast, CUBLAS_TF32_TENSOR_OP_MATH allows the use of TF32 intermediates for cublasSgemmEx, and CUBLAS_TENSOR_OP_MATH allows the use of fp16 intermediates.

Therefore, I don't think we want CUBLAS_PEDANTIC_MATH to be the default. How about I just change the default to be CUBLAS_DEFAULT_MATH, and activate CUBLAS_PEDANTIC_MATH if the user passes CUNUMERIC_DISABLE_TENSOR_CORES? We can then pass this flag during testing, if necessary.
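The selection logic proposed above could be sketched as a small helper. This is a hypothetical illustration, not cuNumeric's actual initialization code; the enum mirrors the relevant cublasMath_t values so the decision logic can be shown (and tested) without the cuBLAS headers:

```cpp
#include <cstdlib>

// Stand-ins for the cublasMath_t values discussed in this thread.
enum class MathMode { Default, Tf32TensorOp, Pedantic };

// Decide the cuBLAS math mode from the environment:
// - CUNUMERIC_FAST_MATH > 0 opts in to reduced-precision TF32 tensor cores
//   (CUBLAS_TF32_TENSOR_OP_MATH);
// - CUNUMERIC_DISABLE_TENSOR_CORES > 0 (the flag proposed above) requests
//   the strictest setting (CUBLAS_PEDANTIC_MATH);
// - otherwise fall back to CUBLAS_DEFAULT_MATH, which keeps the precision
//   prescribed by each routine while still using tensor cores where safe.
MathMode choose_math_mode() {
  const char* fast_math = std::getenv("CUNUMERIC_FAST_MATH");
  if (fast_math != nullptr && std::atoi(fast_math) > 0) {
    return MathMode::Tf32TensorOp;
  }
  const char* disable_tc = std::getenv("CUNUMERIC_DISABLE_TENSOR_CORES");
  if (disable_tc != nullptr && std::atoi(disable_tc) > 0) {
    return MathMode::Pedantic;
  }
  return MathMode::Default;
}
```

The chosen value would then be passed to cublasSetMathMode on the handle, as in the diff above.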

Reviewer replied:

I just wanted to confirm that the default setting is sufficient to avoid the precision problem we're seeing on A100s. CUBLAS_PEDANTIC_MATH is unnecessary if CUBLAS_DEFAULT_MATH does the right thing already.

@manopapad manopapad merged commit 61c974b into nv-legate:branch-22.10 Aug 16, 2022
marcinz pushed a commit to marcinz/cunumeric that referenced this pull request Aug 16, 2022
sbak5 pushed a commit to sbak5/cunumeric that referenced this pull request Aug 17, 2022
marcinz added a commit that referenced this pull request Aug 17, 2022
Co-authored-by: Manolis Papadakis <manopapad@gmail.com>
@manopapad manopapad deleted the fast-math-opt-in branch July 19, 2023 17:30