Skip to content

IBM benchmark crashes at runtime with AMD OpenMP offloading compiler on Frontier #1158

@sbryngelson

Description

@sbryngelson

Summary

The ibm benchmark case crashes immediately at simulation startup when compiled with the AMD compiler (-c famd) and run with OpenMP offloading (--gpu mp) on Frontier. The case compiles successfully — the failure is at runtime during the initial host-to-device data transfer.

The same case works fine with the CCE compiler (-c f).

Error

Simulating a case-optimized 576x288x288 case on 8 rank(s) with OpenMP offloading.
"PluginInterface" error: Failure to copy data from host to device. Pointers: host = 0x0000154c9a2f1010, device = 0x0000154c89200000, size = 0: "unknown or internal error" error in hsa_amd_memory_async_copy_on_engine: HSA_STATUS_ERROR_INVALID_ARGUMENT: One of the actual arguments does not meet a precondition stated in the documentation of the corresponding formal argument.
omptarget error: Copying data to device failed.
omptarget error: Consult https://openmp.llvm.org/design/Runtimes.html for debugging options.
omptarget error: Source location information not present. Compile with -g or -gline-tables-only.
omptarget fatal error 1: failure of target construct while offloading is mandatory
srun: error: frontier10350: tasks 4-7: Aborted (core dumped)

All 4 failing ranks show the same error with size = 0, suggesting a zero-sized omp target data transfer that the AMD HSA runtime rejects (HSA_STATUS_ERROR_INVALID_ARGUMENT) while CCE treats as a no-op.

Details

  • Case: benchmarks/ibm/case.py — 3D viscous 2-fluid case with immersed boundary sphere ("ib": "T")
  • Unique feature: IBM is the only benchmark case that enables the immersed boundary method; the other 4 benchmark cases all pass with AMD
  • Build: Case-optimized (--case-optimization), which compiles a custom case.fpp that eliminates dead code paths. This may produce zero-sized arrays for unused features that trigger the AMD runtime bug
  • syscheck: Passes (MPI + OMP GPU detection OK)
  • pre_process: Passes (576x288x288 on 8 ranks, 3.9s)
  • simulation: Crashes immediately on launch

Reproducer

On Frontier:

. ./mfc.sh load -c famd -m g
./mfc.sh run benchmarks/ibm/case.py --gpu mp --case-optimization -c frontier_amd -n 8

Possible root cause

Case optimization may compile certain IBM-related arrays to zero size. The AMD LLVM OpenMP runtime calls hsa_amd_memory_async_copy_on_engine with size = 0, which the HSA runtime rejects. CCE presumably skips zero-size copies. A debug build (-g or -gline-tables-only) would reveal which source line triggers the transfer.

Workaround

The AMD compiler has been removed from the benchmark workflow in #1135 until this is resolved.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions