This repository was archived by the owner on Feb 24, 2026. It is now read-only.

[BUG] Database or tuning conflict with Multi-GPU Environment #204

@LeiWang1999

When a program uses tensor parallelism, different ranks may receive the same op_config, so the tuning process can be duplicated across ranks. Consider the following:

cpu 0: tune op0, save runtime into local database id 0
cpu 1: tune op0, save runtime into local database id 0 

Both ranks write to the same local database entry (id 0), and this 0 -> 0 cross-overwriting can corrupt the local runtime module.
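To illustrate how two unsynchronized writers can corrupt a shared artifact, here is a minimal sketch that replays one possible interleaving in a single process. The file name `op0.bin`, the helper `interleaved_write`, and the payloads are hypothetical; how the real database writes its entries is not stated in this issue, so this only shows the general failure mode.

```python
import os
import tempfile


def interleaved_write(path, payload_a, payload_b):
    """Replay one possible interleaving of two unsynchronized writers.

    Hypothetical helper: writer A and writer B stand in for two ranks
    saving the same op into the same database entry.
    """
    half = len(payload_a) // 2
    # buffering=0 so every write reaches the file immediately.
    writer_a = open(path, "wb", buffering=0)
    writer_a.write(payload_a[:half])
    # Opening with "wb" truncates the file, discarding A's partial output.
    writer_b = open(path, "wb", buffering=0)
    writer_b.write(payload_b[:half])
    writer_b.write(payload_b[half:])
    writer_b.close()
    # A resumes at its own file offset and overwrites B's second half.
    writer_a.write(payload_a[half:])
    writer_a.close()
    with open(path, "rb") as f:
        return f.read()


def demo():
    payload_a = b"A" * 8  # artifact produced by the first rank
    payload_b = b"B" * 8  # artifact produced by the second rank
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "op0.bin")
        return interleaved_write(path, payload_a, payload_b)


if __name__ == "__main__":
    print(demo())
```

The resulting file matches neither writer's payload, which is the kind of mixed artifact that would fail to load as a runtime module.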

This may be related to issue #186.

Recommended solution:

  • save the op into the database under a spin lock.
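One way the recommended approach could look is a lock file that each rank spins on before saving. This is only a sketch under stated assumptions: `spin_lock`, `save_op`, the `.bin` and `.lock` suffixes, and the directory layout are all hypothetical and are not the project's actual API.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def spin_lock(lock_path, poll_interval=0.05, timeout=60.0):
    """Hypothetical cross-process spin lock backed by an exclusive lock file."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only
            # one process can create the lock file at a time.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)  # spin until the holder releases
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def save_op(database_dir, op_name, payload):
    """Hypothetical save routine: write the artifact only while holding the lock.

    Returns True if this caller wrote the entry, False if it already existed.
    """
    os.makedirs(database_dir, exist_ok=True)
    target = os.path.join(database_dir, op_name + ".bin")
    with spin_lock(target + ".lock"):
        if os.path.exists(target):
            return False  # another rank already saved this op
        tmp = f"{target}.tmp.{os.getpid()}"
        with open(tmp, "wb") as f:
            f.write(payload)
        os.replace(tmp, target)  # rename into place in one step
        return True
```

With this shape, the first rank to acquire the lock writes the entry and later ranks skip the write. A lock file left behind by a crashed process would block others until the timeout, so a real implementation would likely need stale-lock handling.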

TODO Items:

  • provide a test case to reproduce the bug
  • implement spin locks.
