Commit 28d2d62
drm/amdkfd: Fix lock dependency warning
[ Upstream commit 47bf0f8 ]
======================================================
WARNING: possible circular locking dependency detected
6.5.0-kfd-fkuehlin #276 Not tainted
------------------------------------------------------
kworker/8:2/2676 is trying to acquire lock:
ffff9435aae95c88 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}, at: __flush_work+0x52/0x550
but task is already holding lock:
ffff9435cd8e1720 (&svms->lock){+.+.}-{3:3}, at: svm_range_deferred_list_work+0xe8/0x340 [amdgpu]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&svms->lock){+.+.}-{3:3}:
__mutex_lock+0x97/0xd30
kfd_ioctl_alloc_memory_of_gpu+0x6d/0x3c0 [amdgpu]
kfd_ioctl+0x1b2/0x5d0 [amdgpu]
__x64_sys_ioctl+0x86/0xc0
do_syscall_64+0x39/0x80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
-> #1 (&mm->mmap_lock){++++}-{3:3}:
down_read+0x42/0x160
svm_range_evict_svm_bo_worker+0x8b/0x340 [amdgpu]
process_one_work+0x27a/0x540
worker_thread+0x53/0x3e0
kthread+0xeb/0x120
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x11/0x20
-> #0 ((work_completion)(&svm_bo->eviction_work)){+.+.}-{0:0}:
__lock_acquire+0x1426/0x2200
lock_acquire+0xc1/0x2b0
__flush_work+0x80/0x550
__cancel_work_timer+0x109/0x190
svm_range_bo_release+0xdc/0x1c0 [amdgpu]
svm_range_free+0x175/0x180 [amdgpu]
svm_range_deferred_list_work+0x15d/0x340 [amdgpu]
process_one_work+0x27a/0x540
worker_thread+0x53/0x3e0
kthread+0xeb/0x120
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x11/0x20
other info that might help us debug this:
Chain exists of:
(work_completion)(&svm_bo->eviction_work) --> &mm->mmap_lock --> &svms->lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&svms->lock);
lock(&mm->mmap_lock);
lock(&svms->lock);
lock((work_completion)(&svm_bo->eviction_work));
I believe this cannot really lead to a deadlock in practice, because
svm_range_evict_svm_bo_worker only takes the mmap_read_lock if the BO
refcount is non-0. That means it's impossible that svm_range_bo_release
is running concurrently. However, there is no good way to annotate this.
To avoid the problem, take a BO reference in
svm_range_schedule_evict_svm_bo instead of in the worker. That way it's
impossible for a BO to get freed while eviction work is pending and the
cancel_work_sync call in svm_range_bo_release can be eliminated.
v2: Use svm_bo_ref_unless_zero and explained why that's safe. Also
removed redundant checks that are already done in
amdkfd_fence_enable_signaling.
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>1 parent 6757fd7 commit 28d2d62
1 file changed
Lines changed: 10 additions & 16 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
391 | 391 | | |
392 | 392 | | |
393 | 393 | | |
394 | | - | |
395 | | - | |
396 | | - | |
397 | | - | |
398 | | - | |
| 394 | + | |
| 395 | + | |
399 | 396 | | |
400 | | - | |
401 | | - | |
402 | 397 | | |
403 | 398 | | |
404 | 399 | | |
| |||
3424 | 3419 | | |
3425 | 3420 | | |
3426 | 3421 | | |
3427 | | - | |
3428 | | - | |
3429 | | - | |
3430 | | - | |
3431 | | - | |
3432 | | - | |
3433 | | - | |
| 3422 | + | |
| 3423 | + | |
| 3424 | + | |
| 3425 | + | |
| 3426 | + | |
| 3427 | + | |
| 3428 | + | |
| 3429 | + | |
3434 | 3430 | | |
3435 | 3431 | | |
3436 | 3432 | | |
| |||
3445 | 3441 | | |
3446 | 3442 | | |
3447 | 3443 | | |
3448 | | - | |
3449 | | - | |
3450 | 3444 | | |
3451 | 3445 | | |
3452 | 3446 | | |
| |||
0 commit comments