DLPX-86177 Azure Accelerated networking broken because Mellanox drivers absent in kernel#27
Merged
palash-gandhi merged 1 commit on May 23, 2023
81cf166 to
9cfeaeb
sebroy
approved these changes
May 23, 2023
prakashsurya
approved these changes
May 23, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
May 26, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Jun 3, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Jun 4, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Jun 5, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
prakashsurya
pushed a commit
that referenced
this pull request
Jun 13, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
prakashsurya
pushed a commit
that referenced
this pull request
Aug 8, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Aug 24, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Aug 25, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Aug 26, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Aug 27, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 2, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 3, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 4, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 5, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 6, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 7, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 8, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
prakashsurya
pushed a commit
that referenced
this pull request
Sep 12, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 20, 2023
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
pcd1193182
pushed a commit
that referenced
this pull request
Feb 12, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
pcd1193182
pushed a commit
to pcd1193182/linux-kernel-gcp
that referenced
this pull request
Feb 21, 2024
delphix-devops-bot
pushed a commit
that referenced
this pull request
Mar 2, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Mar 3, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Mar 9, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
grwilson
pushed a commit
that referenced
this pull request
Mar 23, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
prakashsurya
pushed a commit
that referenced
this pull request
Mar 26, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
jwk404
pushed a commit
that referenced
this pull request
Apr 14, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
jwk404
pushed a commit
that referenced
this pull request
Apr 15, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
jwk404
pushed a commit
that referenced
this pull request
Apr 15, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
jwk404
pushed a commit
that referenced
this pull request
Apr 15, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Apr 20, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
May 9, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
May 16, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
May 17, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Jun 8, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Jun 9, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Jul 31, 2024
BugLink: https://bugs.launchpad.net/bugs/2067959

commit fff1386 upstream.

Running a lot of VK CTS in parallel against nouveau, once every few hours you might see something like this crash.

BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 8000000114e6e067 P4D 8000000114e6e067 PUD 109046067 PMD 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 7 PID: 53891 Comm: deqp-vk Not tainted 6.8.0-rc6+ #27
Hardware name: Gigabyte Technology Co., Ltd. Z390 I AORUS PRO WIFI/Z390 I AORUS PRO WIFI-CF, BIOS F8 11/05/2021
RIP: 0010:gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
Code: c7 48 01 c8 49 89 45 58 85 d2 0f 84 95 00 00 00 41 0f b7 46 12 49 8b 7e 08 89 da 42 8d 2c f8 48 8b 47 08 41 83 c7 01 48 89 ee <48> 8b 40 08 ff d0 0f 1f 00 49 8b 7e 08 48 89 d9 48 8d 75 04 48 c1
RSP: 0000:ffffac20c5857838 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000004d8001 RCX: 0000000000000001
RDX: 00000000004d8001 RSI: 00000000000006d8 RDI: ffffa07afe332180
RBP: 00000000000006d8 R08: ffffac20c5857ad0 R09: 0000000000ffff10
R10: 0000000000000001 R11: ffffa07af27e2de0 R12: 000000000000001c
R13: ffffac20c5857ad0 R14: ffffa07a96fe9040 R15: 000000000000001c
FS: 00007fe395eed7c0(0000) GS:ffffa07e2c980000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000011febe001 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
...
? gp100_vmm_pgt_mem+0xe3/0x180 [nouveau]
? gp100_vmm_pgt_mem+0x37/0x180 [nouveau]
nvkm_vmm_iter+0x351/0xa20 [nouveau]
? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
? __lock_acquire+0x3ed/0x2170
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
nvkm_vmm_ptes_get_map+0xc2/0x100 [nouveau]
? __pfx_nvkm_vmm_ref_ptes+0x10/0x10 [nouveau]
? __pfx_gp100_vmm_pgt_mem+0x10/0x10 [nouveau]
nvkm_vmm_map_locked+0x224/0x3a0 [nouveau]

Adding any sort of useful debug usually makes it go away, so I hand wrote the function in a line, and debugged the asm.

Every so often pt->memory->ptrs is NULL. This ptrs ptr is set in the nv50_instobj_acquire called from nvkm_kmap.

If Thread A and Thread B both get to nv50_instobj_acquire around the same time, and Thread A hits the refcount_set line, and in lockstep thread B succeeds at refcount_inc_not_zero, there is a chance the ptrs value won't have been stored since refcount_set is unordered. Force a memory barrier here, I picked smp_mb, since we want it on all CPUs and it's write followed by a read.

v2: use paired smp_rmb/smp_wmb.

Cc: <stable@vger.kernel.org>
Fixes: be55287 ("drm/nouveau/imem/nv50: embed nvkm_instobj directly into nv04_instobj")
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240411011510.2546857-1-airlied@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Portia Stephens <portia.stephens@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
delphix-devops-bot
pushed a commit
that referenced
this pull request
Aug 1, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Aug 6, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Aug 15, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Aug 22, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Aug 23, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 14, 2024
BugLink: https://bugs.launchpad.net/bugs/2072617

[ Upstream commit 8987515 ]

When request_irq() fails, error path calls vp_del_vqs(). There, as vq is present in the list, free_irq() is called for the same vector. That causes following splat:

[ 0.414355] Trying to free already-free IRQ 27
[ 0.414403] WARNING: CPU: 1 PID: 1 at kernel/irq/manage.c:1899 free_irq+0x1a1/0x2d0
[ 0.414510] Modules linked in:
[ 0.414540] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.9.0-rc4+ #27
[ 0.414540] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
[ 0.414540] RIP: 0010:free_irq+0x1a1/0x2d0
[ 0.414540] Code: 1e 00 48 83 c4 08 48 89 e8 5b 5d 41 5c 41 5d 41 5e 41 5f c3 cc cc cc cc 90 8b 74 24 04 48 c7 c7 98 80 6c b1 e8 00 c9 f7 ff 90 <0f> 0b 90 90 48 89 ee 4c 89 ef e8 e0 20 b8 00 49 8b 47 40 48 8b 40
[ 0.414540] RSP: 0000:ffffb71480013ae0 EFLAGS: 00010086
[ 0.414540] RAX: 0000000000000000 RBX: ffffa099c2722000 RCX: 0000000000000000
[ 0.414540] RDX: 0000000000000000 RSI: ffffb71480013998 RDI: 0000000000000001
[ 0.414540] RBP: 0000000000000246 R08: 00000000ffffdfff R09: 0000000000000001
[ 0.414540] R10: 00000000ffffdfff R11: ffffffffb18729c0 R12: ffffa099c1c91760
[ 0.414540] R13: ffffa099c1c916a4 R14: ffffa099c1d2f200 R15: ffffa099c1c91600
[ 0.414540] FS: 0000000000000000(0000) GS:ffffa099fec40000(0000) knlGS:0000000000000000
[ 0.414540] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.414540] CR2: 0000000000000000 CR3: 0000000008e3e001 CR4: 0000000000370ef0
[ 0.414540] Call Trace:
[ 0.414540] <TASK>
[ 0.414540] ? __warn+0x80/0x120
[ 0.414540] ? free_irq+0x1a1/0x2d0
[ 0.414540] ? report_bug+0x164/0x190
[ 0.414540] ? handle_bug+0x3b/0x70
[ 0.414540] ? exc_invalid_op+0x17/0x70
[ 0.414540] ? asm_exc_invalid_op+0x1a/0x20
[ 0.414540] ? free_irq+0x1a1/0x2d0
[ 0.414540] vp_del_vqs+0xc1/0x220
[ 0.414540] vp_find_vqs_msix+0x305/0x470
[ 0.414540] vp_find_vqs+0x3e/0x1a0
[ 0.414540] vp_modern_find_vqs+0x1b/0x70
[ 0.414540] init_vqs+0x387/0x600
[ 0.414540] virtnet_probe+0x50a/0xc80
[ 0.414540] virtio_dev_probe+0x1e0/0x2b0
[ 0.414540] really_probe+0xc0/0x2c0
[ 0.414540] ? __pfx___driver_attach+0x10/0x10
[ 0.414540] __driver_probe_device+0x73/0x120
[ 0.414540] driver_probe_device+0x1f/0xe0
[ 0.414540] __driver_attach+0x88/0x180
[ 0.414540] bus_for_each_dev+0x85/0xd0
[ 0.414540] bus_add_driver+0xec/0x1f0
[ 0.414540] driver_register+0x59/0x100
[ 0.414540] ? __pfx_virtio_net_driver_init+0x10/0x10
[ 0.414540] virtio_net_driver_init+0x90/0xb0
[ 0.414540] do_one_initcall+0x58/0x230
[ 0.414540] kernel_init_freeable+0x1a3/0x2d0
[ 0.414540] ? __pfx_kernel_init+0x10/0x10
[ 0.414540] kernel_init+0x1a/0x1c0
[ 0.414540] ret_from_fork+0x31/0x50
[ 0.414540] ? __pfx_kernel_init+0x10/0x10
[ 0.414540] ret_from_fork_asm+0x1a/0x30
[ 0.414540] </TASK>

Fix this by calling deleting the current vq when request_irq() fails.

Fixes: 0b0f9dc ("Revert "virtio_pci: use shared interrupts for virtqueues"")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Message-Id: <20240426150845.3999481-1-jiri@resnulli.us>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Portia Stephens <portia.stephens@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
prakashsurya
pushed a commit
that referenced
this pull request
Sep 23, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Oct 20, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Oct 21, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
palash-gandhi
added a commit
that referenced
this pull request
Oct 24, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Nov 10, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
delphix-devops-bot
pushed a commit
that referenced
this pull request
Nov 11, 2024
…rs absent in kernel (#27) PR URL: https://www.github.com/delphix/linux-kernel-gcp/pull/27
Side-port of delphix/linux-kernel-azure#14.
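The diff itself is not shown on this page, so the exact options changed are unknown. As a rough sketch of how one might check whether a kernel build includes Mellanox drivers, the helper below scans kernel config text for a few Kconfig symbols. The symbol list (`CONFIG_MLX4_CORE`, `CONFIG_MLX4_EN`, `CONFIG_MLX5_CORE`, `CONFIG_MLX5_CORE_EN`) is an assumption: these are real upstream symbols, but they are not confirmed to be the ones this PR touched. The function names are hypothetical.

```python
import re

# Assumed list: real upstream Kconfig symbols for Mellanox NIC drivers,
# NOT confirmed to be the options this PR changed.
MELLANOX_OPTIONS = (
    "CONFIG_MLX4_CORE",
    "CONFIG_MLX4_EN",
    "CONFIG_MLX5_CORE",
    "CONFIG_MLX5_CORE_EN",
)

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def option_states(config_text, options=MELLANOX_OPTIONS):
    """Map each option to its value ("y", "m", ...), "n", or "missing"."""
    states = {name: "missing" for name in options}
    for line in config_text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match and match.group(1) in states:
            states[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match and match.group(1) in states:
            # Kconfig writes disabled options as "# CONFIG_X is not set".
            states[match.group(1)] = "n"
    return states


def missing_drivers(config_text, options=MELLANOX_OPTIONS):
    """Return the options that are neither built in (y) nor modular (m)."""
    states = option_states(config_text, options)
    return [name for name, value in states.items() if value not in ("y", "m")]
```

On Ubuntu-style systems the config text is typically available at `/boot/config-<kernel release>`; passing that file's contents to `missing_drivers` would list any of the assumed options that are disabled or absent.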