DLPX-83697 iscsi target login should wait until tx/rx threads have properly started#21
Merged
Merged
Conversation
sdimitro
approved these changes
Nov 9, 2022
Contributor
sdimitro
left a comment
There was a problem hiding this comment.
Could we add a link to the upstream discussion in the commit message? Just for future reference?
don-brady
approved these changes
Nov 14, 2022
delphix-devops-bot
pushed a commit
that referenced
this pull request
Nov 17, 2022
delphix-devops-bot
pushed a commit
that referenced
this pull request
Nov 18, 2022
delphix-devops-bot
pushed a commit
that referenced
this pull request
Dec 15, 2022
delphix-devops-bot
pushed a commit
that referenced
this pull request
Jan 7, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Jan 16, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Feb 10, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Mar 4, 2023
BugLink: https://bugs.launchpad.net/bugs/2003914 [ Upstream commit 9de255c ] Commit b743512 ("uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()") started calling disable_irq() without holding the spinlock because it can sleep. However, that fix introduced another bug: if interrupt is already disabled and a new disable request comes in, then the spinlock is not unlocked: root@localhost:~# printf '\x00\x00\x00\x00' > /dev/uio0 root@localhost:~# printf '\x00\x00\x00\x00' > /dev/uio0 root@localhost:~# [ 14.851538] BUG: scheduling while atomic: bash/223/0x00000002 [ 14.851991] Modules linked in: uio_dmem_genirq uio myfpga(OE) bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper drm snd_pcm ppdev joydev psmouse snd_timer snd e1000fb_sys_fops syscopyarea parport sysfillrect soundcore sysimgblt input_leds pcspkr i2c_piix4 serio_raw floppy evbug qemu_fw_cfg mac_hid pata_acpi ip_tables x_tables autofs4 [last unloaded: parport_pc] [ 14.854206] CPU: 0 PID: 223 Comm: bash Tainted: G OE 6.0.0-rc7 #21 [ 14.854786] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 14.855664] Call Trace: [ 14.855861] <TASK> [ 14.856025] dump_stack_lvl+0x4d/0x67 [ 14.856325] dump_stack+0x14/0x1a [ 14.856583] __schedule_bug.cold+0x4b/0x5c [ 14.856915] __schedule+0xe81/0x13d0 [ 14.857199] ? idr_find+0x13/0x20 [ 14.857456] ? get_work_pool+0x2d/0x50 [ 14.857756] ? __flush_work+0x233/0x280 [ 14.858068] ? __schedule+0xa95/0x13d0 [ 14.858307] ? idr_find+0x13/0x20 [ 14.858519] ? get_work_pool+0x2d/0x50 [ 14.858798] schedule+0x6c/0x100 [ 14.859009] schedule_hrtimeout_range_clock+0xff/0x110 [ 14.859335] ? tty_write_room+0x1f/0x30 [ 14.859598] ? n_tty_poll+0x1ec/0x220 [ 14.859830] ? tty_ldisc_deref+0x1a/0x20 [ 14.860090] schedule_hrtimeout_range+0x17/0x20 [ 14.860373] do_select+0x596/0x840 [ 14.860627] ? __kernel_text_address+0x16/0x50 [ 14.860954] ? poll_freewait+0xb0/0xb0 [ 14.861235] ? poll_freewait+0xb0/0xb0 [ 14.861517] ? rpm_resume+0x49d/0x780 [ 14.861798] ? common_interrupt+0x59/0xa0 [ 14.862127] ? asm_common_interrupt+0x2b/0x40 [ 14.862511] ? __uart_start.isra.0+0x61/0x70 [ 14.862902] ? __check_object_size+0x61/0x280 [ 14.863255] core_sys_select+0x1c6/0x400 [ 14.863575] ? vfs_write+0x1c9/0x3d0 [ 14.863853] ? vfs_write+0x1c9/0x3d0 [ 14.864121] ? _copy_from_user+0x45/0x70 [ 14.864526] do_pselect.constprop.0+0xb3/0xf0 [ 14.864893] ? do_syscall_64+0x6d/0x90 [ 14.865228] ? do_syscall_64+0x6d/0x90 [ 14.865556] __x64_sys_pselect6+0x76/0xa0 [ 14.865906] do_syscall_64+0x60/0x90 [ 14.866214] ? syscall_exit_to_user_mode+0x2a/0x50 [ 14.866640] ? do_syscall_64+0x6d/0x90 [ 14.866972] ? do_syscall_64+0x6d/0x90 [ 14.867286] ? do_syscall_64+0x6d/0x90 [ 14.867626] entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] stripped [ 14.872959] </TASK> ('myfpga' is a simple 'uio_dmem_genirq' driver I wrote to test this) The implementation of "uio_dmem_genirq" was based on "uio_pdrv_genirq" and it is used in a similar manner to the "uio_pdrv_genirq" driver with respect to interrupt configuration and handling. At the time "uio_dmem_genirq" was introduced, both had the same implementation of the 'uio_info' handlers irqcontrol() and handler(). Then commit 34cb275 ("UIO: Fix concurrency issue"), which was only applied to "uio_pdrv_genirq", ended up making them a little different. That commit, among other things, changed disable_irq() to disable_irq_nosync() in the implementation of irqcontrol(). The motivation there was to avoid a deadlock between irqcontrol() and handler(), since it added a spinlock in the irq handler, and disable_irq() waits for the completion of the irq handler. By changing disable_irq() to disable_irq_nosync() in irqcontrol(), we also avoid the sleeping-while-atomic bug that commit b743512 ("uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()") was trying to fix. Thus, this fixes the missing unlock in irqcontrol() by importing the implementation of irqcontrol() handler from the "uio_pdrv_genirq" driver. In the end, it reverts commit b743512 ("uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()") and change disable_irq() to disable_irq_nosync(). It is worth noting that this still does not address the concurrency issue fixed by commit 34cb275 ("UIO: Fix concurrency issue"). It will be addressed separately in the next commits. Split out from commit 34cb275 ("UIO: Fix concurrency issue"). Fixes: b743512 ("uio: fix a sleep-in-atomic-context bug in uio_dmem_genirq_irqcontrol()") Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Link: https://lore.kernel.org/r/20220930224100.816175-2-rafaelmendsr@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
delphix-devops-bot
pushed a commit
that referenced
this pull request
Mar 4, 2023
prakashsurya
pushed a commit
that referenced
this pull request
Mar 11, 2023
prakashsurya
pushed a commit
that referenced
this pull request
Mar 14, 2023
prakashsurya
pushed a commit
that referenced
this pull request
Mar 14, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Mar 30, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Apr 20, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Apr 28, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
May 31, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Jun 3, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Jun 4, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Jun 5, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Aug 27, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Aug 30, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Aug 31, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 1, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 2, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 3, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 6, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 7, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 8, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Sep 20, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Oct 6, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Oct 7, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Oct 8, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Oct 9, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Oct 10, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Oct 21, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Nov 1, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Nov 22, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Dec 9, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Dec 10, 2023
delphix-devops-bot
pushed a commit
that referenced
this pull request
Jan 27, 2024
delphix-devops-bot
pushed a commit
that referenced
this pull request
Feb 9, 2024
BugLink: https://bugs.launchpad.net/bugs/2043422 commit 0b0747d upstream. The following processes run into a deadlock. CPU 41 was waiting for CPU 29 to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29 was hung by that spinlock with IRQs disabled. PID: 17360 TASK: ffff95c1090c5c40 CPU: 41 COMMAND: "mrdiagd" !# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0 !# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0 !# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0 # 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0 # 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0 # 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0 # 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0 # 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0 # 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0 # 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0 #10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0 #11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0 #12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0 #13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0 #14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0 #15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0 #16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0 #17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0 #18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0 #19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 #20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 #21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 #22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 #23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 #24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 #25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 #26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 #27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 PID: 17355 TASK: ffff95c1090c3d80 CPU: 29 COMMAND: "mrdiagd" !# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0 !# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0 # 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0 # 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0 # 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0 # 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0 # 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0 # 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0 # 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0 # 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0 #10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0 #11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0 #12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0 #13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0 #14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0 #15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0 #16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0 #17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0 The lock is used to synchronize different sysfs operations, it doesn't protect any resource that will be touched by an interrupt. Consequently it's not required to disable IRQs. Replace the spinlock with a mutex to fix the deadlock. Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20230828221018.19471-1-junxiao.bi@oracle.com Reviewed-by: Mike Christie <michael.christie@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
delphix-devops-bot
pushed a commit
that referenced
this pull request
Feb 9, 2024
delphix-devops-bot
pushed a commit
that referenced
this pull request
Mar 1, 2024
delphix-devops-bot
pushed a commit
that referenced
this pull request
Mar 21, 2024
BugLink: https://bugs.launchpad.net/bugs/2050858 [ Upstream commit e3e82fc ] When creating ceq_0 during probing irdma, cqp.sc_cqp will be sent as a cqp_request to cqp->sc_cqp.sq_ring. If the request is pending when removing the irdma driver or unplugging its aux device, cqp.sc_cqp will be dereferenced as wrong struct in irdma_free_pending_cqp_request(). PID: 3669 TASK: ffff88aef892c000 CPU: 28 COMMAND: "kworker/28:0" #0 [fffffe0000549e38] crash_nmi_callback at ffffffff810e3a34 #1 [fffffe0000549e40] nmi_handle at ffffffff810788b2 #2 [fffffe0000549ea0] default_do_nmi at ffffffff8107938f #3 [fffffe0000549eb8] do_nmi at ffffffff81079582 #4 [fffffe0000549ef0] end_repeat_nmi at ffffffff82e016b4 [exception RIP: native_queued_spin_lock_slowpath+1291] RIP: ffffffff8127e72b RSP: ffff88aa841ef778 RFLAGS: 00000046 RAX: 0000000000000000 RBX: ffff88b01f849700 RCX: ffffffff8127e47e RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff83857ec0 RBP: ffff88afe3e4efc8 R8: ffffed15fc7c9dfa R9: ffffed15fc7c9dfa R10: 0000000000000001 R11: ffffed15fc7c9df9 R12: 0000000000740000 R13: ffff88b01f849708 R14: 0000000000000003 R15: ffffed1603f092e1 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000 -- <NMI exception stack> -- #5 [ffff88aa841ef778] native_queued_spin_lock_slowpath at ffffffff8127e72b #6 [ffff88aa841ef7b0] _raw_spin_lock_irqsave at ffffffff82c22aa4 #7 [ffff88aa841ef7c8] __wake_up_common_lock at ffffffff81257363 #8 [ffff88aa841ef888] irdma_free_pending_cqp_request at ffffffffa0ba12cc [irdma] #9 [ffff88aa841ef958] irdma_cleanup_pending_cqp_op at ffffffffa0ba1469 [irdma] #10 [ffff88aa841ef9c0] irdma_ctrl_deinit_hw at ffffffffa0b2989f [irdma] #11 [ffff88aa841efa28] irdma_remove at ffffffffa0b252df [irdma] #12 [ffff88aa841efae8] auxiliary_bus_remove at ffffffff8219afdb #13 [ffff88aa841efb00] device_release_driver_internal at ffffffff821882e6 #14 [ffff88aa841efb38] bus_remove_device at ffffffff82184278 #15 [ffff88aa841efb88] device_del at ffffffff82179d23 #16 [ffff88aa841efc48] ice_unplug_aux_dev at ffffffffa0eb1c14 [ice] #17 [ffff88aa841efc68] ice_service_task at ffffffffa0d88201 [ice] #18 [ffff88aa841efde8] process_one_work at ffffffff811c589a #19 [ffff88aa841efe60] worker_thread at ffffffff811c71ff #20 [ffff88aa841eff10] kthread at ffffffff811d87a0 #21 [ffff88aa841eff50] ret_from_fork at ffffffff82e0022f Fixes: 44d9e52 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20231130081415.891006-1-lishifeng@sangfor.com.cn Suggested-by: "Ismail, Mustafa" <mustafa.ismail@intel.com> Signed-off-by: Shifeng Li <lishifeng@sangfor.com.cn> Reviewed-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
delphix-devops-bot
pushed a commit
that referenced
this pull request
Mar 21, 2024
jwk404
pushed a commit
that referenced
this pull request
Apr 14, 2024
jwk404
pushed a commit
that referenced
this pull request
Apr 15, 2024
jwk404
pushed a commit
that referenced
this pull request
Apr 15, 2024
jwk404
pushed a commit
that referenced
this pull request
Apr 15, 2024
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This is a cherry-pick of an upstream linux kernel patch that is still in development. Because the issue being tested is difficult to reproduce, the patch has not been positively confirmed to resolve the issue; however, it looks like it should, according to our best theory of what is causing the bug. By integrating the patch now we can give it ample soak time before it goes into a release.
http://selfservice.jenkins.delphix.com/job/appliance-build-orchestrator-pre-push/3737