linux-mm.kvack.org archive mirror
From: Gabriele Monaco <gmonaco@redhat.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	kernel test robot <oliver.sang@intel.com>
Cc: oe-lkp@lists.linux.dev, lkp@intel.com, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org, aubrey.li@linux.intel.com,
	yu.c.chen@intel.com,  Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Ingo Molnar <mingo@redhat.org>
Subject: Re: [PATCH v14 2/3] sched: Move task_mm_cid_work to mm timer
Date: Thu, 10 Jul 2025 15:40:36 +0200
Message-ID: <fa7a3ea2c6326639911fbe49b86975f79db92372.camel@redhat.com>
In-Reply-To: <d8eacb24-af73-4580-8248-1fd1ac33e28f@efficios.com>



On Thu, 2025-07-10 at 09:23 -0400, Mathieu Desnoyers wrote:
> On 2025-07-10 00:56, kernel test robot wrote:
> > 
> > 
> > Hello,
> > 
> > kernel test robot noticed "WARNING:inconsistent_lock_state" on:
> > 
> > commit: d06e66c6025e44136e6715d24c23fb821a415577 ("[PATCH v14 2/3] sched: Move task_mm_cid_work to mm timer")
> > url: https://github.com/intel-lab-lkp/linux/commits/Gabriele-Monaco/sched-Add-prev_sum_exec_runtime-support-for-RT-DL-and-SCX-classes/20250707-224959
> > patch link: https://lore.kernel.org/all/20250707144824.117014-3-gmonaco@redhat.com/
> > patch subject: [PATCH v14 2/3] sched: Move task_mm_cid_work to mm timer
> > 
> > in testcase: boot
> > 
> > config: x86_64-randconfig-003-20250708
> > compiler: gcc-11
> > test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
> > 
> > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > 
> > 
> > +-------------------------------------------------+------------+------------+
> > |                                                 | 50c1dc07ee | d06e66c602 |
> > +-------------------------------------------------+------------+------------+
> > | WARNING:inconsistent_lock_state                 | 0          | 12         |
> > | inconsistent{SOFTIRQ-ON-W}->{IN-SOFTIRQ-W}usage | 0          | 12         |
> > +-------------------------------------------------+------------+------------+
> > 
> 
> I suspect the issue comes from calling mmdrop(mm) from timer context
> in a scenario where the mm_count can drop to 0.
> 
> This causes calls to pgd_free() and such to take the pgd_lock in
> softirq context, when in other cases it's taken with softirqs enabled.
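
The lockdep rule behind this report can be modelled in a few lines of
userspace C. This is a toy sketch, not lockdep itself: struct
toy_lock_class, struct toy_ctx and toy_lock_acquire() are hypothetical,
and only the two usage bits named in the splat ({SOFTIRQ-ON-W} and
{IN-SOFTIRQ-W}) are tracked. A lock class seen both with softirqs
enabled and from softirq context is flagged, because a softirq
interrupting the holder on the same CPU would spin on the lock forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one lock class and the two usage bits in the splat. */
struct toy_lock_class {
	bool softirq_on_w;	/* taken while softirqs were enabled */
	bool in_softirq_w;	/* taken from softirq context */
};

/* Execution context at acquisition time. */
struct toy_ctx {
	bool in_softirq;	/* e.g. a timer callback run from run_timer_softirq() */
	bool softirqs_enabled;	/* false under local_bh_disable()/spin_lock_bh() */
};

/*
 * Record one acquisition. Returns false once the class has been seen in
 * both states: a task holding it with softirqs enabled can be interrupted
 * by a softirq on the same CPU that then spins on the same lock.
 */
static bool toy_lock_acquire(struct toy_lock_class *c, struct toy_ctx ctx)
{
	if (ctx.in_softirq)
		c->in_softirq_w = true;
	else if (ctx.softirqs_enabled)
		c->softirq_on_w = true;

	return !(c->softirq_on_w && c->in_softirq_w);
}
```

Acquiring a class first from task context with softirqs enabled (as
pgd_alloc() does via mm_alloc() in the report) and then from a timer
callback (as pgd_free() does via mmdrop() in task_mm_cid_scan()) makes
the second call return false; a class only ever taken with softirqs
disabled never trips the check.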
> 
> See "mmdrop_sched()" for RT. I think we need something similar for
> the non-RT case, e.g. a:
> 
> static inline void __mmdrop_delayed(struct rcu_head *rhp)
> {
>          struct mm_struct *mm = container_of(rhp, struct mm_struct, delayed_drop);
> 
>          __mmdrop(mm);
> }
> 
> static inline void mmdrop_timer(struct mm_struct *mm)
> {
>          /* Provides a full memory barrier. See mmdrop() */
>          if (atomic_dec_and_test(&mm->mm_count))
>                  call_rcu(&mm->delayed_drop, __mmdrop_delayed);
> }
> 
> Thoughts ?
> 
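
The pattern proposed above (drop the reference in the restricted
context, hand the actual teardown to another context) can be sketched
in userspace with C11 atomics. struct toy_mm, toy_mmdrop_deferred() and
toy_run_deferred() are hypothetical stand-ins for mm_struct, the
proposed mmdrop_timer() and the callback that would call __mmdrop();
the singly linked pending list replaces call_rcu() and is not safe for
concurrent use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for mm_struct: a refcount plus a deferral link. */
struct toy_mm {
	atomic_int mm_count;
	struct toy_mm *next_deferred;	/* plays the role of delayed_drop */
};

static struct toy_mm *deferred_head;	/* pending teardowns (single-threaded) */
static int freed;			/* objects actually released */

/* Heavy teardown; in the kernel case this is where pgd_free() would run. */
static void toy_free(struct toy_mm *mm)
{
	free(mm);
	freed++;
}

/* Safe to call from the restricted context: never tears down inline. */
static void toy_mmdrop_deferred(struct toy_mm *mm)
{
	/*
	 * atomic_fetch_sub() is seq_cst here, in the spirit of the fully
	 * ordered atomic_dec_and_test(); an old value of 1 means this was
	 * the last reference.
	 */
	if (atomic_fetch_sub(&mm->mm_count, 1) == 1) {
		mm->next_deferred = deferred_head;
		deferred_head = mm;
	}
}

/* Runs later, from a context where the teardown locks may be taken. */
static void toy_run_deferred(void)
{
	while (deferred_head) {
		struct toy_mm *mm = deferred_head;

		deferred_head = mm->next_deferred;
		toy_free(mm);
	}
}
```

The sketch only helps if toy_run_deferred() runs somewhere pgd_lock may
legally be taken; in the kernel, whether a given deferral mechanism
(RCU callbacks, a workqueue, task_work) runs in such a context would
have to be checked separately.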

Thanks for the suggestion.

I noticed the problem is the mmdrop() there, but I see this is getting
unnecessarily complicated.
I'm not sure it's worth going down this path, also considering that
pushing the timer wheel like this might have unintended effects, as
happened with the workqueue.

I am going to try the alternative approach of running the scan in
batches [1], still using a task_work but triggering it from
__rseq_handle_notify_resume() as done here.
If that works for the original use case, I guess it's better to keep it
that way.
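
A minimal sketch of a scan split into bounded batches, assuming the
per-mm state keeps a cursor so that each invocation (e.g. a task_work
queued from __rseq_handle_notify_resume()) resumes where the previous
one stopped. struct toy_scan, toy_scan_batch() and TOY_NR_CPUS are
hypothetical; the per-CPU visit counter stands in for whatever per-CPU
mm_cid check the real scan performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical size; the kernel would iterate over possible CPUs. */
#define TOY_NR_CPUS 8

struct toy_scan {
	unsigned int next_cpu;			/* where the next batch resumes */
	unsigned int visits[TOY_NR_CPUS];	/* stand-in for per-CPU cid checks */
};

/*
 * Visit at most @batch CPUs, starting at s->next_cpu. Returns true when
 * this call finished a full pass; the cursor then wraps back to 0.
 */
static bool toy_scan_batch(struct toy_scan *s, unsigned int batch)
{
	for (unsigned int i = 0; i < batch && s->next_cpu < TOY_NR_CPUS; i++)
		s->visits[s->next_cpu++]++;

	if (s->next_cpu == TOY_NR_CPUS) {
		s->next_cpu = 0;
		return true;
	}
	return false;
}
```

Bounding the work per invocation caps the latency added to any single
return to userspace, at the cost of a full pass taking several
invocations (three calls with a batch of 3 for 8 CPUs in this sketch).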

What do you think?

Thanks,
Gabriele

[1] https://lore.kernel.org/lkml/20250217112317.258716-1-gmonaco@redhat.com

> Thanks,
> 
> Mathieu
> 
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202507100606.90787fe6-lkp@intel.com
> > 
> > 
> > [   26.556715][    C0] WARNING: inconsistent lock state
> > [   26.557127][    C0] 6.16.0-rc5-00002-gd06e66c6025e #1 Tainted: G                T
> > [   26.557730][    C0] --------------------------------
> > [   26.558133][    C0] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> > [   26.558662][    C0] stdbuf/386 [HC0[0]:SC1[1]:HE1:SE0] takes:
> > [ 26.559118][ C0] ffffffff870d4438 (pgd_lock){+.?.}-{3:3}, at: pgd_free (arch/x86/mm/pgtable.c:67 arch/x86/mm/pgtable.c:98 arch/x86/mm/pgtable.c:379)
> > [   26.559786][    C0] {SOFTIRQ-ON-W} state was registered at:
> > [ 26.560232][ C0] mark_usage (kernel/locking/lockdep.c:4669)
> > [ 26.560561][ C0] __lock_acquire (kernel/locking/lockdep.c:5194)
> > [ 26.560929][ C0] lock_acquire (kernel/locking/lockdep.c:473 kernel/locking/lockdep.c:5873)
> > [ 26.561267][ C0] _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
> > [ 26.561617][ C0] pgd_alloc (arch/x86/mm/pgtable.c:86 arch/x86/mm/pgtable.c:353)
> > [ 26.561950][ C0] mm_init+0x64f/0xbfb
> > [ 26.562342][ C0] mm_alloc (kernel/fork.c:1109)
> > [ 26.562655][ C0] dma_resv_lockdep (drivers/dma-buf/dma-resv.c:784)
> > [ 26.563020][ C0] do_one_initcall (init/main.c:1274)
> > [ 26.563389][ C0] do_initcalls (init/main.c:1335 init/main.c:1352)
> > [ 26.563744][ C0] kernel_init_freeable (init/main.c:1588)
> > [ 26.564144][ C0] kernel_init (init/main.c:1476)
> > [ 26.564402][ C0] ret_from_fork (arch/x86/kernel/process.c:154)
> > [ 26.564633][ C0] ret_from_fork_asm (arch/x86/entry/entry_64.S:258)
> > [   26.564871][    C0] irq event stamp: 4774
> > [ 26.565070][ C0] hardirqs last enabled at (4774): _raw_spin_unlock_irq (arch/x86/include/asm/irqflags.h:42 arch/x86/include/asm/irqflags.h:119 include/linux/spinlock_api_smp.h:159 kernel/locking/spinlock.c:202)
> > [ 26.565526][ C0] hardirqs last disabled at (4773): _raw_spin_lock_irq (arch/x86/include/asm/preempt.h:80 include/linux/spinlock_api_smp.h:118 kernel/locking/spinlock.c:170)
> > [ 26.565971][ C0] softirqs last enabled at (4256): local_bh_enable (include/linux/bottom_half.h:33)
> > [ 26.566408][ C0] softirqs last disabled at (4771): __do_softirq (kernel/softirq.c:614)
> > [   26.566823][    C0]
> > [   26.566823][    C0] other info that might help us debug this:
> > [   26.567198][    C0]  Possible unsafe locking scenario:
> > [   26.567198][    C0]
> > [   26.567548][    C0]        CPU0
> > [   26.567709][    C0]        ----
> > [   26.567869][    C0]   lock(pgd_lock);
> > [   26.568060][    C0]   <Interrupt>
> > [   26.568255][    C0]     lock(pgd_lock);
> > [   26.568452][    C0]
> > [   26.568452][    C0]  *** DEADLOCK ***
> > [   26.568452][    C0]
> > [   26.568830][    C0] 3 locks held by stdbuf/386:
> > [ 26.569056][ C0] #0: ffff888170d5c1a8 (&sb->s_type->i_mutex_key){++++}-{4:4}, at: lookup_slow (fs/namei.c:1834)
> > [ 26.569535][ C0] #1: ffff888170cf5850 (&lockref->lock){+.+.}-{3:3}, at: d_alloc (include/linux/dcache.h:319 fs/dcache.c:1777)
> > [ 26.569961][ C0] #2: ffffc90000007d40 ((&mm->cid_timer)){+.-.}-{0:0}, at: call_timer_fn (kernel/time/timer.c:1744)
> > [   26.570421][    C0]
> > [   26.570421][    C0] stack backtrace:
> > [   26.570704][    C0] CPU: 0 UID: 0 PID: 386 Comm: stdbuf Tainted: G                T   6.16.0-rc5-00002-gd06e66c6025e #1 PREEMPT(voluntary)  39c5cbdaf5b4eb171776daa7d42daa95c0766676
> > [   26.570716][    C0] Tainted: [T]=RANDSTRUCT
> > [   26.570719][    C0] Call Trace:
> > [   26.570723][    C0]  <IRQ>
> > [ 26.570727][ C0] dump_stack_lvl (lib/dump_stack.c:122 (discriminator 4))
> > [ 26.570735][ C0] dump_stack (lib/dump_stack.c:130)
> > [ 26.570740][ C0] print_usage_bug (kernel/locking/lockdep.c:4047)
> > [ 26.570748][ C0] valid_state (kernel/locking/lockdep.c:4060)
> > [ 26.570755][ C0] mark_lock_irq (kernel/locking/lockdep.c:4270)
> > [ 26.570762][ C0] ? save_trace (kernel/locking/lockdep.c:592)
> > [ 26.570773][ C0] ? mark_lock (kernel/locking/lockdep.c:4728 (discriminator 3))
> > [ 26.570780][ C0] mark_lock (kernel/locking/lockdep.c:4756)
> > [ 26.570787][ C0] mark_usage (kernel/locking/lockdep.c:4645)
> > [ 26.570796][ C0] __lock_acquire (kernel/locking/lockdep.c:5194)
> > [ 26.570804][ C0] lock_acquire (kernel/locking/lockdep.c:473 kernel/locking/lockdep.c:5873)
> > [ 26.570811][ C0] ? pgd_free (arch/x86/mm/pgtable.c:67 arch/x86/mm/pgtable.c:98 arch/x86/mm/pgtable.c:379)
> > [ 26.570822][ C0] ? validate_chain (kernel/locking/lockdep.c:3826 kernel/locking/lockdep.c:3879)
> > [ 26.570828][ C0] ? wake_up_new_task (kernel/sched/core.c:10597)
> > [ 26.570839][ C0] _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
> > [ 26.570845][ C0] ? pgd_free (arch/x86/mm/pgtable.c:67 arch/x86/mm/pgtable.c:98 arch/x86/mm/pgtable.c:379)
> > [ 26.570854][ C0] pgd_free (arch/x86/mm/pgtable.c:67 arch/x86/mm/pgtable.c:98 arch/x86/mm/pgtable.c:379)
> > [ 26.570863][ C0] ? wake_up_new_task (kernel/sched/core.c:10597)
> > [ 26.570873][ C0] __mmdrop (kernel/fork.c:681)
> > [ 26.570882][ C0] ? wake_up_new_task (kernel/sched/core.c:10597)
> > [ 26.570891][ C0] mmdrop (include/linux/sched/mm.h:55)
> > [ 26.570901][ C0] task_mm_cid_scan (kernel/sched/core.c:10619 (discriminator 3))
> > [ 26.570910][ C0] ? lock_is_held (include/linux/lockdep.h:249)
> > [ 26.570918][ C0] ? wake_up_new_task (kernel/sched/core.c:10597)
> > [ 26.570928][ C0] call_timer_fn (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/jump_label.h:262 include/trace/events/timer.h:127 kernel/time/timer.c:1748)
> > [ 26.570935][ C0] ? trace_timer_base_idle (kernel/time/timer.c:1724)
> > [ 26.570943][ C0] ? wake_up_new_task (kernel/sched/core.c:10597)
> > [ 26.570953][ C0] ? wake_up_new_task (kernel/sched/core.c:10597)
> > [ 26.570962][ C0] __run_timers (kernel/time/timer.c:1799 kernel/time/timer.c:2372)
> > [ 26.570970][ C0] ? add_timer_global (kernel/time/timer.c:2343)
> > [ 26.570977][ C0] ? __kasan_check_write (mm/kasan/shadow.c:38)
> > [ 26.570988][ C0] ? do_raw_spin_lock (arch/x86/include/asm/atomic.h:107 include/linux/atomic/atomic-arch-fallback.h:2170 include/linux/atomic/atomic-instrumented.h:1302 include/asm-generic/qspinlock.h:111 kernel/locking/spinlock_debug.c:116)
> > [ 26.570996][ C0] ? __raw_spin_lock_init (kernel/locking/spinlock_debug.c:114)
> > [ 26.571006][ C0] __run_timer_base (kernel/time/timer.c:2385)
> > [ 26.571014][ C0] run_timer_base (kernel/time/timer.c:2394)
> > [ 26.571021][ C0] run_timer_softirq (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/jump_label.h:262 kernel/time/timer.c:342 kernel/time/timer.c:2406)
> > [ 26.571028][ C0] handle_softirqs (arch/x86/include/asm/atomic.h:23 include/linux/atomic/atomic-arch-fallback.h:457 include/linux/jump_label.h:262 include/trace/events/irq.h:142 kernel/softirq.c:580)
> > [ 26.571039][ C0] __do_softirq (kernel/softirq.c:614)
> > [ 26.571046][ C0] __irq_exit_rcu (kernel/softirq.c:453 kernel/softirq.c:680)
> > [ 26.571055][ C0] irq_exit_rcu (kernel/softirq.c:698)
> > [ 26.571064][ C0] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1050 arch/x86/kernel/apic/apic.c:1050)
> > [   26.571076][    C0]  </IRQ>
> > [   26.571078][    C0]  <TASK>
> > [ 26.571081][ C0] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:574)
> > [ 26.571088][ C0] RIP: 0010:d_alloc (fs/dcache.c:1778)
> > [ 26.571100][ C0] Code: 8d 7c 24 50 b8 ff ff 37 00 ff 83 f8 00 00 00 48 89 fa 48 c1 e0 2a 48 c1 ea 03 80 3c 02 00 74 05 e8 5f f3 f6 ff 49 89 5c 24 50 <49> 8d bc 24 10 01 00 00 48 8d b3 20 01 00 00 e8 87 bc ff ff 4c 89
> > All code
> > ========
> >     0:	8d 7c 24 50          	lea    0x50(%rsp),%edi
> >     4:	b8 ff ff 37 00       	mov    $0x37ffff,%eax
> >     9:	ff 83 f8 00 00 00    	incl   0xf8(%rbx)
> >     f:	48 89 fa             	mov    %rdi,%rdx
> >    12:	48 c1 e0 2a          	shl    $0x2a,%rax
> >    16:	48 c1 ea 03          	shr    $0x3,%rdx
> >    1a:	80 3c 02 00          	cmpb   $0x0,(%rdx,%rax,1)
> >    1e:	74 05                	je     0x25
> >    20:	e8 5f f3 f6 ff       	call   0xfffffffffff6f384
> >    25:	49 89 5c 24 50       	mov    %rbx,0x50(%r12)
> >   2a:*	49 8d bc 24 10 01 00 	lea    0x110(%r12),%rdi		<-- trapping instruction
> >    31:	00
> >    32:	48 8d b3 20 01 00 00 	lea    0x120(%rbx),%rsi
> >    39:	e8 87 bc ff ff       	call   0xffffffffffffbcc5
> >    3e:	4c                   	rex.WR
> >    3f:	89                   	.byte 0x89
> > 
> > Code starting with the faulting instruction
> > ===========================================
> >     0:	49 8d bc 24 10 01 00 	lea    0x110(%r12),%rdi
> >     7:	00
> >     8:	48 8d b3 20 01 00 00 	lea    0x120(%rbx),%rsi
> >     f:	e8 87 bc ff ff       	call   0xffffffffffffbc9b
> >    14:	4c                   	rex.WR
> >    15:	89                   	.byte 0x89
> > 
> > 
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20250710/202507100606.90787fe6-lkp@intel.com
> > 
> > 
> > 
> 



Thread overview: 7+ messages
     [not found] <20250707144824.117014-1-gmonaco@redhat.com>
2025-07-07 14:48 ` Gabriele Monaco
2025-07-07 15:19   ` Mathieu Desnoyers
2025-07-10  4:56   ` kernel test robot
2025-07-10 13:23     ` Mathieu Desnoyers
2025-07-10 13:40       ` Gabriele Monaco [this message]
2025-07-10 14:18         ` Mathieu Desnoyers
2025-07-10 13:47     ` Gabriele Monaco
