linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	 Brendan Jackman <jackmanb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
	 Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Clark Williams <clrkwllms@kernel.org>,
	 Steven Rostedt <rostedt@goodmis.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	 linux-rt-devel@lists.linux.dev, stable@vger.kernel.org,
	 kernel test robot <oliver.sang@intel.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH mm-hotfixes] mm/page_alloc: prevent pcp corruption with SMP=n
Date: Wed, 7 Jan 2026 21:19:20 +0000	[thread overview]
Message-ID: <h33pscn2orhpb5dapttmo4vy4s3drfsjaahmjp4arsjjfngzno@bzbacqjvfe6r> (raw)
In-Reply-To: <20260105-fix-pcp-up-v1-1-5579662d2071@suse.cz>

On Mon, Jan 05, 2026 at 04:08:56PM +0100, Vlastimil Babka wrote:
> The kernel test robot has reported:
> 
>  BUG: spinlock trylock failure on UP on CPU#0, kcompactd0/28
>   lock: 0xffff888807e35ef0, .magic: dead4ead, .owner: kcompactd0/28, .owner_cpu: 0
>  CPU: 0 UID: 0 PID: 28 Comm: kcompactd0 Not tainted 6.18.0-rc5-00127-ga06157804399 #1 PREEMPT  8cc09ef94dcec767faa911515ce9e609c45db470
>  Call Trace:
>   <IRQ>
>   __dump_stack (lib/dump_stack.c:95)
>   dump_stack_lvl (lib/dump_stack.c:123)
>   dump_stack (lib/dump_stack.c:130)
>   spin_dump (kernel/locking/spinlock_debug.c:71)
>   do_raw_spin_trylock (kernel/locking/spinlock_debug.c:?)
>   _raw_spin_trylock (include/linux/spinlock_api_smp.h:89 kernel/locking/spinlock.c:138)
>   __free_frozen_pages (mm/page_alloc.c:2973)
>   ___free_pages (mm/page_alloc.c:5295)
>   __free_pages (mm/page_alloc.c:5334)
>   tlb_remove_table_rcu (include/linux/mm.h:? include/linux/mm.h:3122 include/asm-generic/tlb.h:220 mm/mmu_gather.c:227 mm/mmu_gather.c:290)
>   ? __cfi_tlb_remove_table_rcu (mm/mmu_gather.c:289)
>   ? rcu_core (kernel/rcu/tree.c:?)
>   rcu_core (include/linux/rcupdate.h:341 kernel/rcu/tree.c:2607 kernel/rcu/tree.c:2861)
>   rcu_core_si (kernel/rcu/tree.c:2879)
>   handle_softirqs (arch/x86/include/asm/jump_label.h:36 include/trace/events/irq.h:142 kernel/softirq.c:623)
>   __irq_exit_rcu (arch/x86/include/asm/jump_label.h:36 kernel/softirq.c:725)
>   irq_exit_rcu (kernel/softirq.c:741)
>   sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1052)
>   </IRQ>
>   <TASK>
>  RIP: 0010:_raw_spin_unlock_irqrestore (arch/x86/include/asm/preempt.h:95 include/linux/spinlock_api_smp.h:152 kernel/locking/spinlock.c:194)
>   free_pcppages_bulk (mm/page_alloc.c:1494)
>   drain_pages_zone (include/linux/spinlock.h:391 mm/page_alloc.c:2632)
>   __drain_all_pages (mm/page_alloc.c:2731)
>   drain_all_pages (mm/page_alloc.c:2747)
>   kcompactd (mm/compaction.c:3115)
>   kthread (kernel/kthread.c:465)
>   ? __cfi_kcompactd (mm/compaction.c:3166)
>   ? __cfi_kthread (kernel/kthread.c:412)
>   ret_from_fork (arch/x86/kernel/process.c:164)
>   ? __cfi_kthread (kernel/kthread.c:412)
>   ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
>   </TASK>
> 
> Matthew has analyzed the report and identified that in drain_page_zone()
> we are in a section protected by spin_lock(&pcp->lock) and then get an
> interrupt that attempts spin_trylock() on the same lock. The code is
> designed to work this way without disabling IRQs and occasionally fail
> the trylock with a fallback. However, the SMP=n spinlock implementation
> assumes spin_trylock() will always succeed, and thus it's normally a
> no-op. Here the enabled lock debugging catches the problem, but
> otherwise it could cause a corruption of the pcp structure.
> 
> The problem has been introduced by commit 574907741599 ("mm/page_alloc:
> leave IRQs enabled for per-cpu page allocations"). The pcp locking
> scheme recognizes the need for disabling IRQs to prevent nesting
> spin_trylock() sections on SMP=n, but the need to prevent the nesting in
> spin_lock() has not been recognized. Fix it by introducing local
> wrappers that change the spin_lock() to spin_lock_iqsave() with SMP=n
> and use them in all places that do spin_lock(&pcp->lock).
> 

Bah, correct.

> Fixes: 574907741599 ("mm/page_alloc: leave IRQs enabled for per-cpu page allocations")
> Cc: stable@vger.kernel.org
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202512101320.e2f2dd6f-lkp@intel.com
> Analyzed-by: Matthew Wilcox <willy@infradead.org>
> Link: https://lore.kernel.org/all/aUW05pyc9nZkvY-1@casper.infradead.org/
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> This fix is intentionally made self-contained and not trying to expand
> upon the existing pcp[u]_spin() helpers. This is to make stable
> backports easier due to recent cleanups to that helpers.
> 
> We could follow up with a proper helpers integration going forward.
> However I think the assumptions SMP=n of the spinlock UP implementation
> are just wrong. It should be valid to do a spin_lock() without disabling
> irq's and rely on a nested spin_trylock() to fail. I will thus try
> proposing the remove the UP implementation first. It should be within
> the current trend of removing stuff that's optimized for a minority
> configuration if it makes maintainability of the majority worse.
> (c.f. recent scheduler SMP=n removal)

It would be fair. Maybe it'll take a performance hit because from a
maintenance perspective, it would be preferable. It's true that
spin_trylock within a lock protected region on UP is somewhat bogus, but
not impossible either. Even if the resulting code is buggy anyway, it
would be preferable to fail early than hide.

> ---
>  mm/page_alloc.c | 45 +++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 37 insertions(+), 8 deletions(-)
> 

With or without the renaming on top;

Acked-by: Mel Gorman <mgorman@techsingularity.net>

Thanks.

-- 
Mel Gorman
SUSE Labs


      parent reply	other threads:[~2026-01-07 21:19 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-01-05 15:08 Vlastimil Babka
2026-01-05 21:40 ` Steven Rostedt
2026-01-06  8:28   ` Vlastimil Babka
2026-01-06 15:21     ` Steven Rostedt
2026-01-07 21:19 ` Mel Gorman [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=h33pscn2orhpb5dapttmo4vy4s3drfsjaahmjp4arsjjfngzno@bzbacqjvfe6r \
    --to=mgorman@techsingularity.net \
    --cc=akpm@linux-foundation.org \
    --cc=bigeasy@linutronix.de \
    --cc=clrkwllms@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=jackmanb@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=mhocko@suse.com \
    --cc=oliver.sang@intel.com \
    --cc=rostedt@goodmis.org \
    --cc=stable@vger.kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox