From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Waiman Long <llong@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Mike Rapoport <rppt@kernel.org>,
Clark Williams <clrkwllms@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-rt-devel@lists.linux.dev,
Wei Yang <richard.weiyang@gmail.com>,
David Hildenbrand <david@kernel.org>
Subject: Re: [PATCH] mm/mm_init: Don't call cond_resched() in deferred_init_memmap_chunk() if rcu_preempt_depth() set
Date: Thu, 22 Jan 2026 08:57:47 +0100
Message-ID: <20260122075747.uSLrSJez@linutronix.de>
In-Reply-To: <13d0b8b5-1ba7-4a3e-a686-13a7b993d471@paulmck-laptop>
On 2026-01-21 13:27:32 [-0800], Paul E. McKenney wrote:
> > > > --- a/mm/mm_init.c
> > > > +++ b/mm/mm_init.c
> > > > @@ -2085,7 +2085,12 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
> > > > spfn = chunk_end;
> > > > - if (irqs_disabled())
> > > > + /*
> > > > + * pgdat_resize_lock() only disables irqs in non-RT
> > > > + * kernels but calls rcu_read_lock() in a PREEMPT_RT
> > > > + * kernel.
> > > > + */
> > > > + if (irqs_disabled() || rcu_preempt_depth())
> > > > touch_nmi_watchdog();
> > > rcu_preempt_depth() seems a fairly internal low-level thing - it's
> > > rarely used.
If you acquire a lock from time to time and you pass a bool to let the
function below know whether scheduling is fine or not, then it is
obvious. If you choose to check for symptoms of an acquired lock, then
you also have to use the rarely used functions ;)
> > That is true. Besides the scheduler, workqueue also uses rcu_preempt_depth().
> > This API is included in "include/linux/rcupdate.h", which is included
> > directly or indirectly by many kernel files. So even though it is rarely
> > used, it is still a public API.
>
> It is a bit tricky, for example, given a kernel built with both
> CONFIG_PREEMPT_NONE=y and CONFIG_PREEMPT_DYNAMIC=y, it will never
> invoke touch_nmi_watchdog(), even if it really is in an RCU read-side
> critical section. This is because it was intended for lockdep-like use,
> where (for example) you don't want to complain about sleeping in an RCU
> read-side critical section unless you are 100% sure that you are in fact
> in an RCU read-side critical section.
>
> Maybe something like this?
>
> if (irqs_disabled() || !IS_ENABLED(CONFIG_PREEMPT_RCU) || rcu_preempt_depth())
> touch_nmi_watchdog();
I don't understand the PREEMPT_NONE+DYNAMIC reasoning. irqs_disabled()
should not be affected by this and rcu_preempt_depth() will be 0 for
!CONFIG_PREEMPT_RCU so I don't think this is required.
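To make the comparison concrete, here is a small userspace model of the
two checks under discussion. It is only a sketch, not kernel code:
struct ctx, model_rcu_preempt_depth(), touch_v1() and touch_v2() are
hypothetical names, and the only kernel behaviour assumed is the one
stated above, namely that rcu_preempt_depth() is 0 for
!CONFIG_PREEMPT_RCU.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the kernel's definitions. */
struct ctx {
	bool irqs_off;		/* models irqs_disabled() */
	bool preempt_rcu;	/* models IS_ENABLED(CONFIG_PREEMPT_RCU) */
	int rcu_nesting;	/* models how deep the task is in RCU readers */
};

/* Models rcu_preempt_depth(): always 0 when PREEMPT_RCU is not built in. */
static int model_rcu_preempt_depth(const struct ctx *c)
{
	return c->preempt_rcu ? c->rcu_nesting : 0;
}

/* The check from the original patch. */
static bool touch_v1(const struct ctx *c)
{
	return c->irqs_off || model_rcu_preempt_depth(c);
}

/* The check with the additional !IS_ENABLED(CONFIG_PREEMPT_RCU) term. */
static bool touch_v2(const struct ctx *c)
{
	return c->irqs_off || !c->preempt_rcu || model_rcu_preempt_depth(c);
}
```

In this model the two variants differ only when preempt_rcu is false:
touch_v1() then reduces to irqs_off alone, while touch_v2() is true
regardless of the other fields.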
> This would *always* invoke touch_nmi_watchdog() for such kernels, which
> might or might not be OK.
>
> I freely confess that I am not sure which of these is appropriate in
> this setting.
What about a more straightforward and obvious approach?
diff --git a/mm/mm_init.c b/mm/mm_init.c
index fc2a6f1e518f1..0b283fd48b282 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2059,7 +2059,7 @@ static unsigned long __init deferred_init_pages(struct zone *zone,
  */
 static unsigned long __init
 deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,
-			   struct zone *zone)
+			   struct zone *zone, bool may_schedule)
 {
 	int nid = zone_to_nid(zone);
 	unsigned long nr_pages = 0;
@@ -2085,10 +2085,10 @@ deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn,

 			spfn = chunk_end;

-			if (irqs_disabled())
-				touch_nmi_watchdog();
-			else
+			if (may_schedule)
 				cond_resched();
+			else
+				touch_nmi_watchdog();
 		}
 	}

@@ -2101,7 +2101,7 @@ deferred_init_memmap_job(unsigned long start_pfn, unsigned long end_pfn,
 {
 	struct zone *zone = arg;

-	deferred_init_memmap_chunk(start_pfn, end_pfn, zone);
+	deferred_init_memmap_chunk(start_pfn, end_pfn, zone, true);
 }

 static unsigned int __init
@@ -2216,7 +2216,7 @@ bool __init deferred_grow_zone(struct zone *zone, unsigned int order)
 	for (spfn = first_deferred_pfn, epfn = SECTION_ALIGN_UP(spfn + 1);
 	     nr_pages < nr_pages_needed && spfn < zone_end_pfn(zone);
 	     spfn = epfn, epfn += PAGES_PER_SECTION) {
-		nr_pages += deferred_init_memmap_chunk(spfn, epfn, zone);
+		nr_pages += deferred_init_memmap_chunk(spfn, epfn, zone, false);
 	}

 	/*
Wouldn't this work?
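
For illustration, the calling convention in the diff can be modelled in
userspace roughly like this. It is a sketch under assumptions, not the
kernel implementation: init_range() and the two stub_*() helpers are
hypothetical, and the stubs only count calls instead of scheduling or
touching a watchdog.

```c
#include <stdbool.h>

/* Hypothetical counters standing in for the real kernel helpers. */
static unsigned int resched_calls;
static unsigned int watchdog_touches;

static void stub_cond_resched(void) { resched_calls++; }
static void stub_touch_nmi_watchdog(void) { watchdog_touches++; }

/*
 * Models the chunk loop: walk [start, end) in steps of 'chunk' units and,
 * after each step, either yield or keep the watchdog quiet. The caller
 * states which one is allowed, so nothing is inferred from irq or RCU state.
 */
static unsigned long init_range(unsigned long start, unsigned long end,
				unsigned long chunk, bool may_schedule)
{
	unsigned long done = 0;

	if (!chunk)	/* a zero step would never make progress */
		return 0;

	while (start < end) {
		unsigned long left = end - start;
		unsigned long step = left < chunk ? left : chunk;

		done += step;
		start += step;

		if (may_schedule)
			stub_cond_resched();
		else
			stub_touch_nmi_watchdog();
	}

	return done;
}
```

A caller that holds a lock passes false, as deferred_grow_zone() does in
the diff, and a caller that may sleep passes true, as
deferred_init_memmap_job() does.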
> Thanx, Paul
Sebastian