Re: [PATCH 2/2] arm64: mmu: use pagetable_alloc_nolock() while stop_machine()

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Yeoreum Yun <yeoreum.yun@arm.com>
To: Brendan Jackman <jackmanb@google.com>
Cc: akpm@linux-foundation.org, david@kernel.org,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, ast@kernel.org, daniel@iogearbox.net,
	andrii@kernel.org, martin.lau@linux.dev, eddyz87@gmail.com,
	song@kernel.org, yonghong.song@linux.dev,
	john.fastabend@gmail.com, kpsingh@kernel.org, sdf@fomichev.me,
	haoluo@google.com, jolsa@kernel.org, hannes@cmpxchg.org,
	ziy@nvidia.com, bigeasy@linutronix.de, clrkwllms@kernel.org,
	rostedt@goodmis.org, catalin.marinas@arm.com, will@kernel.org,
	ryan.roberts@arm.com, kevin.brodsky@arm.com, dev.jain@arm.com,
	yang@os.amperecomputing.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	linux-rt-devel@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 2/2] arm64: mmu: use pagetable_alloc_nolock() while stop_machine()
Date: Tue, 16 Dec 2025 11:03:11 +0000	[thread overview]
Message-ID: <aUE8bwUVa6jSUft1@e129823.arm.com> (raw)
In-Reply-To: <DEZK5U2YP6I0.27VJHSVK14646@google.com>

Hi Brendan,

> On Mon Dec 15, 2025 at 10:06 AM UTC, Yeoreum Yun wrote:
> [snip]
> >> Overall I am feeling a bit uncomfortable about this use of _nolock, but
> >> I am also feeling pretty ignorant about PREEMPT_RT and also about this
> >> arm64 code, so I am hesitant to suggest alternatives, I hope someone
> >> else can offer some input here...
> >
> > I understand. However, as I mentioned earlier,
> > my main intention was to hear opinions specifically about memory contention.
> >
> > That said, if there is no memory contention,
> > I don’t think using the _nolock API is necessarily a bad approach.
>
>
> > In fact, I believe a bigger issue is that, under PREEMPT_RT,
> > code that uses the regular memory allocation APIs may give users the false impression
> > that those APIs are “safe to use,” even though they are not.
>
> Yeah, I share this concern. I would bet I have written code that's
> broken under PREEMPT_RT (luckily only in Google's kernel fork). The
> comment for GFP_ATOMIC says:
>
>  * %GFP_ATOMIC users can not sleep and need the allocation to succeed. A lower
>  * watermark is applied to allow access to "atomic reserves".
>  * The current implementation doesn't support NMI and few other strict
>  * non-preemptive contexts (e.g. raw_spin_lock). The same applies to %GFP_NOWAIT.
>
> It kinda sounds like it's supposed to be OK to use GFP_ATOMIC in a
> normal preempt_disable() context. So do you know exactly why it's
> invalid to use it in this stop_machine() context here? Maybe we need to
> update this comment.

In non-PREEMPT_RT configurations, this is fine to use.
However, in PREEMPT_RT, it should not be used because
spin_lock becomes a sleepable lock backed by an rt-mutex.

From Documentation/locking/locktypes.rst:

  The fact that PREEMPT_RT changes the lock category of spinlock_t and
  rwlock_t from spinning to sleeping.

As you know, all locks related to memory allocation
(e.g., zone_lock, PCP locks, etc.) use spin_lock,
which becomes sleepable under PREEMPT_RT.

The callback of stop_machine() is executed in a preemption-disabled context
(see cpu_stopper_thread()). In this context, if it fails to acquire a spinlock
during memory allocation,
the task would be able to go to sleep while preemption is disabled,
which is an obviously problematic situation.

> Or maybe actually we need to fix the allocator
> so that GFP_ATOMIC allocs are safe in this context?

I don’t think so, because GFP_ATOMIC can still be used by
IRQ threads in PREEMPT_RT. (If an IRQ handler is non-threaded
due to IRQF_NO_THREAD, then it cannot use it.)

Although the root cause appears to be that the memory allocator uses spin_lock(),
I believe the real issue is where such allocations are allowed to be used.
If we changed the lock to raw_spin_lock(),
this would introduce significant latency during memory allocation.
So I believe this is why the memory allocator continues to use spin_lock().

IOW, what I really want to ask is whether
general memory allocation/free operations are
permissible in a stop_machine() context
(I think nolock() can be allowable only).

Thanks

--
Sincerely,
Yeoreum Yun

next prev parent reply	other threads:[~2025-12-16 11:04 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-12 16:18 [PATCH 0/2] introduce pagetable_alloc_nolock() Yeoreum Yun
2025-12-12 16:18 ` [PATCH 1/2] mm: " Yeoreum Yun
2025-12-12 16:18 ` [PATCH 2/2] arm64: mmu: use pagetable_alloc_nolock() while stop_machine() Yeoreum Yun
2025-12-13  7:05   ` Brendan Jackman
2025-12-14  9:13     ` Yeoreum Yun
2025-12-15  9:22       ` Brendan Jackman
2025-12-15  9:34         ` Yeoreum Yun
2025-12-15  9:55           ` Brendan Jackman
2025-12-15 10:06             ` Yeoreum Yun
2025-12-16 10:10               ` Brendan Jackman
2025-12-16 11:03                 ` Yeoreum Yun [this message]
2025-12-16 11:26                   ` Brendan Jackman
2025-12-16 12:01                     ` Yeoreum Yun
2025-12-16 12:39                       ` Brendan Jackman
2025-12-16 13:25                         ` Yeoreum Yun
2025-12-18  9:30   ` Michal Hocko
2025-12-18  9:36     ` Yeoreum Yun
2025-12-18 12:02       ` Ryan Roberts
2025-12-18 12:17         ` Michal Hocko
2025-12-18 12:24           ` Yeoreum Yun
2025-12-16 15:11 ` [PATCH 0/2] introduce pagetable_alloc_nolock() Ryan Roberts
2025-12-16 16:52   ` Yeoreum Yun
2025-12-17  9:34     ` Ryan Roberts
2025-12-17 10:48       ` Yeoreum Yun
2025-12-17 12:04         ` Ryan Roberts
2025-12-17 12:52           ` Yeoreum Yun
2025-12-17 13:15             ` Vlastimil Babka
2025-12-17 13:35               ` Brendan Jackman
2025-12-17 13:56                 ` Yeoreum Yun
2025-12-17 15:10                 ` Vlastimil Babka
2025-12-17 17:19                   ` Brendan Jackman
2025-12-18  7:47                     ` Vlastimil Babka
2025-12-18  7:52                   ` David Hildenbrand (Red Hat)
2025-12-23 22:59           ` Yang Shi
2025-12-24  7:00             ` Yeoreum Yun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aUE8bwUVa6jSUft1@e129823.arm.com \
    --to=yeoreum.yun@arm.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bigeasy@linutronix.de \
    --cc=bpf@vger.kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=clrkwllms@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=david@kernel.org \
    --cc=dev.jain@arm.com \
    --cc=eddyz87@gmail.com \
    --cc=hannes@cmpxchg.org \
    --cc=haoluo@google.com \
    --cc=jackmanb@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kevin.brodsky@arm.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-devel@lists.linux.dev \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=martin.lau@linux.dev \
    --cc=mhocko@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=rppt@kernel.org \
    --cc=ryan.roberts@arm.com \
    --cc=sdf@fomichev.me \
    --cc=song@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=will@kernel.org \
    --cc=yang@os.amperecomputing.com \
    --cc=yonghong.song@linux.dev \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox