From: Minchan Kim <minchan@kernel.org>
To: Christian Brauner <brauner@kernel.org>
Cc: akpm@linux-foundation.org, david@kernel.org, mhocko@suse.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
surenb@google.com, timmurray@google.com
Subject: Re: [RFC 3/3] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag
Date: Thu, 16 Apr 2026 23:30:09 -0700
Message-ID: <aeHTcXfCyCED4WSl@google.com>
In-Reply-To: <20260416-planktont-abwinken-b9499483b939@brauner>
On Thu, Apr 16, 2026 at 11:13:35AM +0200, Christian Brauner wrote:
> On Mon, Apr 13, 2026 at 03:39:48PM -0700, Minchan Kim wrote:
> > Currently, process_mrelease() requires userspace to send a SIGKILL signal
> > prior to invocation. This separation introduces a race window where the
> > victim task may receive the signal and enter the exit path before the
> > reaper can invoke process_mrelease().
> >
> > In this case, the victim task frees its memory via the standard, unoptimized
> > exit path, bypassing the expedited clean file folio reclamation optimization
> > introduced in the previous patch (which relies on the MMF_UNSTABLE flag).
> >
> > This patch introduces the PROCESS_MRELEASE_REAP_KILL UAPI flag to support
> > an integrated auto-kill mode. When specified, process_mrelease() directly
> > injects a SIGKILL into the target task.
> >
> > Crucially, this patch utilizes a dedicated signal code (KILL_MRELEASE)
> > during signal injection, belonging to a new SIGKILL si_codes section.
> > This special code ensures that the kernel's signal delivery path reliably
> > intercepts the request and marks the target address space as unstable
> > (MMF_UNSTABLE). This mechanism guarantees that the MMF_UNSTABLE flag is set
> > before either the victim task or the reaper proceeds, ensuring that the
> > expedited reclamation optimization is utilized regardless of scheduling
> > order.
> >
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> > include/uapi/asm-generic/siginfo.h | 6 ++++++
> > include/uapi/linux/mman.h | 4 ++++
> > kernel/signal.c | 4 ++++
> > mm/oom_kill.c | 20 +++++++++++++++++++-
> > 4 files changed, 33 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/uapi/asm-generic/siginfo.h b/include/uapi/asm-generic/siginfo.h
> > index 5a1ca43b5fc6..0f59b791dab4 100644
> > --- a/include/uapi/asm-generic/siginfo.h
> > +++ b/include/uapi/asm-generic/siginfo.h
> > @@ -252,6 +252,12 @@ typedef struct siginfo {
> > #define BUS_MCEERR_AO 5
> > #define NSIGBUS 5
> >
> > +/*
> > + * SIGKILL si_codes
> > + */
> > +#define KILL_MRELEASE 1 /* sent by process_mrelease */
> > +#define NSIGKILL 1
> > +
> > /*
> > * SIGTRAP si_codes
> > */
> > diff --git a/include/uapi/linux/mman.h b/include/uapi/linux/mman.h
> > index e89d00528f2f..4266976b45ad 100644
> > --- a/include/uapi/linux/mman.h
> > +++ b/include/uapi/linux/mman.h
> > @@ -56,4 +56,8 @@ struct cachestat {
> > __u64 nr_recently_evicted;
> > };
> >
> > +/* Flags for process_mrelease */
> > +#define PROCESS_MRELEASE_REAP_KILL (1 << 0)
> > +#define PROCESS_MRELEASE_VALID_FLAGS (PROCESS_MRELEASE_REAP_KILL)
> > +
> > #endif /* _UAPI_LINUX_MMAN_H */
> > diff --git a/kernel/signal.c b/kernel/signal.c
> > index d65d0fe24bfb..c21b2176dc5e 100644
> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -1134,6 +1134,10 @@ static int __send_signal_locked(int sig, struct kernel_siginfo *info,
> >
> > out_set:
> > signalfd_notify(t, sig);
> > +
> > + if (sig == SIGKILL && !is_si_special(info) &&
> > + info->si_code == KILL_MRELEASE && t->mm)
> > + mm_flags_set(MMF_UNSTABLE, t->mm);
> > sigaddset(&pending->signal, sig);
> >
> > /* Let multiprocess signals appear after on-going forks */
> > diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> > index 5c6c95c169ee..0b5da5208707 100644
> > --- a/mm/oom_kill.c
> > +++ b/mm/oom_kill.c
> > @@ -20,6 +20,8 @@
> >
> > #include <linux/oom.h>
> > #include <linux/mm.h>
> > +#include <uapi/linux/mman.h>
> > +#include <linux/capability.h>
> > #include <linux/err.h>
> > #include <linux/gfp.h>
> > #include <linux/sched.h>
> > @@ -1218,13 +1220,29 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
> > bool reap = false;
> > long ret = 0;
> >
> > - if (flags)
> > + if (flags & ~PROCESS_MRELEASE_VALID_FLAGS)
> > return -EINVAL;
> >
> > task = pidfd_get_task(pidfd, &f_flags);
> > if (IS_ERR(task))
> > return PTR_ERR(task);
> >
> > + if (flags & PROCESS_MRELEASE_REAP_KILL) {
> > + struct kernel_siginfo info;
> > +
> > + if (!capable(CAP_KILL)) {
>
> Why? Just call a function that uses check_kill_permission() before
> firing the signal? What's the rationale for doing it this way?
Thanks for pointing that out. I wasn't aware of check_kill_permission().
Having looked at it, check_kill_permission() only performs permission checks
when si_fromuser(info) is true. That covers SEND_SIG_NOINFO, or a siginfo whose
si_code is <= 0 (SI_USER, SI_TKILL, ...). We inject the signal from the kernel
side with a positive si_code (KILL_MRELEASE). In that case
check_kill_permission() would just return 0 and skip the credential checks
entirely.
I am open to better ideas if there is a more standard way to handle permission
checks for kernel-injected signals.
>
> Tbh, I really hate that process_mrelease() now has a kill side effect
> with non-standard permission handling as well.
>
> Seems like bad api design. Why can't you just raise the MMF_UNSTABLE bit
> before the SIGKILL as that's the problem you're trying to solve.
The problem is that process_mrelease() requires the target process to already
have a pending fatal signal or be in the exit path (task_will_free_mem()).
Otherwise it does no reaping, and it returns -EINVAL unless the mm has already
been reaped.
Therefore, we cannot invoke process_mrelease() just to set the MMF_UNSTABLE
flag *before* the SIGKILL is sent.
If I send the SIGKILL first to satisfy that requirement, we immediately run
into the scheduling race where the victim can enter the exit path before the
reaper sets the flag.
This circular dependency is why I integrated the kill into process_mrelease().
The flag is set in the signal delivery path before SIGKILL becomes pending, so
the two happen as a single step.
>
> > + ret = -EPERM;
> > + goto put_task;
> > + }
> > + clear_siginfo(&info);
> > + info.si_signo = SIGKILL;
> > + info.si_code = KILL_MRELEASE;
> > + info.si_pid = task_tgid_vnr(current);
> > + info.si_uid = from_kuid_munged(current_user_ns(), current_uid());
>
> This should not be open-coded like this.
Good point. I can probably reuse prepare_kill_siginfo() instead.
>
> > +
> > + do_send_sig_info(SIGKILL, &info, task, PIDTYPE_TGID);
> > + }
> > +
> > /*
> > * Make sure to choose a thread which still has a reference to mm
> > * during the group exit
> > --
> > 2.54.0.rc0.605.g598a273b03-goog
> >
Thread overview: 18+ messages
2026-04-13 22:39 [RFC 0/3] mm: process_mrelease: expedited reclaim and auto-kill support Minchan Kim
2026-04-13 22:39 ` [RFC 1/3] mm: process_mrelease: expedite clean file folio reclaim via mmu_gather Minchan Kim
2026-04-14 7:45 ` David Hildenbrand (Arm)
2026-04-14 20:21 ` Minchan Kim
2026-04-13 22:39 ` [RFC 2/3] mm: process_mrelease: skip LRU movement for exclusive file folios Minchan Kim
2026-04-14 7:20 ` David Hildenbrand (Arm)
2026-04-14 20:22 ` Minchan Kim
2026-04-13 22:39 ` [RFC 3/3] mm: process_mrelease: introduce PROCESS_MRELEASE_REAP_KILL flag Minchan Kim
2026-04-16 9:13 ` Christian Brauner
2026-04-17 6:30 ` Minchan Kim [this message]
2026-04-17 7:04 ` Michal Hocko
2026-04-14 6:57 ` [RFC 0/3] mm: process_mrelease: expedited reclaim and auto-kill support Michal Hocko
2026-04-14 20:00 ` Minchan Kim
2026-04-15 7:38 ` Michal Hocko
2026-04-15 23:26 ` Minchan Kim
2026-04-16 6:54 ` Michal Hocko
2026-04-17 6:20 ` Minchan Kim
2026-04-17 7:11 ` Michal Hocko