From: Suren Baghdasaryan <surenb@google.com>
To: Florian Weimer <fweimer@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@kernel.org>, Michal Hocko <mhocko@suse.com>,
David Rientjes <rientjes@google.com>,
Matthew Wilcox <willy@infradead.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <guro@fb.com>, Rik van Riel <riel@surriel.com>,
Minchan Kim <minchan@kernel.org>,
Christian Brauner <christian@brauner.io>,
Christoph Hellwig <hch@infradead.org>,
Oleg Nesterov <oleg@redhat.com>,
David Hildenbrand <david@redhat.com>,
Jann Horn <jannh@google.com>, Shakeel Butt <shakeelb@google.com>,
Tim Murray <timmurray@google.com>,
Linux API <linux-api@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
kernel-team <kernel-team@android.com>
Subject: Re: [PATCH 1/1] mm: introduce process_reap system call
Date: Wed, 7 Jul 2021 23:39:34 -0700
Message-ID: <CAJuCfpEUXz-oHi5Ho8nGAKtFV6ArQDx9yQwrdTzYgHr5+6=YaQ@mail.gmail.com>
In-Reply-To: <87zguxxrfl.fsf@oldenburg.str.redhat.com>
On Wed, Jul 7, 2021 at 11:15 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Suren Baghdasaryan:
>
> > On Wed, Jul 7, 2021 at 10:41 PM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * Suren Baghdasaryan:
> >>
> >> > On Wed, Jul 7, 2021 at 2:47 AM Florian Weimer <fweimer@redhat.com> wrote:
> >> >>
> >> >> * Suren Baghdasaryan:
> >> >>
> >> >> > The API is as follows,
> >> >> >
> >> >> > int process_reap(int pidfd, unsigned int flags);
> >> >> >
> >> >> > DESCRIPTION
> >> >> > The process_reap() system call is used to free the memory of a
> >> >> > dying process.
> >> >> >
> >> >> > The pidfd selects the process referred to by the PID file
> >> >> > descriptor.
> >> >> > (See pidfd_open(2) for further information.)
> >> >> >
> >> >> > The flags argument is reserved for future use; currently, this
> >> >> > argument must be specified as 0.
> >> >> >
> >> >> > RETURN VALUE
> >> >> > On success, process_reap() returns 0. On error, -1 is returned
> >> >> > and errno is set to indicate the error.
> >> >>
> >> >> I think the manual page should mention what it means for a process to be
> >> >> “dying”, and how to move a process to this state.
> >> >
> >> > Thanks for the suggestion, Florian! Would replacing "dying process"
> >> > with "process which was sent a SIGKILL signal" be sufficient?
> >>
> >> That explains very clearly the requirement, but it raises the question
> >> why this isn't an si_code flag for rt_sigqueueinfo, reusing the existing
> >> system call.
> >
> > I think you are suggesting to use sigqueue() to deliver the signal and
> > perform the reaping when a special value accompanies it. This would be
> > somewhat similar to my early suggestion to use a flag in
> > pidfd_send_signal() (see:
> > https://lore.kernel.org/patchwork/patch/1060407) to implement memory
> > reaping which has another advantage of operation on PIDFDs instead of
> > PIDs which can be recycled.
> > kill()/pidfd_send_signal()/sigqueue() are supposed to deliver the
> > signal and return without blocking. Changing that behavior was
> > considered unacceptable in these discussions.
>
> Does this mean that you need two threads, one that sends SIGKILL, and
> one that calls process_reap? Given that sending SIGKILL is blocking
> with the existing interfaces?

No, two threads are not required. Sending SIGKILL blocks only until the
signal is delivered; it does not wait for the recipient to process the
signal or for its memory to be released. When I was talking about
"blocking", I meant that the current kill() and friends do not wait for
SIGKILL to be processed.

process_reap() blocks until the memory is released. Whether the
userspace caller invokes it right after sending SIGKILL to reclaim the
memory synchronously, or spawns a separate thread to reclaim the memory
asynchronously, is up to the caller. Both patterns are supported.
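
To make the synchronous pattern concrete, here is a rough sketch rather
than a definitive implementation. It assumes the prototype proposed in this
patch. Since process_reap() has no syscall number assigned, the
__NR_process_reap value and both function names below are placeholders I
made up for illustration; with the placeholder number the wrapper simply
fails with ENOSYS.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * ASSUMPTION: process_reap() is only a proposal, so no syscall number
 * exists. -1 is a placeholder that makes the wrapper fail with ENOSYS
 * until a real number is assigned.
 */
#ifndef __NR_process_reap
#define __NR_process_reap (-1L)
#endif

/* Hypothetical userspace wrapper matching the proposed prototype. */
int process_reap(int pidfd, unsigned int flags)
{
    return (int)syscall(__NR_process_reap, pidfd, flags);
}

/*
 * Synchronous pattern: send SIGKILL through the pidfd, then block in
 * process_reap() until the victim's memory has been released.
 * Returns 0 on success, -1 with errno set on failure.
 */
int kill_and_reap(pid_t pid)
{
    int pidfd = (int)syscall(SYS_pidfd_open, pid, 0);
    if (pidfd < 0)
        return -1;

    /* Returns as soon as the signal is queued; it does not wait. */
    int ret = (int)syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0);
    if (ret == 0)
        ret = process_reap(pidfd, 0);   /* blocks until memory is freed */

    int saved_errno = errno;            /* close() may clobber errno */
    close(pidfd);
    errno = saved_errno;
    return ret;
}
```

A single thread calling kill_and_reap(pid) is enough here: the signal is
sent first, and only the process_reap() step waits for the memory.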
> Please also note that asynchronous deallocation of resources leads to
> bugs and can cause unrelated workloads to fail. For example, in some
> configurations, clone can fail with EAGAIN even in cases where the total
> number of tasks is clearly bounded because the kernel signals task exit
> to applications before all resources are deallocated. I'm worried that
> the new interface makes things quite a bit worse in this regard.

process_reap() releases the memory synchronously; no kthreads are
involved. If asynchronous release is required, userspace needs to spawn
its own thread and issue this syscall from it. I hope this addresses
your concerns, which I believe are about asynchronous deallocation
within the kernel.
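
If asynchronous reclaim is wanted, the userspace side could look roughly
like the sketch below. This is an illustration under assumptions, not part
of the patch: reap_thread() and reap_async() are hypothetical names, and
__NR_process_reap is a placeholder because no syscall number has been
assigned, so with the placeholder the thread reports ENOSYS.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* ASSUMPTION: placeholder number; the proposed syscall has none yet. */
#ifndef __NR_process_reap
#define __NR_process_reap (-1L)
#endif

/*
 * Thread body: the blocking process_reap() call happens here, in a
 * userspace thread, so the thread that sent SIGKILL is free to continue.
 * The thread's exit value is 0 on success or the errno of the failure.
 */
void *reap_thread(void *arg)
{
    int pidfd = (int)(intptr_t)arg;
    long ret = syscall(__NR_process_reap, pidfd, 0U);

    return (void *)(intptr_t)(ret == 0 ? 0 : errno);
}

/*
 * Start reaping in the background. The caller is expected to have sent
 * SIGKILL to the process behind pidfd already. Returns 0 on success or
 * the error number from pthread_create().
 */
int reap_async(int pidfd, pthread_t *tid)
{
    return pthread_create(tid, NULL, reap_thread, (void *)(intptr_t)pidfd);
}
```

The caller has to keep the pidfd open until it has joined the thread with
pthread_join(); the join is also the point where userspace knows the
memory release has finished (or failed).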
Thanks!
>
> Thanks,
> Florian
>