From: Nadav Amit <nadav.amit@gmail.com>
To: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Linux-MM <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Peter Xu <peterx@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Minchan Kim <minchan@kernel.org>, Colin Cross <ccross@google.com>,
Suren Baghdasaryan <surenb@google.com>,
Mike Rapoport <rppt@linux.vnet.ibm.com>
Subject: Re: [RFC PATCH 0/8] mm/madvise: support process_madvise(MADV_DONTNEED)
Date: Wed, 29 Sep 2021 11:31:25 -0700
Message-ID: <E8456D5C-4FCD-46E4-B6F8-771076243D7E@gmail.com>
In-Reply-To: <YVQbMREcRaCbUaUv@dhcp22.suse.cz>
> On Sep 29, 2021, at 12:52 AM, Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 27-09-21 12:12:46, Nadav Amit wrote:
>>
>>> On Sep 27, 2021, at 5:16 AM, Michal Hocko <mhocko@suse.com> wrote:
>>>
>>> On Mon 27-09-21 05:00:11, Nadav Amit wrote:
>>> [...]
>>>> The manager is notified on memory regions that it should monitor
>>>> (through PTRACE/LD_PRELOAD/explicit-API). It then monitors these regions
>>>> using the remote-userfaultfd that you saw on the second thread. When it wants
>>>> to reclaim (anonymous) memory, it:
>>>>
>>>> 1. Uses UFFD-WP to protect that memory (and for this matter I got a vectored
>>>> UFFD-WP to do so efficiently, a patch which I did not send yet).
>>>> 2. Calls process_vm_readv() to read that memory of that process.
>>>> 3. Write it back to “swap”.
>>>> 4. Calls process_madvise(MADV_DONTNEED) to zap it.
>>>
>>> Why cannot you use MADV_PAGEOUT/MADV_COLD for this usecase?
>>
>> Providing hints to the kernel takes you so far to a certain extent.
>> The kernel does not want to (for a good reason) to be completely
>> configurable when it comes to reclaim and prefetch policies. Doing
>> so from userspace allows you to be fully configurable.
>
> I am sorry but I do not follow. Your scenario is describing a user
> space driven reclaim. Something that MADV_{COLD,PAGEOUT} have been
> designed for. What are you missing in the existing functionality?

Using MADV_COLD/MADV_PAGEOUT does not allow userspace to control
many aspects of paging out memory:

1. Writeback: writing back ahead of time, dynamic clustering, etc.
2. Batching (regardless, MADV_PAGEOUT does a pretty poor job of
   batching on non-contiguous memory).
3. There is no guarantee that the page is actually reclaimed (e.g.,
   due to writeback), or of when reclaim takes place.
4. I/O stack for swapping: you must use the kernel I/O stack (FUSE,
   as non-performant as it is, cannot be used for swap AFAIK).
5. Other operations (e.g., locking, working set tracking) that
   might be unnecessary or might interfere.

In addition, using MADV_COLD/MADV_PAGEOUT prevents the use of
userfaultfd to trap page faults and react accordingly, so you
are also prevented from:

6. Having your own custom prefetching policy in response to #PF.

There are additional use-cases that I can try to formalize in which
MADV_COLD/MADV_PAGEOUT is insufficient. But the main difference
is pretty clear, I think: one is a hint that applies only to
page reclamation. The other gives userspace direct control
over (almost) all aspects of paging.
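
To make the comparison concrete, steps 2-4 of the flow quoted earlier
might look roughly like the sketch below. It is only an illustration
under assumptions, not code from this series: coalesce_pages() and
reclaim_ranges() are hypothetical helpers, swap_fd stands for whatever
userspace "swap" backend the manager uses, and step 1 (UFFD-WP
protection) is omitted. The final process_madvise(MADV_DONTNEED) call
on a remote process is exactly what this RFC proposes, so it is expected
to fail (typically with EINVAL) on kernels without such support.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef SYS_process_madvise
#define SYS_process_madvise 440 /* assumed fallback for older libc headers */
#endif

/*
 * Hypothetical helper: coalesce a sorted list of page-aligned addresses
 * into iovec ranges, so that one vectored call covers non-contiguous
 * memory. Returns the number of iovec entries filled; *consumed is set
 * to the number of pages covered (less than n_pages if slots ran out).
 */
size_t coalesce_pages(const uintptr_t *pages, size_t n_pages,
                      size_t page_size, struct iovec *iov,
                      size_t max_iov, size_t *consumed)
{
    size_t n_iov = 0, i;

    assert(page_size > 0);
    for (i = 0; i < n_pages; i++) {
        if (n_iov > 0) {
            struct iovec *last = &iov[n_iov - 1];

            if ((uintptr_t)last->iov_base + last->iov_len == pages[i]) {
                last->iov_len += page_size; /* extend previous range */
                continue;
            }
        }
        if (n_iov == max_iov)
            break; /* no slot left; caller handles the rest later */
        iov[n_iov].iov_base = (void *)pages[i];
        iov[n_iov].iov_len = page_size;
        n_iov++;
    }
    *consumed = i;
    return n_iov;
}

/*
 * Hypothetical helper for steps 2-4: read the remote ranges, store them
 * in the userspace "swap" file, then ask the kernel to zap them.
 * Returns 0 on success, -1 with errno set on failure.
 */
int reclaim_ranges(pid_t pid, int pidfd, int swap_fd,
                   const struct iovec *remote, size_t n_remote,
                   void *buf, size_t buf_len)
{
    size_t total = 0;
    ssize_t done;

    for (size_t i = 0; i < n_remote; i++)
        total += remote[i].iov_len;
    if (total > buf_len) {
        errno = ENOBUFS;
        return -1;
    }

    struct iovec local = { .iov_base = buf, .iov_len = total };

    /* Step 2: copy the remote memory into our buffer. */
    done = process_vm_readv(pid, &local, 1, remote, n_remote, 0);
    if (done < 0)
        return -1;
    if ((size_t)done != total) { /* partial read: do not zap anything */
        errno = EFAULT;
        return -1;
    }

    /* Step 3: write the copy to the userspace "swap". */
    done = write(swap_fd, buf, total);
    if (done < 0)
        return -1;
    if ((size_t)done != total) {
        errno = EIO;
        return -1;
    }

    /* Step 4: zap the pages; needs the support proposed in this RFC. */
    if (syscall(SYS_process_madvise, pidfd, remote, n_remote,
                (long)MADV_DONTNEED, 0L) < 0)
        return -1;
    return 0;
}
```

Note that without step 1 the target could modify the pages between the
read and the zap, which is why the write-protection comes first in the
actual flow.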

As I suggested before, if it is preferred, this can be a UFFD
IOCTL instead of process_madvise() behavior, thereby lowering
the risk of misuse.

I would emphasize that this feature (i.e.,
process_madvise(MADV_DONTNEED) or a similar new UFFD feature)
has little to no effect on kernel robustness, complexity or
security, and requires little to no API change. So the impact
on the kernel is negligible.