linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	David Hildenbrand <david@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Christian Brauner <brauner@kernel.org>,
	linux-mm@kvack.org, linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org, SeongJae Park <sj@kernel.org>,
	Usama Arif <usamaarif642@gmail.com>
Subject: Re: [RFC PATCH 4/5] mm/madvise: add PMADV_SET_FORK_EXEC_DEFAULT process_madvise() flag
Date: Thu, 29 May 2025 15:46:28 +0100	[thread overview]
Message-ID: <e48b0293-5ccd-4205-910c-302b61b12b55@lucifer.local> (raw)
In-Reply-To: <20250520222609.GD773385@cmpxchg.org>

On Tue, May 20, 2025 at 06:26:09PM -0400, Johannes Weiner wrote:
> On Mon, May 19, 2025 at 09:52:41PM +0100, Lorenzo Stoakes wrote:
> > We intentionally limit this only to flags that we know should function
> > correctly without issue, and to be conservative about this, so we initially
> > limit ourselves only to MADV_HUGEPAGE, MADV_NOHUGEPAGE, that is - setting
> > the VM_HUGEPAGE, VM_NOHUGEPAGE VMA flags.
>
> Hm, but do we actually expect more to show up? Looking at the current
> list of advisories, we have the conventional ones:
>
>        MADV_NORMAL
>               No special treatment.  This is the default.
>
>        MADV_RANDOM
>               Expect page references in random order.  (Hence, read ahead may be less useful than normally.)
>
>        MADV_SEQUENTIAL
>               Expect page references in sequential order.  (Hence, pages in the given range can be aggressively read ahead, and may be freed soon after they are accessed.)
>
>        MADV_WILLNEED
>               Expect access in the near future.  (Hence, it might be a good idea to read some pages ahead.)
>
>        MADV_DONTNEED
>               Do not expect access in the near future.  (For the time being, the application is finished with the given range, so the kernel can free resources associated with it.)
>
> where only RANDOM and SEQUENTIAL are actual policies. But since those
> apply to file mappings only, they don't seem to make much sense for a
> process-wide setting.
>
> For Linux-specific advisories we have
>
> 	MADV_REMOVE		- action
> 	MADV_DONTFORK		- policy, but not sure how this could work as a process-wide thing
> 	MADV_HWPOISON		- action
> 	MADV_MERGEABLE		- policy, but we have a prctl() for process-wide settings
> 	MADV_SOFTOFFLINE	- action
> 	MADV_HUGEPAGE		- policy, but we have a prctl() for process-wide disabling
> 	MADV_COLLAPSE		- action
> 	MADV_DONTDUMP		- policy, but there is /proc/<pid>/coredump_filter for process-wide settings
> 	MADV_FREE		- action
> 	MADV_WIPEONFORK		- policy, but similar to DONTFORK, not sure how this would work process-wide
> 	MADV_COLD		- action
> 	MADV_PAGEOUT		- action
> 	MADV_POPULATE_READ	- action
> 	MADV_POPULATE_WRITE	- action
> 	MADV_GUARD_INSTALL	- action
>
> So the vast majority of advisories are either one-off actions, or they
> are policies that naturally only make sense for very specific ranges.
>
> KSM and THP really seem like the notable exception[1], rather than a
> rule. And we already *have* prctl() ops to modify per-process policies
> for these. So I'm reluctant to agree we should drill open and expand
> madvise() to cover them - especially with the goal or the expectation
> that THP per-process policies shouldn't matter that much down the line.
>
> I will admit, I don't hate prctl() as much as you do. It *is* a fairly
> broad interface - setting per-process policies. So it's bound to pick
> odds and ends of various subsystems that don't quite fit elsewhere.
>
> Is it the right choice everywhere? Of course not. And can its
> broadness be abused to avoid real interface design? Absolutely.
>
> I don't think that makes it inherently bad, though. As long as we make
> an honest effort to find the best home for new knobs.
>
> But IMO, in this case it is. The inheritance-along-the-process-tree
> behavior that we want here is an established pattern in prctl().
>
> And *because* that propagation is a security-sensitive pattern
> (although I don't think the THP policy specifically *is* a security
> issue), to me it makes more sense to keep this behavior confined to
> prctl() at least, and not add it to madvise too.
>
> [1] Maybe we should have added sys_andrea() to cover those, as well as
>     the SECCOMP prctl()s ;)

So sorry to have missed you really excellent and well thought-out reply
here Johannes (grr mail etc.).

You make a really good point, and I think this does overall, sadly, suggest
the concept of a 'general' madvise() call while, in a sense, 'neat',
doesn't quite fit.

I do think it aligns well with a dedicated mctl() API call, have sent out
an RFC discussion thread about this so we can determine if this makes
sense.

Thanks, Lorenzo


  reply	other threads:[~2025-05-29 14:46 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-19 20:52 [RFC PATCH 0/5] add process_madvise() flags to modify behaviour Lorenzo Stoakes
2025-05-19 20:52 ` [RFC PATCH 1/5] mm: madvise: refactor madvise_populate() Lorenzo Stoakes
2025-05-20 10:30   ` David Hildenbrand
2025-05-20 10:36     ` Lorenzo Stoakes
2025-05-20 10:42       ` David Hildenbrand
2025-05-22 12:32         ` Mike Rapoport
2025-05-19 20:52 ` [RFC PATCH 2/5] mm/madvise: add PMADV_SKIP_ERRORS process_madvise() flag Lorenzo Stoakes
2025-05-19 20:52 ` [RFC PATCH 3/5] mm/madvise: add PMADV_NO_ERROR_ON_UNMAPPED " Lorenzo Stoakes
2025-05-19 20:52 ` [RFC PATCH 4/5] mm/madvise: add PMADV_SET_FORK_EXEC_DEFAULT " Lorenzo Stoakes
2025-05-20  8:38   ` Pedro Falcato
2025-05-20 10:21     ` Lorenzo Stoakes
2025-05-20 11:41       ` Pedro Falcato
2025-05-20 13:39         ` Lorenzo Stoakes
2025-05-20 16:11     ` Jann Horn
2025-05-20 16:19       ` Lorenzo Stoakes
2025-05-20 16:35         ` David Hildenbrand
2025-05-20 22:26   ` Johannes Weiner
2025-05-29 14:46     ` Lorenzo Stoakes [this message]
2025-05-19 20:52 ` [RFC PATCH 5/5] mm/madvise: add PMADV_ENTIRE_ADDRESS_SPACE " Lorenzo Stoakes
2025-05-19 21:53 ` [RFC PATCH 0/5] add process_madvise() flags to modify behaviour Jann Horn
2025-05-20  5:35   ` Lorenzo Stoakes
2025-05-20 16:04     ` Jann Horn
2025-05-20 16:14       ` Lorenzo Stoakes
2025-05-20 15:28 ` David Hildenbrand
2025-05-20 17:47   ` Lorenzo Stoakes
2025-05-20 18:24     ` Usama Arif
2025-05-20 19:21       ` Lorenzo Stoakes
2025-05-20 19:42         ` Usama Arif
2025-05-20 20:15           ` Lorenzo Stoakes
2025-05-20 18:25     ` Lorenzo Stoakes
2025-05-20 18:39       ` David Hildenbrand
2025-05-20 18:25 ` Shakeel Butt
2025-05-20 18:45   ` Lorenzo Stoakes
2025-05-20 19:49     ` Shakeel Butt
2025-05-20 20:39       ` Lorenzo Stoakes
2025-05-20 22:02         ` Shakeel Butt
2025-05-21  4:21           ` Lorenzo Stoakes
2025-05-21 16:28             ` Shakeel Butt
2025-05-21 16:49               ` Lorenzo Stoakes
2025-05-21 17:39                 ` Shakeel Butt
2025-05-22 13:05                   ` David Hildenbrand
2025-05-22 13:21                     ` Lorenzo Stoakes
2025-05-22 20:53                     ` Shakeel Butt
2025-05-26 12:57                       ` David Hildenbrand
2025-05-21 16:57               ` Usama Arif
2025-05-21 17:39                 ` Lorenzo Stoakes
2025-05-21 18:25                   ` Usama Arif
2025-05-21 18:40                     ` Lorenzo Stoakes
2025-05-21 18:45                       ` Usama Arif
2025-05-21 17:32             ` Johannes Weiner
2025-05-21 18:11               ` Lorenzo Stoakes
2025-05-22 12:45               ` David Hildenbrand
2025-05-22 13:49                 ` Lorenzo Stoakes
2025-05-22 15:32               ` Mike Rapoport
2025-05-22 15:47                 ` Lorenzo Stoakes
2025-05-21  2:16       ` Liam R. Howlett
2025-05-22 12:12 ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e48b0293-5ccd-4205-910c-302b61b12b55@lucifer.local \
    --to=lorenzo.stoakes@oracle.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=arnd@arndb.de \
    --cc=brauner@kernel.org \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=jannh@google.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sj@kernel.org \
    --cc=usamaarif642@gmail.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox