From: "Lorenzo Stoakes (Oracle)" <ljs@kernel.org>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Steven Rostedt <rostedt@goodmis.org>,
Jason Gunthorpe <jgg@ziepe.ca>,
Leon Romanovsky <leon@kernel.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
Huiwen He <hehuiwen@kylinos.cn>,
Jerome Marchand <jmarchan@redhat.com>,
Qing Wang <wangqing7171@gmail.com>,
Shengming Hu <hu.shengming@zte.com.cn>,
Linux-MM <linux-mm@kvack.org>,
linux-rdma <linux-rdma@vger.kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Subject: Re: [GIT PULL] tracing: Fixes for 7.0
Date: Fri, 6 Mar 2026 10:33:12 +0000
Message-ID: <8f60e23e-dd52-45df-89e6-bcc256ad704d@lucifer.local>
In-Reply-To: <a21ea7c3-fbdb-4ac7-8be5-0173f54890c7@kernel.org>
On Thu, Mar 05, 2026 at 07:59:14PM +0100, David Hildenbrand (Arm) wrote:
> On 3/5/26 18:17, Linus Torvalds wrote:
> > On Thu, 5 Mar 2026 at 09:00, David Hildenbrand (Arm) <david@kernel.org> wrote:
> >>
> >> QEMU traditionally sets MADV_DONTFORK on guest RAM. One reason is to
> >> speed up fork(), because it doesn't need all the guest RAM in fork'ed
> >> child processes.
> >
> > Yes, I think the MADV_DONTFORK thing makes sense on its own - more so
> > than MADV_DOFORK does.
> >
> > Because it's a very valid thing for user space to do exactly for that
> > "speed up fork()" case.
> >
> > It's similar to how we also export a MADV_WIPEONFORK - for a different
> > use-case, where we don't want the copying behavior (typically because
> > we want the child to re-create its own set of data: I think the main
> > reason tends to be for things like reseeding random number generation
> > after fork etc).
> >
> > So it's just MADV_DOFORK I don't particularly like, because it had
> > pre-existing kernel semantics (the VM_DONTCOPY bit predates the MADV_*
> > bits by many many years).
> >
> > Not copying on fork is always safe. But copying something that the
> > kernel has said "don't copy" just sounds *wrong*.
> >
> >>> But I get the feeling that maybe we should at least limit MADV_DOFORK
> >>> only to the case where the *source* of the DONTFORK was the user, not
> >>> some kernel mapping.
> >>
> >> ... that makes sense. Forbid toggling it on something that has
> >> VM_SPECIAL set, maybe.
Yes, I agree. It's odd, and unusual, that we explicitly gate on VM_IO there.
>
> CCing Lorenzo.
>
> >
> > Yeah, I think VM_SPECIAL would be a better match than just checking
> > VM_IO. At least it would also catch things like that VM_DONTEXPAND,
> > and PFN mappings.
Some of the madvise() operations explicitly work with PFN mappings, but it
really makes no sense to fiddle with them in this case.
> >
> > So just changing the existing VM_IO test to cover all the VM_SPECIAL
> > bits would be a simple improvement.
>
> Ack.
Feel free to add:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

To a patch that simply does something like:

diff --git a/mm/madvise.c b/mm/madvise.c
index c0370d9b4e23..dbb69400786d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1389,7 +1389,7 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
 		new_flags |= VM_DONTCOPY;
 		break;
 	case MADV_DOFORK:
-		if (new_flags & VM_IO)
+		if (new_flags & VM_SPECIAL)
 			return -EINVAL;
 		new_flags &= ~VM_DONTCOPY;
 		break;
--
2.53.0
That makes me wonder whether we want to permit VM_SPECIAL for
MADV_DONTFORK; it's kind of a weird use case. Anyway, this is the safer
change for now, as I think it's pretty obviously sane.
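
For illustration, a minimal user-space sketch of how widening the mask changes
the MADV_DOFORK outcome. This is a model under stated assumptions, not the
kernel code: dofork_model() is a hypothetical helper, and the flag values are
written out here only so the snippet compiles on its own (the authoritative
definitions, including VM_SPECIAL, are in include/linux/mm.h).

```c
#include <errno.h>

/* Illustrative values only; see include/linux/mm.h for the real ones. */
#define VM_PFNMAP	0x00000400UL
#define VM_IO		0x00004000UL
#define VM_DONTCOPY	0x00020000UL
#define VM_DONTEXPAND	0x00040000UL
#define VM_MIXEDMAP	0x10000000UL

/* The set of flags that mark a mapping as "special" to the kernel. */
#define VM_SPECIAL (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)

/*
 * Hypothetical model of the MADV_DOFORK case. 'forbidden' is the mask the
 * check gates on: VM_IO before the change, VM_SPECIAL after it.
 * Returns 0 and clears VM_DONTCOPY, or -EINVAL and leaves *flags untouched.
 */
int dofork_model(unsigned long *flags, unsigned long forbidden)
{
	if (*flags & forbidden)
		return -EINVAL;
	*flags &= ~VM_DONTCOPY;
	return 0;
}
```

With a VMA modelled as VM_DONTCOPY | VM_DONTEXPAND, gating on VM_IO lets
VM_DONTCOPY be cleared, whereas gating on VM_SPECIAL returns -EINVAL. A VMA
carrying only VM_DONTCOPY (i.e. the user set it via MADV_DONTFORK) is
accepted either way.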
>
> >
> > Maybe I should just do that and see if anybody even notices (and
> > revert and re-think if somebody does)
>
> Agreed. We could think about letting it sit a bit in -next before moving
> it to mainline.
I would eat my hat, board a flying pig and note the sound of several trees
falling when there's nobody around if anybody complained :)
>
> --
> Cheers,
>
> David
Cheers, Lorenzo