linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jann Horn <jannh@google.com>
To: Peter Xu <peterx@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	 Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	 Michal Hocko <mhocko@suse.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] mm/memory: Document how we make a coherent memory snapshot
Date: Wed, 4 Jun 2025 20:11:08 +0200	[thread overview]
Message-ID: <CAG48ez3cgG-PikyO7a84CFdPFvPY9BSNJOZ7wZVQ7Q9Qju_6Ng@mail.gmail.com> (raw)
In-Reply-To: <aEB8fFEXKPR54LdA@x1.local>

On Wed, Jun 4, 2025 at 7:04 PM Peter Xu <peterx@redhat.com> wrote:
> On Tue, Jun 03, 2025 at 08:21:03PM +0200, Jann Horn wrote:
> > It is not currently documented that the child of fork() should receive a
> > coherent snapshot of the parent's memory, or how we get such a snapshot.
> > Add a comment block to explain this.
> >
> > Signed-off-by: Jann Horn <jannh@google.com>
> > ---
> >  kernel/fork.c | 34 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 34 insertions(+)
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 85afccfdf3b1..f78f5df596a9 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -604,6 +604,40 @@ static void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm)
> >  }
> >
> >  #ifdef CONFIG_MMU
> > +/*
> > + * Anonymous memory inherited by the child MM must, on success, contain a
> > + * coherent snapshot of corresponding anonymous memory in the parent MM.
>
> Should we better define what is a coherent snapshot?  Or maybe avoid using
> this term which seems to apply to the whole mm?
>
> I think it's at least not a snapshot of whole mm at a specific time,
> because as long as there can be more than one concurrent writers (hence, it
> needs to be at least 3 threads in the parent process, 1 in charge of fork),
> this can happen:
>
>   parent writer 1      parent writer 2    parent fork thr
>   ---------------      ---------------    ---------------
>                                           wr-protect P1
>   write P1                                                  <---- T1
>   (trapped, didn't happen)
>                        write PN                             <---- T2
>                        (went through)
>                                           ...
>                                           wr-protect PN
>
> The result of above would be that child process will see a mixture of old
> P1 (at timestamp T1) but updated P2 (timestamp T2).  I don't think it's
> impossible that the userapp could try to serialize "write P1" and "write
> PN" operations in a way that it would also get a surprise seeing in the
> child PN updated but P1 didn't.

If the write at T1 hits a page fault, then it doesn't actually happen
at T1. The write instruction starts doing something at T1, but it does
not fully retire, and the architectural register state does not
change, and in particular the instruction pointer does not advance
past this instruction; just like when speculative execution is aborted
after a branch misprediction, except that the CPU raises an exception
and we enter the page fault handler. The write actually happens when
the instruction is executed a second time after page fault handling
has completed after the mmap lock is dropped. (Unless something during
page fault handling raises a signal, in which case the instruction
might never architecturally execute.)

(There is a caveat to what I just said, which makes this more complex
but does not fundamentally change the outcome: An instruction that
performs multiple memory writes without specific atomicity guarantees
can successfully do some writes and then fail on a later write. In
this case, after the page fault handler resolves the fault, the entire
instruction will run from the start again, including re-doing the
writes that were already done on the first execution, and this works
because such instructions are designed to be idempotent in this regard
and they don't make atomicity guarantees.)

> I do agree it at least recovered the per-page coherence, though, no matter
> what is the POSIX definition of that.  IIUC an userapp can always fix such
> problem, but maybe it's too complicated in some cases, and if Linux used to
> at least maintain per-page coherency, then it may make sense to recover the
> behavior especially when it only affects pinned.
>
> Said that, maybe we still want to be specific on the goal of the change.


  reply	other threads:[~2025-06-04 18:11 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-03 18:21 [PATCH 0/2] mm/memory: fix memory tearing on threaded fork Jann Horn
2025-06-03 18:21 ` [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot Jann Horn
2025-06-03 18:29   ` Matthew Wilcox
2025-06-03 18:37     ` David Hildenbrand
2025-06-03 19:09       ` Jann Horn
2025-06-03 20:17         ` David Hildenbrand
2025-06-03 19:03     ` Jann Horn
2025-06-04 12:22       ` David Hildenbrand
2025-06-03 18:33   ` David Hildenbrand
2025-06-03 20:32   ` Pedro Falcato
2025-06-04 15:41     ` Jann Horn
2025-06-04 16:16       ` Pedro Falcato
2025-06-05  7:33   ` Vlastimil Babka
2025-06-05 12:30     ` Pedro Falcato
2025-06-06 12:55     ` Jann Horn
2025-06-06 15:34       ` Vlastimil Babka
2025-06-06 12:49   ` Jann Horn
2025-06-06 15:49     ` Vlastimil Babka
2025-06-03 18:21 ` [PATCH 2/2] mm/memory: Document how we make a " Jann Horn
2025-06-04 17:03   ` Peter Xu
2025-06-04 18:11     ` Jann Horn [this message]
2025-06-04 20:10       ` Peter Xu
2025-06-04 20:28         ` David Hildenbrand
2025-06-06 14:11         ` Jann Horn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAG48ez3cgG-PikyO7a84CFdPFvPY9BSNJOZ7wZVQ7Q9Qju_6Ng@mail.gmail.com \
    --to=jannh@google.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=peterx@redhat.com \
    --cc=rppt@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox