From: Jann Horn <jannh@google.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@redhat.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
linux-mm@kvack.org, Pedro Falcato <pfalcato@suse.de>,
Peter Xu <peterx@redhat.com>,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot
Date: Fri, 6 Jun 2025 14:55:25 +0200 [thread overview]
Message-ID: <CAG48ez1R7v-L-L33nJUXtj_Y=SKyyFcU8amLs0dQ6ecuC3xMWA@mail.gmail.com> (raw)
In-Reply-To: <ba208d76-7992-4c70-be8f-49082001f194@suse.cz>
On Thu, Jun 5, 2025 at 9:33 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> On 6/3/25 20:21, Jann Horn wrote:
> > When fork() encounters possibly-pinned pages, those pages are immediately
> > copied instead of just marking PTEs to make CoW happen later. If the parent
> > is multithreaded, this can cause the child to see memory contents that are
> > inconsistent in multiple ways:
> >
> > 1. We are copying the contents of a page with a memcpy() while userspace
> > may be writing to it. This can cause the resulting data in the child to
> > be inconsistent.
> > 2. After we've copied this page, future writes to other pages may
> > continue to be visible to the child while future writes to this page are
> > no longer visible to the child.
> >
> > This means the child could theoretically see incoherent states where
> > allocator freelists point to objects that are actually in use or stuff like
> > that. A mitigating factor is that, unless userspace already has a deadlock
> > bug, userspace can pretty much only observe such issues when fancy lockless
> > data structures are used (because if another thread was in the middle of
> > mutating data during fork() and the post-fork child tried to take the mutex
> > protecting that data, it might wait forever).
> >
> > On top of that, this issue is only observable when pages are either
> > DMA-pinned or appear false-positive-DMA-pinned due to a page having >=1024
> > references and the parent process having used DMA-pinning at least once
> > before.
>
> Seems the changelog seems to be missing the part describing what it's doing
> to fix the issue? Some details are not immediately obvious (the writing
> threads become blocked in page fault) as the conversation has shown.
I tried to document this in patch 2/2, where I wrote this (though I
guess I should maybe make it more verbose and not just say "subsequent
writes are delayed until mmap_write_unlock()"):
+ * - Before mmap_write_unlock(), a TLB flush ensures that parent threads can't
+ * write to copy-on-write pages anymore.
+ * - Before dup_mmap() copies page contents (which happens rarely), the
+ * parent's PTE for the page is made read-only and a TLB flush is issued, so
+ * subsequent writes are delayed until mmap_write_unlock().
But I guess this way makes it hard to review patch 1/2 individually.
Should I just squash the two patches together, and then write in the
commit message "see the comment blocks I'm adding for the fix
approach"? Or is there value in repeating the explanation in the
commit message?
> > Fixes: 70e806e4e645 ("mm: Do early cow for pinned pages during fork() for ptes")
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Jann Horn <jannh@google.com>
>
> Given how the fix seems to be localized to the already rare slowpath and
> doesn't require us to pessimize every trivial fork(), it seems reasonable to
> me even if don't have a concrete example of a sane code in the wild that's
> broken by the current behavior, so:
>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
Thanks!
next prev parent reply other threads:[~2025-06-06 12:56 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-03 18:21 [PATCH 0/2] mm/memory: fix memory tearing on threaded fork Jann Horn
2025-06-03 18:21 ` [PATCH 1/2] mm/memory: ensure fork child sees coherent memory snapshot Jann Horn
2025-06-03 18:29 ` Matthew Wilcox
2025-06-03 18:37 ` David Hildenbrand
2025-06-03 19:09 ` Jann Horn
2025-06-03 20:17 ` David Hildenbrand
2025-06-03 19:03 ` Jann Horn
2025-06-04 12:22 ` David Hildenbrand
2025-06-03 18:33 ` David Hildenbrand
2025-06-03 20:32 ` Pedro Falcato
2025-06-04 15:41 ` Jann Horn
2025-06-04 16:16 ` Pedro Falcato
2025-06-05 7:33 ` Vlastimil Babka
2025-06-05 12:30 ` Pedro Falcato
2025-06-06 12:55 ` Jann Horn [this message]
2025-06-06 15:34 ` Vlastimil Babka
2025-06-06 12:49 ` Jann Horn
2025-06-06 15:49 ` Vlastimil Babka
2025-06-03 18:21 ` [PATCH 2/2] mm/memory: Document how we make a " Jann Horn
2025-06-04 17:03 ` Peter Xu
2025-06-04 18:11 ` Jann Horn
2025-06-04 20:10 ` Peter Xu
2025-06-04 20:28 ` David Hildenbrand
2025-06-06 14:11 ` Jann Horn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAG48ez1R7v-L-L33nJUXtj_Y=SKyyFcU8amLs0dQ6ecuC3xMWA@mail.gmail.com' \
--to=jannh@google.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mhocko@suse.com \
--cc=peterx@redhat.com \
--cc=pfalcato@suse.de \
--cc=rppt@kernel.org \
--cc=stable@vger.kernel.org \
--cc=surenb@google.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox