From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Charan Teja Kalla <quic_charante@quicinc.com>,
akpm@linux-foundation.org, shikemeng@huaweicloud.com,
kasong@tencent.com, nphamcs@gmail.com, bhe@redhat.com,
baohua@kernel.org, chrisl@kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
Matthew Wilcox <willy@infradead.org>
Subject: Re: [PATCH] mm: swap: check for xa_zero_entry() on vma in swapoff path
Date: Mon, 11 Aug 2025 11:48:27 -0400
Message-ID: <nnuncvxj3p7zszgojgst4z5dv3mn3xkfymty33x3rwzopr4ecv@mev6cvnkr2wy>
In-Reply-To: <2e8df53b-d953-43fb-9c69-7d7d60e95c9a@redhat.com>

* David Hildenbrand <david@redhat.com> [250811 11:39]:
> > >
> > > I think it may actually be difficult to do on some level or there was some
> > > reason we couldn't, but I may be mistaken.
> >
>
> Thanks for the information!
>
> > Down the rabbit hole we go..
> >
> > The cloning of the tree happens by copying the tree in DFS and
> > replacing the old nodes with new nodes. The tree leaves end up being
> > copied, which contains all the vmas (unless DONT_COPY is set, so
> > basically always all of them..). When the tree is copied, we have a
> > duplicate of the tree with pointers to all the vmas in the old process.
> >
> > The way the tree fails is that we've been unable to finish cloning it,
> > usually for out of memory reasons. So, this means we have a tree with
> > new and exciting vmas that have never been used and old but still active
> > vmas in oldmm.
> >
> > The failure point is then marked with an XA_ZERO_ENTRY, which will
> > succeed in storing as it's a direct replacement in the tree so no
> > allocations necessary. Thus this is safe even in -ENOMEM scenarios.
> >
> > Clearing out the stale data means we may actually need to allocate to
> > remove vmas from the new tree, because we use allocated memory in the
> > maple tree - we'll need to rebalance, new parents, etc, etc.
> >
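
To make the failure state above concrete, here is a minimal userspace
sketch, not kernel code. Everything named toy_* is hypothetical: a flat
slot array stands in for the maple tree leaves, TOY_ZERO_ENTRY stands in
for XA_ZERO_ENTRY, and running out of the preallocated pool stands in
for -ENOMEM. It only assumes the behaviour described above: the slots
start out pointing at the old vmas, get replaced one by one, and the
failure point is overwritten in place with the marker.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a vma; not the kernel's struct vm_area_struct. */
struct toy_vma { int id; };

#define TOY_SLOTS 8

/* Hypothetical sentinel playing the role of XA_ZERO_ENTRY. */
static struct toy_vma toy_zero_entry;
#define TOY_ZERO_ENTRY (&toy_zero_entry)

/* Flat slot array standing in for the leaves of the vma tree. */
struct toy_mm {
	struct toy_vma *slots[TOY_SLOTS];
	size_t nr;
};

/*
 * Duplicate old_mm into new_mm. The slots are copied first, so they
 * initially point at the old vmas. Each slot is then replaced with a
 * new vma taken from pool. When the pool runs out (the toy -ENOMEM),
 * the failing slot is overwritten with the marker; that store needs no
 * allocation. Returns 0 on success, -1 on failure.
 */
static int toy_dup_mmap(struct toy_mm *new_mm, const struct toy_mm *old_mm,
			struct toy_vma *pool, size_t pool_len)
{
	size_t i;

	assert(old_mm->nr <= TOY_SLOTS);
	new_mm->nr = old_mm->nr;
	for (i = 0; i < old_mm->nr; i++)
		new_mm->slots[i] = old_mm->slots[i];

	for (i = 0; i < new_mm->nr; i++) {
		if (i >= pool_len) {
			new_mm->slots[i] = TOY_ZERO_ENTRY;
			return -1;
		}
		pool[i].id = old_mm->slots[i]->id;
		new_mm->slots[i] = &pool[i];
	}
	return 0;
}

/*
 * A walker that stops at the marker: entries before it are new vmas,
 * entries after it still belong to the old mm and must not be touched.
 * Returns the number of entries that are safe to visit.
 */
static size_t toy_count_usable(const struct toy_mm *mm)
{
	size_t i, n = 0;

	for (i = 0; i < mm->nr; i++) {
		if (mm->slots[i] == TOY_ZERO_ENTRY)
			break;
		n++;
	}
	return n;
}
```

In this model a walker that does not check for the marker would treat
the sentinel as a vma and then walk into vmas owned by the old mm.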
...
> >
> > I could make a function that frees all new vmas and destroys the tree
> > specifically for this failure state?
>
> I think the problem is that some page tables were already copied, so we
> would have to zap them as well.
>
> Maybe just factoring stuff from the exit_mmap() function could be one way to
> do it.
Yes, this is much easier now that both are in the same .c file.
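
As a rough illustration of what such a failure-state helper has to
respect, here is a minimal userspace sketch, not kernel code. The names
toy_vma, TOY_ZERO_ENTRY and toy_cleanup_failed_dup() are hypothetical,
and the flat slot array is an assumption standing in for the tree. It
frees only the new vmas that precede the marker and drops the remaining
pointers without freeing them, since those still belong to the old mm.
It does not model the page tables that were already copied, which would
have to be zapped as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a vma; not the kernel's struct vm_area_struct. */
struct toy_vma { int id; };

/* Hypothetical sentinel playing the role of XA_ZERO_ENTRY. */
static struct toy_vma toy_zero_entry;
#define TOY_ZERO_ENTRY (&toy_zero_entry)

/*
 * Tear down a partially duplicated slot array. Only meant for the
 * failure state: entries before the marker are heap-allocated new
 * vmas and are freed; the marker and everything after it are only
 * cleared, because those entries are owned by the old mm.
 * Returns the number of vmas freed.
 */
static size_t toy_cleanup_failed_dup(struct toy_vma **slots, size_t nr)
{
	size_t i, freed = 0;
	int past_marker = 0;

	assert(slots != NULL);
	for (i = 0; i < nr; i++) {
		if (slots[i] == TOY_ZERO_ENTRY) {
			past_marker = 1;
		} else if (!past_marker) {
			free(slots[i]);
			freed++;
		}
		slots[i] = NULL;	/* empty the "tree" either way */
	}
	return freed;
}
```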
..
> >
> > This is funny because we already have a (probably) benign race with oom
> > here. This code may already visit the mm after __oom_reap_task_mm() and
> > the mm disappearing, but since the anon vmas should be removed,
> > unuse_mm() will skip them.
> >
> > Although, I'm not sure what happens when
> > mmu_notifier_invalidate_range_start_nonblock() fails AND unuse_mm() is
> > called on the mm after. Maybe checking the unstable mm is necessary
> > here anyways?
>
> Can we have MMU notifiers active while the process never even ran and we are
> only halfway through duplicating VMAs?
>
I doubt it.  I was thinking of other cases where the MMF_UNSTABLE flag
was set but the oom code failed to free all anon vmas because the MMU
notifier call failed.  That is, does this code have an existing race
that's much harder to hit?

Thanks,
Liam