linux-mm.kvack.org archive mirror
From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: maple-tree@lists.infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	David Hildenbrand <david@redhat.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Jann Horn <jannh@google.com>,
	Pedro Falcato <pfalcato@suse.de>,
	Charan Teja Kalla <quic_charante@quicinc.com>,
	shikemeng@huaweicloud.com, kasong@tencent.com, nphamcs@gmail.com,
	bhe@redhat.com, baohua@kernel.org, chrisl@kernel.org,
	Matthew Wilcox <willy@infradead.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>
Subject: [PATCH v1 0/9] Remove XA_ZERO from error recovery of dup_mmap()
Date: Tue,  9 Sep 2025 15:09:36 -0400
Message-ID: <20250909190945.1030905-1-Liam.Howlett@oracle.com>

It is possible for dup_mmap() to fail while allocating or setting up a
vma after the maple tree of the oldmm has been copied.  Today, that
failure point is marked by storing an XA_ZERO entry over it, so that the
exact location does not need to be communicated through to exit_mmap().
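
A minimal user-space sketch of the current scheme follows.  It assumes
a plain array in place of the maple tree, and every name in it
(toy_tree, TOY_ZERO, toy_dup(), toy_teardown()) is hypothetical; it is
not kernel code.  The point it shows is that slots past the marker still
reference the parent's vmas, so teardown must stop at the marker.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical names): the duplicated "tree" is an array whose
 * slots start out pointing at the parent's vmas, as they do right after
 * the tree copy, and are replaced one by one with the child's vmas.
 */
#define TOY_SLOTS 8

enum toy_owner { TOY_OLD_MM = 1, TOY_NEW_MM = 2 };

struct toy_vma {
	enum toy_owner owner;
	int live; /* cleared when teardown frees the vma */
};

/* Stand-in for XA_ZERO_ENTRY: a reserved address that is never a real vma. */
static struct toy_vma toy_zero_marker;
#define TOY_ZERO (&toy_zero_marker)

struct toy_tree {
	struct toy_vma *slot[TOY_SLOTS];
	size_t nr;
};

/* Copy the parent's tree: every slot still points at a parent vma. */
static void toy_copy_tree(struct toy_tree *t, struct toy_vma *parent, size_t nr)
{
	assert(nr <= TOY_SLOTS);
	t->nr = nr;
	for (size_t i = 0; i < nr; i++)
		t->slot[i] = &parent[i];
}

/*
 * Set up child vmas in order.  If setup fails at fail_at, store the
 * marker there; slots after it still point at the parent's vmas.
 */
static int toy_dup(struct toy_tree *t, struct toy_vma *child, size_t fail_at)
{
	for (size_t i = 0; i < t->nr; i++) {
		if (i == fail_at) {
			t->slot[i] = TOY_ZERO;
			return -1;
		}
		child[i].owner = TOY_NEW_MM;
		child[i].live = 1;
		t->slot[i] = &child[i];
	}
	return 0;
}

/* Teardown frees up to the marker without being told where the failure was. */
static size_t toy_teardown(struct toy_tree *t)
{
	size_t freed = 0;

	for (size_t i = 0; i < t->nr; i++) {
		if (t->slot[i] == TOY_ZERO)
			break;
		/* Only child vmas may appear before the marker. */
		assert(t->slot[i]->owner == TOY_NEW_MM);
		t->slot[i]->live = 0;
		freed++;
	}
	return freed;
}
```

With four parent vmas and a failure at index 2, toy_teardown() frees
the two child vmas and leaves the parent vma in slot 3 untouched.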

However, a race exists in the teardown process because dup_mmap() drops
the mmap lock before exit_mmap() can remove the partially set up vma
tree.  This means that other tasks may reach the mm's tree and find the
invalid vma pointer (the XA_ZERO entry), even though the mm is marked
MMF_OOM_SKIP and MMF_UNSTABLE.

To remove the race fully, the tree must be cleaned up before dropping
the lock.  This is accomplished by extracting the vma cleanup from
exit_mmap() and changing the required functions to pass through the vma
search limit.  Any other approach would need further tree modifications,
which cost cycles that should instead be spent freeing memory.
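
The ordering can be sketched in the same toy style.  This assumes an
array instead of the maple tree, a flag instead of the mmap lock, and
that the slots past the failure point are simply dropped; toy_mm,
toy_tear_down_vmas() and toy_unlock() are hypothetical.  The series
itself passes the limit through unmap_region() and free_pgtables(),
which this does not model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not kernel code. */
#define TOY_SLOTS 8

struct toy_vma {
	int live;
};

struct toy_mm {
	struct toy_vma *slot[TOY_SLOTS];
	size_t nr;
	int locked; /* stands in for holding the mmap write lock */
};

/*
 * Free only the vmas below vma_limit (where setup failed), then empty
 * the tree, all while the lock is still held.  No marker is stored, so
 * nothing invalid is left for another task to find.
 */
static size_t toy_tear_down_vmas(struct toy_mm *mm, size_t vma_limit)
{
	size_t freed = 0;

	assert(mm->locked);
	assert(vma_limit <= mm->nr);
	for (size_t i = 0; i < vma_limit; i++) {
		mm->slot[i]->live = 0;
		freed++;
	}
	/* Slots at and above the limit belong to the parent: drop, don't free. */
	for (size_t i = 0; i < mm->nr; i++)
		mm->slot[i] = NULL;
	mm->nr = 0;
	return freed;
}

static void toy_unlock(struct toy_mm *mm)
{
	/* By the time the lock drops, the tree holds no partial state. */
	assert(mm->nr == 0);
	mm->locked = 0;
}
```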

This does make it more likely that code which isn't careful finds no
vmas (which is already possible!).
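
As a toy illustration of "careful" code, the lookup below follows the
same contract as the kernel's find_vma() (first vma whose end lies above
the address, or NULL), but it is a hypothetical user-space sketch over
an array; toy_find_vma() and toy_lowest_addr() are not kernel functions.
The caller treats an empty mm as a normal outcome instead of assuming at
least one vma exists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model: a reader that must not assume any vma exists. */
struct toy_vma {
	unsigned long start, end; /* [start, end) */
};

struct toy_mm {
	struct toy_vma *vmas; /* sorted, non-overlapping */
	size_t nr;            /* may legitimately be zero */
};

/* Return the first vma ending above addr, or NULL if there is none. */
static struct toy_vma *toy_find_vma(const struct toy_mm *mm, unsigned long addr)
{
	for (size_t i = 0; i < mm->nr; i++)
		if (mm->vmas[i].end > addr)
			return &mm->vmas[i];
	return NULL;
}

/* Report the lowest mapped address; fail cleanly on an empty mm. */
static int toy_lowest_addr(const struct toy_mm *mm, unsigned long *out)
{
	struct toy_vma *vma = toy_find_vma(mm, 0);

	if (!vma)
		return -1; /* e.g. an mm emptied by a failed duplication */
	*out = vma->start;
	return 0;
}
```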

The final three patches address the excessive argument lists being
passed between these functions.  Using struct unmap_desc also allows
some special-case code to be removed in favour of differences in how the
struct is set up.
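
The descriptor idea can be sketched as below.  The struct and its field
names are hypothetical and are not claimed to match the series' struct
unmap_desc; the sketch only shows one entry point taking a single
pointer instead of threading (vma_start, vma_end, pg_start, pg_end)
through every call, with callers differing only in how they fill the
struct.  The assert() calls stand in, loosely, for the kind of sanity
WARN_ON_ONCE() mentioned in the changelog.

```c
#include <assert.h>

/* Hypothetical descriptor; fields are illustrative only. */
struct toy_unmap_desc {
	unsigned long vma_start, vma_end; /* vmas to visit: [vma_start, vma_end) */
	unsigned long pg_start, pg_end;   /* page-table range that may be freed */
};

/* One entry point; callers differ only in how they fill the descriptor. */
static unsigned long toy_unmap(const struct toy_unmap_desc *d)
{
	/* Sanity checks: a valid range, covered by the page-table bounds. */
	assert(d->vma_start <= d->vma_end);
	assert(d->pg_start <= d->vma_start && d->vma_end <= d->pg_end);
	return d->vma_end - d->vma_start; /* toy "bytes unmapped" */
}

/* A munmap()-style caller: one range, page tables bounded by neighbours. */
static struct toy_unmap_desc toy_desc_range(unsigned long start, unsigned long end,
					    unsigned long floor, unsigned long ceiling)
{
	return (struct toy_unmap_desc){ .vma_start = start, .vma_end = end,
					 .pg_start = floor, .pg_end = ceiling };
}

/* A teardown caller (exit, or failed duplication): everything below limit. */
static struct toy_unmap_desc toy_desc_teardown(unsigned long limit)
{
	return (struct toy_unmap_desc){ .vma_start = 0, .vma_end = limit,
					 .pg_start = 0, .pg_end = limit };
}
```

The teardown case needs no special-case branch inside toy_unmap(); it is
expressed entirely by the values its constructor stores.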

RFC: https://lore.kernel.org/linux-mm/20250815191031.3769540-1-Liam.Howlett@oracle.com/
Changes since RFC:
- Change BUG_ON to WARN_ON_ONCE in tear_down_vmas() - Thanks Lorenzo
- Added sanity WARN_ON_ONCE() to free_pgtables() for ceiling and
  tree_max - Thanks Lorenzo
- Added comment to free_pgtables() in a vain attempt to inform users
  what's going on - Thanks Lorenzo
- Fixed free_pgtables() in testing code
- Added struct unmap_desc to describe the unmap event to reduce the
  argument list - Thanks Lorenzo
- Rebased against mm-new

Liam R. Howlett (9):
  mm/mmap: Move exit_mmap() trace point
  mm/mmap: Abstract vma clean up from exit_mmap()
  mm/vma: Add limits to unmap_region() for vmas
  mm/memory: Add tree limit to free_pgtables()
  mm/vma: Add page table limit to unmap_region()
  mm: Change dup_mmap() recovery
  mm: Introduce unmap_desc struct to reduce function arguments
  mm/vma: Use unmap_desc in vms_clear_ptes() and exit_mmap()
  mm: Use unmap_desc struct for freeing page tables.

 include/linux/mm.h               |  3 -
 mm/internal.h                    |  8 ++-
 mm/memory.c                      | 69 +++++++++++++----------
 mm/mmap.c                        | 96 ++++++++++++++++++++++----------
 mm/vma.c                         | 55 +++++++++---------
 mm/vma.h                         | 48 +++++++++++++++-
 tools/testing/vma/vma_internal.h | 24 ++------
 7 files changed, 193 insertions(+), 110 deletions(-)

-- 
2.47.2




Thread overview: 43+ messages
2025-09-09 19:09 Liam R. Howlett [this message]
2025-09-09 19:09 ` [PATCH v1 1/9] mm/mmap: Move exit_mmap() trace point Liam R. Howlett
2025-09-09 20:07   ` Suren Baghdasaryan
2025-09-10 12:51   ` Pedro Falcato
2025-09-11  9:19   ` David Hildenbrand
2025-09-09 19:09 ` [PATCH v1 2/9] mm/mmap: Abstract vma clean up from exit_mmap() Liam R. Howlett
2025-09-09 20:09   ` Suren Baghdasaryan
2025-09-10 12:54   ` Pedro Falcato
2025-09-11  9:21   ` David Hildenbrand
2025-09-09 19:09 ` [PATCH v1 3/9] mm/vma: Add limits to unmap_region() for vmas Liam R. Howlett
2025-09-09 20:09   ` Suren Baghdasaryan
2025-09-10 12:56   ` Pedro Falcato
2025-09-11  8:56   ` Lorenzo Stoakes
2025-09-11  9:22   ` David Hildenbrand
2025-09-09 19:09 ` [PATCH v1 4/9] mm/memory: Add tree limit to free_pgtables() Liam R. Howlett
2025-09-09 21:05   ` Suren Baghdasaryan
2025-09-10 13:08   ` Pedro Falcato
2025-09-11  9:08   ` Lorenzo Stoakes
2025-09-11  9:24   ` David Hildenbrand
2025-09-09 19:09 ` [PATCH v1 5/9] mm/vma: Add page table limit to unmap_region() Liam R. Howlett
2025-09-09 21:29   ` Suren Baghdasaryan
2025-09-10 13:08   ` Pedro Falcato
2025-09-09 19:09 ` [PATCH v1 6/9] mm: Change dup_mmap() recovery Liam R. Howlett
2025-09-09 22:03   ` Suren Baghdasaryan
2025-09-10 13:23   ` Pedro Falcato
2025-09-11  9:13   ` Lorenzo Stoakes
2025-09-11  9:31   ` David Hildenbrand
2025-09-09 19:09 ` [PATCH v1 7/9] mm: Introduce unmap_desc struct to reduce function arguments Liam R. Howlett
2025-09-09 21:44   ` Suren Baghdasaryan
2025-09-11  9:22     ` Lorenzo Stoakes
2025-09-11 16:51       ` Suren Baghdasaryan
2025-09-11 16:56         ` Liam R. Howlett
2025-09-11 17:03           ` Suren Baghdasaryan
2025-09-10 13:10   ` Pedro Falcato
2025-09-11  9:20   ` Lorenzo Stoakes
2025-09-09 19:09 ` [PATCH v1 8/9] mm/vma: Use unmap_desc in vms_clear_ptes() and exit_mmap() Liam R. Howlett
2025-09-09 22:16   ` Suren Baghdasaryan
2025-09-11 16:59     ` Liam R. Howlett
2025-09-11  9:53   ` Lorenzo Stoakes
2025-09-12  5:08   ` kernel test robot
2025-09-09 19:09 ` [PATCH v1 9/9] mm: Use unmap_desc struct for freeing page tables Liam R. Howlett
2025-09-09 22:27   ` Suren Baghdasaryan
2025-09-11 10:06   ` Lorenzo Stoakes
