From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Zhiguo Jiang <justinjiang@vivo.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	oe-lkp@lists.linux.dev, lkp@intel.com,
	opensource.kernel@vivo.com
Subject: Re: [PATCH v2] vma remove the unneeded avc bound with non-CoWed folio
Date: Sat, 24 Aug 2024 19:04:11 +0100
Message-ID: <aba1cad3-ab23-4b42-8ce8-0ed662919c99@lucifer.local>
In-Reply-To: <73ad9540-3fb8-4154-9a4f-30a0a2b03d41@lucifer.local>

[-- Attachment #1: Type: text/plain, Size: 1241 bytes --]

On Sat, Aug 24, 2024 at 05:26:46PM GMT, Lorenzo Stoakes wrote:
> On Fri, Aug 23, 2024 at 11:02:06PM GMT, Zhiguo Jiang wrote:
> > After CoWed by do_wp_page, the vma established a new mapping relationship
> > with the CoWed folio instead of the non-CoWed folio. However, regarding
> > the situation where vma->anon_vma and the non-CoWed folio's anon_vma are
> > not same, the avc binding relationship between them will no longer be
> > needed, so it is issue for the avc binding relationship still existing
> > between them.
> >
> > This patch will remove the avc binding relationship between vma and the
> > non-CoWed folio's anon_vma, which each has their own independent
> > anon_vma. It can also alleviates rmap overhead simultaneously.
> >
> > Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
>
>
> NACK (until fixed). This is broken (see below).
>

[snip]

I enclose a patch that fixes the issue, but leaves a LOT still
broken/unresolved/todo, including locking of the reparented anon_vma
(that'll really need re-rooting too).

I still seriously doubt the value of this patch given the complexity risks,
but since I got bored and looked into this, it's useful to examine something
that works, and which might be helpful to you in testing.
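
To make the num_children accounting easier to reason about when testing,
here is a minimal userspace sketch of the reparenting step. It is a toy
model, not kernel code: every toy_* name is hypothetical, and it assumes
(as the patch itself implies) that a root anon_vma is its own parent and
counts that self-parent link in num_children.

```c
/* Toy userspace model of anon_vma parent accounting; NOT kernel code. */
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two struct anon_vma fields of interest. */
struct toy_anon_vma {
	struct toy_anon_vma *parent;	/* points at itself for a root */
	unsigned long num_children;	/* assumed to count the self-parent link */
};

/* Model a root anon_vma: self-parented and counted once. */
static void toy_init_root(struct toy_anon_vma *av)
{
	av->parent = av;
	av->num_children = 1;
}

/* Model a forked child: it hangs off the parent and bumps its count. */
static void toy_init_child(struct toy_anon_vma *child,
			   struct toy_anon_vma *parent)
{
	child->parent = parent;
	child->num_children = 0;
	parent->num_children++;
}

/*
 * Model of the reparenting step: leave old_parent and become self-parented.
 * Returns 1 if the child was reparented, 0 if it was not a child of
 * old_parent in the first place.
 */
static int toy_reparent_to_self(struct toy_anon_vma *child,
				struct toy_anon_vma *old_parent)
{
	if (child->parent != old_parent)
		return 0;

	old_parent->num_children--;
	child->parent = child;
	child->num_children++;
	return 1;
}

/* Walk through fork followed by reparenting and check the counters. */
int toy_demo(void)
{
	struct toy_anon_vma parent, child;

	toy_init_root(&parent);
	toy_init_child(&child, &parent);
	assert(parent.num_children == 2);
	assert(child.num_children == 0);

	assert(toy_reparent_to_self(&child, &parent) == 1);
	assert(parent.num_children == 1);
	assert(child.num_children == 1);
	assert(child.parent == &child);

	/* A second attempt finds nothing to do and changes nothing. */
	assert(toy_reparent_to_self(&child, &parent) == 0);
	assert(parent.num_children == 1);
	return 0;
}
```

Calling toy_demo() from a small main() exercises the whole sequence. The
point of the model is only that the decrement on the parent and the
increment on the child have to happen together; it says nothing about
root or locking, which remain unresolved.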

[-- Attachment #2: 0001-mm-fixup-orphan-avc-cleanup-logic.patch --]
[-- Type: text/plain, Size: 6109 bytes --]

From 973ce5f0aea78196088cd527905cc0fad40edb29 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Sat, 24 Aug 2024 18:55:31 +0100
Subject: [RFC PATCH] mm: fixup orphan avc cleanup logic

Existing logic failed to reparent the anon_vma whose avc was removed, which
resulted in assertion failures.

This patch corrects this, fixes up some comments, and does some other
cleanups.

We also do not do anything relating to anon_vma->parent manipulation if no
orphaned AVC is found.

I still feel this logic is highly dubious, but this does fix the issue with
anon_vma->num_children accounting.

This doesn't correctly handle locking of the reparented anon_vma.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/rmap.h |   2 +-
 mm/memory.c          |   2 +-
 mm/rmap.c            | 101 ++++++++++++++++++++++++++++---------------
 3 files changed, 68 insertions(+), 37 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 8607d28a3146..f1a835f54064 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -257,7 +257,7 @@ void folio_remove_rmap_ptes(struct folio *, struct page *, int nr_pages,
 	folio_remove_rmap_ptes(folio, page, 1, vma)
 void folio_remove_rmap_pmd(struct folio *, struct page *,
 		struct vm_area_struct *);
-void folio_remove_anon_avc(struct folio *, struct vm_area_struct *);
+void cleanup_orphan_avc(struct folio *, struct vm_area_struct *);
 
 void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
 		unsigned long address, rmap_t flags);
diff --git a/mm/memory.c b/mm/memory.c
index 4c89cb1cb73e..989b078dd860 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3435,7 +3435,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 			 * between vma and the old_folio's anon_vma is removed,
 			 * avoiding rmap redundant overhead.
 			 */
-			folio_remove_anon_avc(old_folio, vma);
+			cleanup_orphan_avc(old_folio, vma);
 		}
 
 		/* Free the old page.. */
diff --git a/mm/rmap.c b/mm/rmap.c
index 56fc16fcf2a9..3ac264962917 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1523,56 +1523,87 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
 }
 
 /**
- * folio_remove_anon_avc - remove the avc binding relationship between
- * folio and vma with different anon_vmas.
- * @folio:	The folio with anon_vma to remove the binded avc from
- * @vma:	The vm area to remove the binded avc with folio's anon_vma
+ * cleanup_orphan_avc - remove the avc binding relationship between a parent
+ * folio and child vma with different anon_vmas which, due to an operation such
+ * as CoW'ing a folio, is no longer meaningful.
  *
- * The caller is currently used for CoWed scene.
+ * (insert ASCII diagrams and explanation here...)
+ *
+ * @old_folio:  The folio which contains the parent anon_vma which has an unneeded
+ *              avc binding.
+ * @new_vma:	The VMA which is unnecessarily bound to @old_folio's anon_vma.
  */
-void folio_remove_anon_avc(struct folio *folio,
-		struct vm_area_struct *vma)
+void cleanup_orphan_avc(struct folio *old_folio, struct vm_area_struct *new_vma)
 {
-	struct anon_vma *anon_vma = folio_anon_vma(folio);
+	struct anon_vma *parent_anon_vma = folio_anon_vma(old_folio);
+	struct anon_vma *child_anon_vma = new_vma->anon_vma;
 	pgoff_t pgoff_start, pgoff_end;
 	struct anon_vma_chain *avc;
+	bool removed = false;
 
 	/*
-	 * Ensure that the vma's anon_vma and the folio's
-	 * anon_vma exist and are not same.
+	 * If this folio were not anonymous, folio_anon_vma() would have
+	 * returned NULL. Equally, if the parent and child anon_vma objects are
+	 * the same, then we have nothing to do here.
 	 */
-	if (!folio_test_anon(folio) || unlikely(!anon_vma) ||
-	    anon_vma == vma->anon_vma)
+	if (!parent_anon_vma || parent_anon_vma == child_anon_vma)
 		return;
 
-	pgoff_start = folio_pgoff(folio);
-	pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;
+	pgoff_start = folio_pgoff(old_folio);
+	pgoff_end = pgoff_start + folio_nr_pages(old_folio) - 1;
 
-	if (!anon_vma_trylock_write(anon_vma))
+	/* This is an optimistic attempt. */
+	if (!anon_vma_trylock_write(parent_anon_vma))
 		return;
 
-	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
-			pgoff_start, pgoff_end) {
-		/*
-		 * Find the avc associated with vma from the folio's
-		 * anon_vma and remove it.
-		 */
-		if (avc->vma == vma) {
-			anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
-			/*
-			 * When removing the avc with anon_vma that is
-			 * different from the parent anon_vma from parent
-			 * anon_vma->rb_root, the parent num_children
-			 * count value is needed to reduce one.
-			 */
-			anon_vma->num_children--;
+	/*
+	 * Iterate through all AVCs tied to the old folio, looking for the
+	 * redundant one pointing at the new VMA.
+	 */
+	anon_vma_interval_tree_foreach(avc, &parent_anon_vma->rb_root,
+				       pgoff_start, pgoff_end) {
+		if (avc->vma != new_vma)
+			continue;
 
-			list_del(&avc->same_vma);
-			anon_vma_chain_free(avc);
-			break;
-		}
+		/* Remove the unneeded avc. */
+		anon_vma_interval_tree_remove(avc, &parent_anon_vma->rb_root);
+		list_del(&avc->same_vma);
+		anon_vma_chain_free(avc);
+
+		removed = true;
+		break;
 	}
-	anon_vma_unlock_write(anon_vma);
+
+	if (!removed)
+		goto unlock;
+
+	/*
+	 * Removing an avc implies that the child anon_vma MAY no longer need
+	 * to point to its parent, in which case we need to reparent it.
+	 */
+
+	/*
+	 * If somehow we aren't already the child of the parent anon_vma, we
+	 * have nothing to do here.
+	 */
+	if (child_anon_vma->parent != parent_anon_vma)
+		goto unlock;
+
+	/* OK, we abandon our parent, and reparent to ourselves. */
+
+	parent_anon_vma->num_children--;
+
+	child_anon_vma->parent = child_anon_vma;
+	child_anon_vma->num_children++;
+
+	/*
+	 * Here we should probably reset the anon_vma->root, as per
+	 * anon_vma_ctor() but this feels icky and horrible. Bit weird to share
+	 * a lock with the old parent's root.
+	 */
+
+unlock:
+	anon_vma_unlock_write(parent_anon_vma);
 }
 
 static __always_inline void __folio_remove_rmap(struct folio *folio,
-- 
2.46.0
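
For anyone unfamiliar with the "optimistic attempt" pattern used around
anon_vma_trylock_write() in the patch, the sketch below shows the general
shape in userspace: try the lock once, and skip the cleanup entirely if it
is contended. This is only an illustration. The toy_* names are
hypothetical, and a C11 atomic_flag stands in for the lock, whereas the
kernel takes the anon_vma lock for writing.

```c
/* Toy userspace model of a trylock-and-bail cleanup; NOT kernel code. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock: a single flag, set while held. */
struct toy_lock {
	atomic_flag held;
};

/* Returns true if the lock was free and is now held by the caller. */
static bool toy_trylock(struct toy_lock *l)
{
	return !atomic_flag_test_and_set_explicit(&l->held,
						  memory_order_acquire);
}

static void toy_unlock(struct toy_lock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * Best-effort cleanup: returns true if the work ran, false if the lock
 * was contended and the work was skipped. The caller never waits.
 */
bool toy_optimistic_cleanup(struct toy_lock *l, int *work_done)
{
	if (!toy_trylock(l))
		return false;

	(*work_done)++;		/* stands in for the avc removal */
	toy_unlock(l);
	return true;
}

int toy_trylock_demo(void)
{
	struct toy_lock lock = { .held = ATOMIC_FLAG_INIT };
	int work_done = 0;

	/* Uncontended: the cleanup runs. */
	assert(toy_optimistic_cleanup(&lock, &work_done));
	assert(work_done == 1);

	/* Simulate another holder: the cleanup is skipped, not retried. */
	assert(toy_trylock(&lock));
	assert(!toy_optimistic_cleanup(&lock, &work_done));
	assert(work_done == 1);
	toy_unlock(&lock);

	/* Once the lock is free again, a later attempt succeeds. */
	assert(toy_optimistic_cleanup(&lock, &work_done));
	assert(work_done == 2);
	return 0;
}
```

The trade-off is that a skipped attempt leaves the redundant avc in place,
so the cleanup can only ever be an optimisation and nothing else may rely
on it having happened.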

