From: zhiguojiang <justinjiang@vivo.com>
To: David Hildenbrand <david@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
oe-lkp@lists.linux.dev, oliver.sang@intel.com
Cc: opensource.kernel@vivo.com
Subject: Re: [PATCH v2] vma remove the unneeded avc bound with non-CoWed folio
Date: Wed, 28 Aug 2024 09:14:40 +0800 [thread overview]
Message-ID: <5b56e76b-cf73-4cfd-b4c5-03fdece234fd@vivo.com> (raw)
In-Reply-To: <1e95a6e4-9993-40ae-b563-44b7024da25c@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 6401 bytes --]
在 2024/8/28 1:35, David Hildenbrand 写道:
> On 27.08.24 03:50, zhiguojiang wrote:
>>
>>
>> 在 2024/8/27 1:24, David Hildenbrand 写道:
>>> On 23.08.24 16:01, Zhiguo Jiang wrote:
>>>> After CoWed by do_wp_page, the vma established a new mapping
>>>> relationship
>>>> with the CoWed folio instead of the non-CoWed folio. However,
>>>> regarding
>>>> the situation where vma->anon_vma and the non-CoWed folio's
>>>> anon_vma are
>>>> not same, the avc binding relationship between them will no longer be
>>>> needed, so it is issue for the avc binding relationship still existing
>>>> between them.
>>>>
>>>> This patch will remove the avc binding relationship between vma and
>>>> the
>>>> non-CoWed folio's anon_vma, which each has their own independent
>>>> anon_vma. It can also alleviates rmap overhead simultaneously.
>>>>
>>>> Signed-off-by: Zhiguo Jiang <justinjiang@vivo.com>
>>>> ---
>>>> -v2:
>>>> * Solve the kernel test robot noticed "WARNING"
>>>> Reported-by: kernel test robot <oliver.sang@intel.com>
>>>> Closes:
>>>> https://lore.kernel.org/oe-lkp/202408230938.43f55b4-lkp@intel.com
>>>>
>>>> * Update comments to more accurately describe this patch.
>>>>
>>>> -v1:
>>>> https://lore.kernel.org/linux-mm/20240820143359.199-1-justinjiang@vivo.com/
>>>>
>>>>
>>>> include/linux/rmap.h | 1 +
>>>> mm/memory.c | 8 +++++++
>>>> mm/rmap.c | 53
>>>> ++++++++++++++++++++++++++++++++++++++++++++
>>>> 3 files changed, 62 insertions(+)
>>>>
>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>> index 91b5935e8485..8607d28a3146
>>>> --- a/include/linux/rmap.h
>>>> +++ b/include/linux/rmap.h
>>>> @@ -257,6 +257,7 @@ void folio_remove_rmap_ptes(struct folio *,
>>>> struct page *, int nr_pages,
>>>> folio_remove_rmap_ptes(folio, page, 1, vma)
>>>> void folio_remove_rmap_pmd(struct folio *, struct page *,
>>>> struct vm_area_struct *);
>>>> +void folio_remove_anon_avc(struct folio *, struct vm_area_struct *);
>>>> void hugetlb_add_anon_rmap(struct folio *, struct
>>>> vm_area_struct *,
>>>> unsigned long address, rmap_t flags);
>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>> index 93c0c25433d0..4c89cb1cb73e
>>>> --- a/mm/memory.c
>>>> +++ b/mm/memory.c
>>>> @@ -3428,6 +3428,14 @@ static vm_fault_t wp_page_copy(struct vm_fault
>>>> *vmf)
>>>> * old page will be flushed before it can be reused.
>>>> */
>>>> folio_remove_rmap_pte(old_folio, vmf->page, vma);
>>>> +
>>>> + /*
>>>> + * If the new_folio's anon_vma is different from the
>>>> + * old_folio's anon_vma, the avc binding relationship
>>>> + * between vma and the old_folio's anon_vma is removed,
>>>> + * avoiding rmap redundant overhead.
>>>> + */
>>>> + folio_remove_anon_avc(old_folio, vma);
>>>
>>> ... by increasing write fault latency, introducing an RMAP walk (!)?
>>> Hmm?
>>>
>>> On the reuse path, we do a folio_move_anon_rmap(), to optimize that.
>>>
>> Thanks for your comments. This may not be a good fixup patch. The
>> resue patch folio_move_anon_rmap() seems to be exclusive or
>> _refcount = 1 folios. The fork() path seems to clear exclusive flag
>> in copy_page_range() --> ... --> __folio_try_dup_anon_rmap(). However,
>> I observed lots of orphan avcs by the above debug trace logs in
>> wp_page_copy(). But they may be not removed by discussing with Mika.
>
> Was this patch ever tested? I cannot even boot a simple VM without an
> endless stream of
>
> [ 5.804598] ------------[ cut here ]------------
> [ 5.805494] WARNING: CPU: 11 PID: 595 at mm/rmap.c:443
> unlink_anon_vmas+0x19b/0x1d0
> [ 5.806962] Modules linked in: qemu_fw_cfg
> [ 5.807762] CPU: 11 UID: 0 PID: 595 Comm: dracut-rootfs-g Tainted:
> G W 6.11.0-rc4+ #72
> [ 5.809546] Tainted: [W]=WARN
> [ 5.810127] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009),
> BIOS 1.16.3-2.fc40 04/01/2014
> [ 5.811753] RIP: 0010:unlink_anon_vmas+0x19b/0x1d0
> [ 5.812680] Code: b0 00 00 00 00 75 1f f0 ff 8f a0 00 00 00 75 a2
> e8 8a fd ff ff eb 9b 5b 5d 41 5c 41 5d 41 5e 41 5f e9 d4 82 d0 00 0f
> 0b eb dd <0f> 0b eb cf 0f 0b 48 83 c7 08 e8 16 40 d7 ff e9 ea fe ff ff
> 48 8b
> [ 5.816247] RSP: 0018:ffffa19f43bb78d0 EFLAGS: 00010286
> [ 5.817258] RAX: ffff8a71c1bdd2d0 RBX: ffff8a71c1bdd2c0 RCX:
> ffff8a71c27a86c8
> [ 5.818624] RDX: 0000000000000001 RSI: ffff8a71c2771b28 RDI:
> ffff8a71c27a9e60
> [ 5.820011] RBP: dead000000000122 R08: 0000000000000000 R09:
> 0000000000000001
> [ 5.821380] R10: 0000000000000200 R11: 0000000000000001 R12:
> ffff8a71c2771b28
> [ 5.822748] R13: dead000000000100 R14: ffff8a71c2771b18 R15:
> ffff8a71c27a9e60
> [ 5.824122] FS: 0000000000000000(0000) GS:ffff8a7337980000(0000)
> knlGS:0000000000000000
> [ 5.825665] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 5.826775] CR2: 00007fca7f70ac58 CR3: 00000001027b2004 CR4:
> 0000000000770ef0
> [ 5.828146] PKRU: 55555554
> [ 5.828686] Call Trace:
> [ 5.829169] <TASK>
> [ 5.829594] ? __warn.cold+0xb1/0x13e
> [ 5.830312] ? unlink_anon_vmas+0x19b/0x1d0
> [ 5.831118] ? report_bug+0xff/0x140
> [ 5.831840] ? handle_bug+0x3c/0x80
> [ 5.832524] ? exc_invalid_op+0x17/0x70
> [ 5.833262] ? asm_exc_invalid_op+0x1a/0x20
> [ 5.834086] ? unlink_anon_vmas+0x19b/0x1d0
> [ 5.834908] free_pgtables+0x130/0x290
> [ 5.835661] exit_mmap+0x19a/0x460
> [ 5.836351] __mmput+0x4b/0x120
> [ 5.836965] do_exit+0x2e1/0xac0
> [ 5.837601] ? lock_release+0xd5/0x2c0
> [ 5.838343] do_group_exit+0x36/0xa0
> [ 5.839035] __x64_sys_exit_group+0x18/0x20
> [ 5.839866] x64_sys_call+0x14b4/0x14c0
Arm64 machine tested it and no crashes detected. You may try the
attachment modifition provided by Lorenzo Stoakes. Can you please
check if there are any opportunities for further improvement?
>
>
> Andrew, please remove this from mm-unstable.
Thanks
Zhiguo
[-- Attachment #2: 0001-mm-fixup-orphan-avc-cleanup-logic.patch --]
[-- Type: text/plain, Size: 6291 bytes --]
From 973ce5f0aea78196088cd527905cc0fad40edb29 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date: Sat, 24 Aug 2024 18:55:31 +0100
Subject: [RFC PATCH] mm: fixup orphan avc cleanup logic
Existing logic failed to reparent the anon_vma whose avc was removed which
resulted in assertion failures.
This patch corrects this, fixes up some comments, and does some other
cleanups.
We also do not do anything relating to anon_vma->parent manipulation if no
orphaned AVC is found.
I still feel this logic is highly dubious, but this does fix the issue with
anon_vma->num_children accounting.
This doesn't correctly handle locking of the reparented anon_vma.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/rmap.h | 2 +-
mm/memory.c | 2 +-
mm/rmap.c | 101 ++++++++++++++++++++++++++++---------------
3 files changed, 68 insertions(+), 37 deletions(-)
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 8607d28a3146..f1a835f54064 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -257,7 +257,7 @@ void folio_remove_rmap_ptes(struct folio *, struct page *, int nr_pages,
folio_remove_rmap_ptes(folio, page, 1, vma)
void folio_remove_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *);
-void folio_remove_anon_avc(struct folio *, struct vm_area_struct *);
+void cleanup_orphan_avc(struct folio *, struct vm_area_struct *);
void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
diff --git a/mm/memory.c b/mm/memory.c
index 4c89cb1cb73e..989b078dd860 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3435,7 +3435,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
* between vma and the old_folio's anon_vma is removed,
* avoiding rmap redundant overhead.
*/
- folio_remove_anon_avc(old_folio, vma);
+ cleanup_orphan_avc(old_folio, vma);
}
/* Free the old page.. */
diff --git a/mm/rmap.c b/mm/rmap.c
index 56fc16fcf2a9..3ac264962917 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1523,56 +1523,87 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
}
/**
- * folio_remove_anon_avc - remove the avc binding relationship between
- * folio and vma with different anon_vmas.
- * @folio: The folio with anon_vma to remove the binded avc from
- * @vma: The vm area to remove the binded avc with folio's anon_vma
+ * cleanup_orphan_avc - remove the avc binding relationship between a parent
+ * folio and child vma with different anon_vmas which, due to an operation such
+ * as CoW'ing a folio, is no longer meaningful.
*
- * The caller is currently used for CoWed scene.
+ * (insert ASCII diagrams and explanation here...)
+ *
+ * @old_folio: The folio which contains the parent anon_vma which has an unneeded
+ * avc binding.
+ * @new_vma: The VMA which is unnecessarily bound to folio.
*/
-void folio_remove_anon_avc(struct folio *folio,
- struct vm_area_struct *vma)
+void cleanup_orphan_avc(struct folio *old_folio, struct vm_area_struct *new_vma)
{
- struct anon_vma *anon_vma = folio_anon_vma(folio);
+ struct anon_vma *parent_anon_vma = folio_anon_vma(old_folio);
+ struct anon_vma *child_anon_vma = new_vma->anon_vma;
pgoff_t pgoff_start, pgoff_end;
struct anon_vma_chain *avc;
+ bool removed = false;
/*
- * Ensure that the vma's anon_vma and the folio's
- * anon_vma exist and are not same.
+ * If this folio were not anonymous, folio_anon_vma() would have
+ * returned NULL. Equally, if the parent and child anon_vma objects are
+ * the same, then we have nothing to do here.
*/
- if (!folio_test_anon(folio) || unlikely(!anon_vma) ||
- anon_vma == vma->anon_vma)
+ if (!parent_anon_vma || parent_anon_vma == child_anon_vma)
return;
- pgoff_start = folio_pgoff(folio);
- pgoff_end = pgoff_start + folio_nr_pages(folio) - 1;
+ pgoff_start = folio_pgoff(old_folio);
+ pgoff_end = pgoff_start + folio_nr_pages(old_folio) - 1;
- if (!anon_vma_trylock_write(anon_vma))
+ /* This is an optimistic attempt. */
+ if (!anon_vma_trylock_write(parent_anon_vma))
return;
- anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
- pgoff_start, pgoff_end) {
- /*
- * Find the avc associated with vma from the folio's
- * anon_vma and remove it.
- */
- if (avc->vma == vma) {
- anon_vma_interval_tree_remove(avc, &anon_vma->rb_root);
- /*
- * When removing the avc with anon_vma that is
- * different from the parent anon_vma from parent
- * anon_vma->rb_root, the parent num_children
- * count value is needed to reduce one.
- */
- anon_vma->num_children--;
+ /*
+ * Iterate through all AVC's tied to the old folio, looking for the
+ * redundant one pointing at the new VMA.
+ */
+ anon_vma_interval_tree_foreach(avc, &parent_anon_vma->rb_root,
+ pgoff_start, pgoff_end) {
+ if (avc->vma != new_vma)
+ continue;
- list_del(&avc->same_vma);
- anon_vma_chain_free(avc);
- break;
- }
+ /* Remove the unneeded avc. */
+ anon_vma_interval_tree_remove(avc, &parent_anon_vma->rb_root);
+ list_del(&avc->same_vma);
+ anon_vma_chain_free(avc);
+
+ removed = true;
+ break;
}
- anon_vma_unlock_write(anon_vma);
+
+ if (!removed)
+ goto unlock;
+
+ /*
+ * Removing an avc implies that the associated avc MAY no longer need
+ * to point to its parent, and we need to reparent it.
+ */
+
+ /*
+ * If somehow we aren't already the child of the parent anon_vma, we
+ * have nothing to do here.
+ */
+ if (child_anon_vma->parent != parent_anon_vma)
+ goto unlock;
+
+ /* OK, we abandon our parent, and reparent to ourselves. */
+
+ parent_anon_vma->num_children--;
+
+ child_anon_vma->parent = child_anon_vma;
+ child_anon_vma->num_children++;
+
+ /*
+ * Here we should probably reset the anon_vma->root, as per
+ * anon_vma_ctor() but this feels icky and horrible. Bit weird to share
+ * a lock with the old parent's root.
+ */
+
+unlock:
+ anon_vma_unlock_write(parent_anon_vma);
}
static __always_inline void __folio_remove_rmap(struct folio *folio,
--
2.46.0
next prev parent reply other threads:[~2024-08-28 1:14 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-23 14:01 Zhiguo Jiang
2024-08-26 17:24 ` David Hildenbrand
2024-08-27 1:50 ` zhiguojiang
2024-08-27 17:35 ` David Hildenbrand
2024-08-28 1:14 ` zhiguojiang [this message]
2024-08-28 3:51 ` Mika Penttilä
2024-08-23 15:02 Zhiguo Jiang
2024-08-24 5:35 ` Andrew Morton
2024-08-25 4:10 ` zhiguojiang
2024-08-25 4:17 ` zhiguojiang
2024-08-24 16:26 ` Lorenzo Stoakes
2024-08-24 18:04 ` Lorenzo Stoakes
2024-08-25 5:06 ` zhiguojiang
2024-08-25 6:39 ` Lorenzo Stoakes
2024-08-25 18:13 ` Mika Penttilä
2024-08-26 2:56 ` zhiguojiang
2024-08-26 4:30 ` Mika Penttilä
2024-08-25 6:42 ` Lorenzo Stoakes
2024-08-25 7:08 ` zhiguojiang
2024-08-26 17:03 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5b56e76b-cf73-4cfd-b4c5-03fdece234fd@vivo.com \
--to=justinjiang@vivo.com \
--cc=akpm@linux-foundation.org \
--cc=david@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=oe-lkp@lists.linux.dev \
--cc=oliver.sang@intel.com \
--cc=opensource.kernel@vivo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox