From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>, Jann Horn <jannh@google.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	Pedro Falcato <pfalcato@suse.de>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v2 01/10] mm/mremap: introduce more mergeable mremap via MREMAP_RELOCATE_ANON
Date: Thu, 1 May 2025 10:27:47 +0100
Message-ID: <d6d5a67e-efcf-4e23-90c4-4f6e370bde32@lucifer.local>
In-Reply-To: <20250501011845.ktbfgymor4oz5sok@master>

On Thu, May 01, 2025 at 01:18:45AM +0000, Wei Yang wrote:
> On Wed, Apr 30, 2025 at 05:07:40PM +0100, Lorenzo Stoakes wrote:
> >On Wed, Apr 30, 2025 at 03:41:19PM +0000, Wei Yang wrote:
> >> On Wed, Apr 30, 2025 at 02:15:24PM +0100, Lorenzo Stoakes wrote:
> >> >On Wed, Apr 30, 2025 at 12:47:03AM +0000, Wei Yang wrote:
> >> >> On Tue, Apr 22, 2025 at 09:09:20AM +0100, Lorenzo Stoakes wrote:
> >> >> [...]
> >> >> >+bool vma_had_uncowed_children(struct vm_area_struct *vma)
> >> >> >+{
> >> >> >+	struct anon_vma *anon_vma = vma ? vma->anon_vma : NULL;
> >> >> >+	bool ret;
> >> >> >+
> >> >> >+	if (!anon_vma)
> >> >> >+		return false;
> >> >> >+
> >> >> >+	/*
> >> >> >+	 * If we're mmap locked then there's no way for this count to change, as
> >> >> >+	 * any such change would require this lock not be held.
> >> >> >+	 */
> >> >> >+	if (rwsem_is_locked(&vma->vm_mm->mmap_lock))
> >> >> >+		return anon_vma->num_children > 1;
> >> >>
> >> >> Hi, Lorenzo
> >> >>
> >> >> May I ask a question here?
> >> >
> >> >Just ask the question.
> >> >
> >>
> >> Thanks.
> >>
> >> My question is: the function is expected to return true if we have forked
> >> a vma from this one, right?
> >>
> >> IMO there are cases when it has one forked child and anon_vma->num_children == 1,
> >> which means folios are not exclusively mapped. But the function would return
> >> false.
> >>
> >> Or maybe I misunderstand the logic here.
> >
> >I mean, it'd be helpful if you delineated which cases these were?
> >
>
> Sorry, I should be more specific.
>
> >Presumably you're thinking of something like:
> >
> >1. Process 1: VMA A is established. num_children == 1 (self-reference is counted).
> >2. Process 2: Process 1 forks, VMA B references A, a->num_children++
> >3. Process 3: Process 2 forks, VMA C is established (maybe you think b->num_children++?)
>
> Maybe this is the key point. Will explain below at ***.
>
> >4. Unmap vma B, oops, a->num_children == 1 but it still has C!
> >
> >But that won't happen, as VMA C will be referencing a->anon_vma, so in reality
> >a->anon_vma->num_children == 3, then after unmap == 2.
> >
>
> The case here is handled well; I am thinking of a slightly different one.
>
> Here is the case I am thinking about. If my understanding is wrong, please
> correct me.
>
> 	a                  VMA A
> 	+-----------+      +-----------+
> 	|           | ---> |         av| == a
> 	+-----------+      +-----------+
> 	             \
> 	              \
> 	              |\   VMA B
> 	              | \  +-----------+
> 	              |  > |         av| == b
> 	              |    +-----------+
> 	              \
> 	               \   VMA C
> 	                \  +-----------+
> 	                 > |         av| == c
> 	                   +-----------+
>
> 1. Process 1: VMA A is established, num_children == 1
> 2. Process 2: Process 1 forks, a->num_children++ and b->num_children == 0
> 3. Process 3: Process 2 forks, b->num_children++ => b->num_children == 1
>
> If we call vma_had_uncowed_children(VMA B), we would check b->num_children and
> return false since it is not greater than 1. But we do have a child process 3.
>
> ***
>
> Coming back to b->num_children: after re-reading your example, I guess this is
> the key point. In anon_vma_fork(), we do anon_vma->parent->num_children++. So
> when forking VMA C, we increase b->num_children instead of a->num_children.
>
> To verify this, I did a quick test in my test cases in
> test_fork_grand_child[1]. I see b->num_children is increased to 1 after C is
> forked. Will reply in that thread and hope that would be helpful to
> communicate the case.
>
> Well, if I am not correct, feel free to correct me :-)

OK so you've expressed this in a very confusing way and the diagram is
wrong but I think I see the point.

Because of anon_vma reuse logic in anon_vma_clone() we might end up in the
situation where num_children (which strictly reports the number of anon_vma
objects whose parent pointer points at that anon_vma) does not actually
correctly reflect the fact that there are multiple mappings of a folio.

I think the correct approach is to also look at num_active_vmas, which
accounts for this, but I think overall we should move these checks to being
a 'best guess' and remove the WARN_ON() around the multiply-mapped folio
logic. It's fine to just back out if we guesstimated wrong.

I'll also add a bunch of tests to assert specific fork scenarios.

>
> [1]: http://lkml.kernel.org/r/20250429090639.784-3-richard.weiyang@gmail.com
>
> >References to the originally faulted-in anon_vma are propagated through the
> >forks.
> >
> >anon_vma logic is tricky, one of many reasons I want to (significantly) rework
> >it.
> >
> >Though sadly there is a lot of _essential_ complexity, I do think we can do
> >better.
> >
>
> --
> Wei Yang
> Help you, Help me

