From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>,
linux-mm <linux-mm@kvack.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Matthew Wilcox <willy@infradead.org>
Subject: Re: not issuing vma_start_write() in dup_mmap() if the caller is single-threaded
Date: Mon, 31 Mar 2025 15:24:09 -0400
Message-ID: <4v25j6kbo4flzy5zbubf2w6bgesz2juadevokqlg2ugbe3im3h@h37mcss2aclt>
In-Reply-To: <CAGudoHFfsmbdA=aYXnXO87+8hLgZeC7mDm32oqowHiUUxO-Xrw@mail.gmail.com>
* Mateusz Guzik <mjguzik@gmail.com> [250331 13:51]:
> On Mon, Mar 31, 2025 at 6:43 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> >
> > * Mateusz Guzik <mjguzik@gmail.com> [250330 15:43]:
> > > On Sun, Mar 30, 2025 at 9:23 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > However, the good news is that mm_count tends to be 1. If both
> > > > > mm_count and mm_users are 1, then there is no userfaultfd in use and
> > > > > nobody to add it either.
> > > >
> > > > I'm not sure... IIUC new_userfaultfd() does not take mmap_lock while
> > > > calling mmgrab(), therefore I think it can race with the code checking
> > > > its value.
> > > >
> >
> > The mm struct isn't the only way to find the vmas, and there are
> > users who rely on other locks to keep the mm and vma from going away
> > (rmap, for example). So it is reasonable to think that other users
> > may depend on the vma lock to avoid racing with mm_struct accesses.
> >
> > Although I don't know of a way this is unsafe today, we are complicating
> > the locking story of the mm with this change and no data has been given
> > on the benefit. I don't recall any regression caused by the addition of
> > per-vma locking?
> >
>
> I was not involved in any of this, but I learned about the issue from lwn:
> https://lwn.net/Articles/937943/
>
> the new (at the time) per-vma locking was suffering weird crashes in
> multithreaded programs, and Suren ultimately fixed it by locking the
> parent's vmas at a 5% hit; see fb49c455323f ("fork: lock VMAs of the
> parent process when forking"). The patch merely adds
> vma_start_write(mpnt) in dup_mmap().
Ah, thanks for the pointer. I forgot about that part.
You are referencing the 5% hit on the microbenchmark doing 5,000 forks
of a process with 10,000 vmas in a tight loop. The kernel build time
was not affected. I maintain that this is an inconsequential increase
on any meaningful benchmark.
Reading the commit did jog my memory about the 5% regression, but I had
disregarded it: the scenario seems so unlikely in real life that the
benefits outweigh an upper bound of 5% overhead.
The 5,000-forks-of-10,000-vmas benchmark would also be at least a bit
faster with Suren's more recent RCU-safe vma change [1]. Although that
doesn't rule out changing this to get a higher number, I still don't
think the complexity of the change is worth it for a contrived
benchmark.
>
> What I'm proposing here remedies the problem for most commonly forking
> consumers (single-threaded), assuming it does work. ;)
>
> To that end see below.
>
> > >
> > > It issues:
> > > ctx->mm = current->mm;
> > > ...
> > > mmgrab(ctx->mm);
> > >
> > > Thus I claim if mm_count is 1 *and* mm_users is 1 *and* we are in
> > > dup_mmap(), nobody has a userfaultfd for our mm and there is nobody to
> > > create it either and the optimization is saved.
> >
> > mm_count is lazy, so I am not entirely sure we can trust what it says.
> > But maybe that's only true of mmgrab_lazy_tlb() now?
> >
>
> warning: I don't know the Linux nomenclature here. I'm going to
> outline how I see it.
>
> There is an idiomatic way of splitting ref counts into two:
> - something to prevent the struct itself from getting freed ("hold
> count" where I'm from, in this case ->mm_count)
> - something to prevent data used by the structure from getting freed
> ("use count" where I'm from, in this case ->mm_users)
>
> mm_users > 0 keeps one ref on ->mm_count
>
> AFAICS the scheme employed for mm follows the mold.
>
> So with that mmgrab_lazy_tlb() example, the call bumps the count on first use.
>
> Suppose we are left with one thread in the process and a lazy tlb
> somewhere as the only consumers. mm_users is 1 because of the only
> thread and mm_count is 2 -- one ref for mm_users > 0 and one ref for
> lazy tlb.
>
> Then my proposed check: ->mm_count == 1 && mm->mm_users == 1
>
> ... fails and the optimization is not used.
>
> Now, per the above, the lock was added to protect against faults
> happening in parallel. The only cases I found where this is of note
> are remote accesses (e.g., from /proc/pid/cmdline) and userfaultfd.
>
> I'm not an mm person and this is why I referred to Suren to sort this
> out, hoping he would both have interest and enough knowledge about mm
> to validate it.
>
> That is to say I don't *vouch* for the idea myself (otherwise I would
> sign off on a patch); I am merely bringing it up again long after the
> dust has settled. If the idea is a no-go, then so be it, but then it
> would be nice to document somewhere why that is.
Thanks for the breakdown here.
I am also not sure, but I felt it worth the mention.
Reading Documentation/mm/active_mm.rst makes me uncertain about the
mm_count == 1 && mm_users == 1 test: mm_count is meant to count the
'lazy' users, yet it may no longer include them.
Since there are no real workloads that suffer, is this worth it?
[1]. https://lore.kernel.org/all/202503311656.e3596aaf-lkp@intel.com/
Thanks,
Liam