From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>,
linux-mm <linux-mm@kvack.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Matthew Wilcox <willy@infradead.org>
Subject: Re: not issuing vma_start_write() in dup_mmap() if the caller is single-threaded
Date: Mon, 31 Mar 2025 15:24:09 -0400
Message-ID: <4v25j6kbo4flzy5zbubf2w6bgesz2juadevokqlg2ugbe3im3h@h37mcss2aclt>
In-Reply-To: <CAGudoHFfsmbdA=aYXnXO87+8hLgZeC7mDm32oqowHiUUxO-Xrw@mail.gmail.com>
* Mateusz Guzik <mjguzik@gmail.com> [250331 13:51]:
> On Mon, Mar 31, 2025 at 6:43 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> >
> > * Mateusz Guzik <mjguzik@gmail.com> [250330 15:43]:
> > > On Sun, Mar 30, 2025 at 9:23 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > However, the good news is that mm_count tends to be 1. If both
> > > > > mm_count and mm_users are 1, then there is no userfaultfd in use and
> > > > > nobody to add it either.
> > > >
> > > > I'm not sure... IIUC new_userfaultfd() does not take mmap_lock while
> > > > calling mmgrab(), therefore I think it can race with the code checking
> > > > its value.
> > > >
> >
> > The mm struct isn't the only way to find the vmas, and there are
> > users who rely on other locks to keep the mm and vma from going away
> > (rmap, for example). So it is reasonable to think that other users
> > may depend on the vma lock to avoid racing with mm_struct accesses.
> >
> > Although I don't know of a way this is unsafe today, we are complicating
> > the locking story of the mm with this change and no data has been given
> > on the benefit. I don't recall any regression caused by the addition of
> > per-vma locking?
> >
>
> I was not involved in any of this, but I learned about the issue from lwn:
> https://lwn.net/Articles/937943/
>
> the new (at the time) per-vma locking was suffering weird crashes in
> multithreaded programs, and Suren ultimately fixed it by locking the
> parent's vmas at a 5% hit; see fb49c455323f ("fork: lock VMAs of the
> parent process when forking"). The patch merely adds
> vma_start_write(mpnt) in dup_mmap().
Ah, thanks for the pointer. I forgot about that part.
You are referencing the 5% hit on the microbenchmark doing 5,000 forks
of a process with 10,000 vmas in a tight loop. The kernel build time
was not affected. I maintain that this is an inconsequential increase
on any meaningful benchmark.
Reading the commit did jog my memory about the 5% regression, but I had
disregarded it: the scenario seems so unlikely in real life that the
benefits outweigh an upper bound of 5% overhead.
The 5,000-forks-of-10,000-vmas benchmark would also be at least a bit
faster with Suren's more recent RCU-safe vma change [1]. Although that
doesn't rule out changing this to get a higher number, I still don't
think the complexity of the change is worth it for a contrived
benchmark.
>
> What I'm proposing here remedies the problem for most commonly forking
> consumers (single-threaded), assuming it does work. ;)
>
> To that end see below.
>
> > >
> > > It issues:
> > > ctx->mm = current->mm;
> > > ...
> > > mmgrab(ctx->mm);
> > >
> > > Thus I claim if mm_count is 1 *and* mm_users is 1 *and* we are in
> > > dup_mmap(), nobody has a userfaultfd for our mm and there is nobody to
> > > create it either and the optimization is saved.
> >
> > mm_count is lazy, so I am not entirely sure we can trust what it says.
> > But maybe that's only true of mmgrab_lazy_tlb() now?
> >
>
> warning: I don't know the Linux nomenclature here. I'm going to
> outline how I see it.
>
> There is an idiomatic way of splitting ref counts into two:
> - something to prevent the struct itself from getting freed ("hold
> count" where I'm from, in this case ->mm_count)
> - something to prevent data used by the structure from getting freed
> ("use count" where I'm from, in this case ->mm_users)
>
> mm_users > 0 keeps one ref on ->mm_count
>
> AFAICS the scheme employed for mm follows the mold.
>
> So with that mmgrab_lazy_tlb() example, the call bumps the count on first use.
>
> Suppose we are left with one thread in the process and a lazy tlb
> somewhere as the only consumers. mm_users is 1 because of the only
> thread and mm_count is 2 -- one ref for mm_users > 0 and one ref for
> lazy tlb.
>
> Then my proposed check: ->mm_count == 1 && mm->mm_users == 1
>
> ... fails and the optimization is not used.
>
> Now, per the above, the lock was added to protect against faults
> happening in parallel. The only cases I found where this is of note
> are remote accesses (e.g., from /proc/pid/cmdline) and userfaultfd.
>
> I'm not an mm person and this is why I referred to Suren to sort this
> out, hoping he would both have interest and enough knowledge about mm
> to validate it.
>
> That is to say I don't *vouch* for the idea myself (otherwise I would
> sign off on a patch); I am merely bringing it up again long after the
> dust has settled. If the idea is a no-go, then so be it, but then it
> would be nice to document somewhere why that is.
Thanks for the breakdown here.
I am also not sure, but I felt it worth the mention.
Reading Documentation/mm/active_mm.rst makes me uncertain about the
mm_count == 1 && mm_users == 1 test: mm_count is meant to count the
'lazy' users, yet it may no longer include them.
Since there are no real workloads that suffer, is this worth it?
[1]. https://lore.kernel.org/all/202503311656.e3596aaf-lkp@intel.com/
Thanks,
Liam