From: Suren Baghdasaryan <surenb@google.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
linux-mm <linux-mm@kvack.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Matthew Wilcox <willy@infradead.org>
Subject: Re: not issuing vma_start_write() in dup_mmap() if the caller is single-threaded
Date: Mon, 31 Mar 2025 11:42:35 -0700 [thread overview]
Message-ID: <CAJuCfpHJmmU-R4cPxG=5y1HDO=sB4j3LsuSoi5fg5xwwXbCY8g@mail.gmail.com> (raw)
In-Reply-To: <CAGudoHFfsmbdA=aYXnXO87+8hLgZeC7mDm32oqowHiUUxO-Xrw@mail.gmail.com>
On Mon, Mar 31, 2025 at 10:51 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
>
> On Mon, Mar 31, 2025 at 6:43 PM Liam R. Howlett <Liam.Howlett@oracle.com> wrote:
> >
> > * Mateusz Guzik <mjguzik@gmail.com> [250330 15:43]:
> > > On Sun, Mar 30, 2025 at 9:23 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > However, the good news is that mm_count tends to be 1. If both
> > > > > mm_count and mm_users are 1, then there is no usefaultfd in use and
> > > > > nobody to add it either.
> > > >
> > > > I'm not sure... IIUC new_userfaultfd() does not take mmap_lock while
> > > > calling mmgrab(), therefore I think it can race with the code checking
> > > > its value.
> > > >
> >
> > Considering the mm struct isn't the only way to find the vmas and there
> > are users who use other locks to ensure the mm and vma don't go away
> > (rmap, for example). It is reasonable to think that other users may use
> > the vma lock to avoid mm_struct accesses racing.
> >
> > Although I don't know of a way this is unsafe today, we are complicating
> > the locking story of the mm with this change and no data has been given
> > on the benefit. I don't recall any regression caused by the addition of
> > per-vma locking?
> >
>
> I was not involved in any of this, but I learned about the issue from lwn:
> https://lwn.net/Articles/937943/
>
> the new (at the time) per-vma locking was suffering weird crashes in
> multithreaded programs and Suren ultimately fixed it by locking parent
> vma at a 5% hit,
> see fb49c455323f ("fork: lock VMAs of the parent process when
> forking"). The patch merely adds vma_start_write(mpnt) in dup_mmap.
>
> What I'm proposing here remedies the problem for most commonly forking
> consumers (single-threaded), assuming it does work. ;)
>
> To that end see below.
>
> > >
> > > It issues:
> > > ctx->mm = current->mm;
> > > ...
> > > mmgrab(ctx->mm);
> > >
> > > Thus I claim if mm_count is 1 *and* mm_users is 1 *and* we are in
> > > dup_mmap(), nobody has a userfaultfd for our mm and there is nobody to
> > > create it either and the optimization is saved.
> >
> > mm_count is lazy, so I am not entirely sure we can trust what it says.
> > But maybe that's only true of mmgrab_lazy_tlb() now?
> >
>
> warning: I don't know the Linux nomenclature here. I'm going to
> outline how I see it.
>
> There is an idiomatic way of splitting ref counts into two:
> - something to prevent the struct itself from getting freed ("hold
> count" where I'm from, in this case ->mm_count)
> - something to prevent data used by the structure from getting freed
> ("use count" where I'm from, in this case ->mm_users)
>
> mm_users > 0 keeps one ref on ->mm_count
>
> AFAICS the scheme employed for mm follows the mold.
>
> So with that mmgrab_lazy_tlb() example, the call bumps the count on first use.
>
> Suppose we are left with one thread in the process and a lazy tlb
> somewhere as the only consumers. mm_users is 1 because of the only
> thread and mm_count is 2 -- one ref for mm_users > 0 and one ref for
> lazy tlb.
>
> Then my proposed check: ->mm_count == 1 && mm->mm_users == 1
>
> ... fails and the optimization is not used.
>
> Now, per the above, the lock was added to protect against faults
> happening in parallel. The only cases I found where this is of note
> are remote accesses (e.g., from /proc/pid/cmdline) and userfaultfd.
>
> I'm not an mm person and this is why I referred to Suren to sort this
> out, hoping he would both have interest and enough knowledge about mm
> to validate it.
>
> That is to say I don't *vouch* the idea myself (otherwise I would sign
> off on a patch), I am merely bringing it up again long after the dust
> has settled. If the idea is a nogo, then so be it, but then it would
> be nice to document somewhere why is it so.
I think it would be worth optimizing if it's as straight-forward as we
think so far. I would like to spend some more time checking uffd code
to see if anything else might be lurking there before posting the
patch. If I don't find anything new I'll post a patch sometime next
week.
Thanks,
Suren.
> --
> Mateusz Guzik <mjguzik gmail.com>
next prev parent reply other threads:[~2025-03-31 18:42 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-27 5:46 Mateusz Guzik
2025-03-29 0:56 ` Suren Baghdasaryan
2025-03-29 1:15 ` Mateusz Guzik
2025-03-29 1:35 ` Suren Baghdasaryan
2025-03-29 1:51 ` Mateusz Guzik
2025-03-30 19:23 ` Suren Baghdasaryan
2025-03-30 19:25 ` Suren Baghdasaryan
2025-03-30 19:43 ` Mateusz Guzik
2025-03-31 16:43 ` Liam R. Howlett
2025-03-31 17:50 ` Mateusz Guzik
2025-03-31 18:42 ` Suren Baghdasaryan [this message]
2025-03-31 19:24 ` Liam R. Howlett
2025-03-31 20:27 ` Mateusz Guzik
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJuCfpHJmmU-R4cPxG=5y1HDO=sB4j3LsuSoi5fg5xwwXbCY8g@mail.gmail.com' \
--to=surenb@google.com \
--cc=Liam.Howlett@oracle.com \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=mjguzik@gmail.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox