From: Suren Baghdasaryan <surenb@google.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org,
syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com,
"Liam R. Howlett" <Liam.Howlett@oracle.com>
Subject: Re: [PATCH] mm: fix vma_start_write_killable() signal handling
Date: Wed, 26 Nov 2025 08:11:16 -0800 [thread overview]
Message-ID: <CAJuCfpEajhjsT0Y5sYiNGALUDMr+t3A=VvqsVzgYUvkHLdTW3g@mail.gmail.com> (raw)
In-Reply-To: <3d069afd-a19e-4ee7-bbb9-7d15b065ab1f@lucifer.local>
On Wed, Nov 26, 2025 at 8:01 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Wed, Nov 26, 2025 at 07:49:13AM -0800, Suren Baghdasaryan wrote:
> > On Wed, Nov 26, 2025 at 7:20 AM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote:
> > > > On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote:
> > > > > On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
> > > > > >> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
> > > > > >
> > > > > > Doh! This is embarassing...
> > > > >
> > > > > Hand-rolled synchronization primitives are wonderful, aren't they?
> > > >
> > > > That's why I liked the original approach of just using rwsems. I
> > > > mst admit to having not paid attention to this recently so I don't
> > > > know what motivated the change.
> >
> > That made it simpler to add SLAB_TYPESAFE_BY_RCU for vm_area_structs
> > which improved the performance of allocating these structs during
> > calls like mmap and fork.
>
> Hmm, we doubled down :)
>
> Hope the wins were worth the complexity, but obviously the horse has bolted
> and I should have asked more about it at the time...
>
> >
> > > >
> > > > > > Wait, why do we consider this as a successful acquisition? The
> > > > > > vm_refcnt is 0, so this is similar situation to an earlier:
> > > > > >
> > > > > > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> > > > > > return 0;
> > > > >
> > > > > But this means "vma is not attached" not "we failed to lock it".
> > > > >
> > > > > > IOW, the vma is not referenced, so we failed to lock it. I think the
> > > > > > fix should be:
> > > > > >
> > > > > > if (err) {
> > > > > > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> > > > > > + /* Oh cobblers. While we got a fatal signal, we
> > > > > > + * raced with the last user. VMA is not referenced,
> > > > > > + * fail to lock it.
> > > > > > + */
> > > > > > + err = 0;
> > > > >
> > > > > Returning 0 in this situation therefore wouldn't be correct.
> > > > >
> > > > > AFAIU since we started with attached vma above, it's not possible that
> > > > > the refcount_sub_and_test here will drop the refcnt to zero. We could
> > > > > just WARN_ON_ONCE() on the result (in a way to make also the
> > > > > __must_check happy) and then can return err below.
> > > >
> > > > But how do we know that we started with an attached VMA? Maybe the VMA
> > > > was in the process of being detached and still has readers?
> > >
> > > So we're talking about:
> > >
> > > vma_mark_deteched()
> > > -> refcount_dec_and_test() [ ref count is zero ]
> > > -> __vma_enter_locked()
> > >
> > > (meanwhile...)
> > >
> > > -> reader attempts to read
> > > -> optimistic check doesn't successfully find write locked VMA
> > > -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments
> > > (??? how)
> > >
> > > (back to vma_mark_attached() -> __vma_enter_locked())
> > >
> > > -> refcount_add_not_zero() returns true
> > >
> > > [ process gets fatal signal ]
> > >
> > > -> rcuwait_wait_event() errors out
> > > -> oopsies need to do something, maybe [VM_]WARN_ON() not right?
> > >
> > > Correct me if the above is wrong.
> > >
> > > I mean is any of this actually possible...?
> > >
> > > Seems dubious. But I guess right now we assume it _is_ possible. What a mess!
> > >
> > > (Again I wonder why we made our lives so difficult here)
> > >
> > > Anyway even if we are midway through a detach, the detach is ostensibly waiting
> > > for the readers to go away, and our reader is about to go away anyway, but the
> > > process has a fatal signal so do we even care?
> > >
> > > I actually wonder if a WARN_ON() is warranted to see if this even ever
> > > happens...
> > >
> > > OK just going to reattach... my head which just exploded from the above :P
> >
> > We are overthinking this. Vlastimil is right. If we entered with an
>
> Watch Vlastimil frame this... ;)
>
> > attached VMA (which is the case due to this check:
> > https://elixir.bootlin.com/linux/v6.18-rc7/source/mm/mmap_lock.c#L60)
> > then the only function that can detach the VMA from under us is
> > vma_mark_detached() but that function can't race with
>
> We hold the mmap/VMA write lock yes, but as per the comment in
> vma_mark_detached() + above list readers can spuriously increment the
> refcount (ostensibly) as per:
>
> /* Wait until vma is detached with no readers. */
>
> So doesn't that mean we can hit the issue Matthew raised?
>
> Otherwise there would be no need to do this dance in vma_mark_detached()
> anyway?
>
> Maybe I'm missing something.
No, you are right. I realized after sending the reply this temporary
refcount from the reader can be dropped from under the writer. Was in
the process of analyzing these paths when Vlastimil's reply came in.
I'll spend some more time stairing at it to see if we are missing
anything else.
>
> > __vma_enter_locked() because both functions are required to hold
> > mmap_write_lock. vma_mark_detached() has an explicit
> > vma_assert_write_locked(vma) check (which implies mmap_write_lock) and
> > vma_start_write() calls __is_vma_write_locked() which does
> > mmap_assert_write_locked(vma->vm_mm). We can add a comment here or an
> > mmap_assert_write_locked(vma->vm_mm); at the beginning of the
> > __vma_enter_locked() to make that obvious.
> >
> > > Cheers, Lorenzo
> >
>
> Thanks, Lorenzo
next prev parent reply other threads:[~2025-11-26 16:11 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-26 3:42 Matthew Wilcox (Oracle)
2025-11-26 4:28 ` Suren Baghdasaryan
2025-11-26 14:26 ` Suren Baghdasaryan
2025-11-26 14:40 ` Vlastimil Babka
2025-11-26 15:01 ` Matthew Wilcox
2025-11-26 14:36 ` Vlastimil Babka
2025-11-26 15:02 ` Lorenzo Stoakes
2025-11-26 15:05 ` Matthew Wilcox
2025-11-26 15:20 ` Lorenzo Stoakes
2025-11-26 15:49 ` Suren Baghdasaryan
2025-11-26 16:00 ` Lorenzo Stoakes
2025-11-26 16:11 ` Suren Baghdasaryan [this message]
2025-11-26 16:04 ` Vlastimil Babka
2025-11-26 16:06 ` Matthew Wilcox
2025-11-26 16:18 ` Lorenzo Stoakes
2025-11-26 18:06 ` Suren Baghdasaryan
2025-11-26 18:11 ` Lorenzo Stoakes
2025-11-26 15:53 ` Vlastimil Babka
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJuCfpEajhjsT0Y5sYiNGALUDMr+t3A=VvqsVzgYUvkHLdTW3g@mail.gmail.com' \
--to=surenb@google.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox