linux-mm.kvack.org archive mirror
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org,
	syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>
Subject: Re: [PATCH] mm: fix vma_start_write_killable() signal handling
Date: Wed, 26 Nov 2025 16:18:38 +0000
Message-ID: <a03aa1c8-b54a-4ac6-82f1-40a06dcd7150@lucifer.local>
In-Reply-To: <058f5858-f508-40f8-adfe-e5de78621d64@suse.cz>

On Wed, Nov 26, 2025 at 05:04:09PM +0100, Vlastimil Babka wrote:
>
>
> On 11/26/25 4:20 PM, Lorenzo Stoakes wrote:
> > On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote:
> >> On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote:
> >>> On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
> >>>>> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
> >>>>
> >>>> Doh! This is embarrassing...
> >>>
> >>> Hand-rolled synchronization primitives are wonderful, aren't they?
> >>
> >> That's why I liked the original approach of just using rwsems.  I
> >> must admit to having not paid attention to this recently so I don't
> >> know what motivated the change.
> >>
> >>>> Wait, why do we consider this as a successful acquisition? The
> >>>> vm_refcnt is 0, so this is similar situation to an earlier:
> >>>>
> >>>> if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> >>>>         return 0;
> >>>
> >>> But this means "vma is not attached" not "we failed to lock it".
> >>>
> >>>> IOW, the vma is not referenced, so we failed to lock it. I think the
> >>>> fix should be:
> >>>>
> >>>>         if (err) {
> >>>> +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> >>>> +                       /* Oh cobblers.  While we got a fatal signal, we
> >>>> +                        * raced with the last user.  VMA is not referenced,
> >>>> +                        * fail to lock it.
> >>>> +                        */
> >>>> +                       err = 0;
> >>>
> >>> Returning 0 in this situation therefore wouldn't be correct.
> >>>
> >>> AFAIU since we started with attached vma above, it's not possible that
> >>> the refcount_sub_and_test here will drop the refcnt to zero. We could
> >>> just WARN_ON_ONCE() on the result (in a way to make also the
> >>> __must_check happy) and then can return err below.
> >>
> >> But how do we know that we started with an attached VMA?  Maybe the VMA
> >> was in the process of being detached and still has readers?
> >
> > So we're talking about:
> >
> > vma_mark_detached()
> > -> refcount_dec_and_test() [ ref count is zero ]
> > -> __vma_enter_locked()
>
> I think its refcount is NOT zero to continue with __vma_enter_locked().

Yup sorry, misread the ! clearly...

Which makes a lot more sense when it comes to picking up spurious reader
refcount increases :)

>
> > (meanwhile...)
> >
> > -> reader attempts to read
> >   -> optimistic check doesn't successfully find write locked VMA
> >   -> __refcount_inc_not_zero_limited_acquire() somehow doesn't notice 0 refcount and increments
> >     (??? how)
>
> That shouldn't be possible, yeah. But per above, it's actually not zero.

Yup so this just makes it more likely to happen...
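To make the reader-side point concrete, here is a minimal userspace sketch of an "increment unless zero, bounded by a limit" acquire, loosely modelled on the role __refcount_inc_not_zero_limited_acquire() plays here. It is not the kernel implementation: model_inc_not_zero_limited(), MODEL_LOCK_OFFSET and MODEL_REF_LIMIT are hypothetical names, the constants are illustrative, and the exact limit comparison is an assumption of the sketch. It shows why a reader can never revive a zero (detached) count, while a non-zero count keeps admitting readers until a writer pushes it past the limit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative values for a userspace model, not taken from the kernel. */
#define MODEL_LOCK_OFFSET 0x40000000u
#define MODEL_REF_LIMIT   (MODEL_LOCK_OFFSET - 1)

/*
 * Hypothetical model of a bounded inc-not-zero with acquire ordering:
 * - a count of 0 (detached VMA) is never incremented;
 * - a count that would exceed `limit` (a writer has added the lock
 *   offset) is refused as well;
 * - otherwise the count is bumped by one with a CAS loop.
 */
static bool model_inc_not_zero_limited(atomic_uint *r, unsigned int limit)
{
	unsigned int old = atomic_load_explicit(r, memory_order_relaxed);

	do {
		if (old == 0 || old >= limit)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(r, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}
```

With a count of 1 (attached, no readers) the increment succeeds; with 0 or with the lock offset added it fails and leaves the count untouched.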

>
> > (back to vma_mark_attached() -> __vma_enter_locked())
>
> Back to _attached()? but it's _detached() above? You mean _detached()
> right? Just to be sure

Yup, I typed this all a bit too quick...

>
> > -> refcount_add_not_zero() returns true
>
> Ack.
>
> > [ process gets fatal signal ]
> >
> > -> rcuwait_wait_event() errors out
> > -> oopsies need to do something, maybe [VM_]WARN_ON() not right?
>
> AFAICS from vma_mark_detached() we use the TASK_UNINTERRUPTIBLE variant
> so this path can't error due to the fatal signal.

Right good point.

I hate that we make this so 'gosh darned' implicit.

We are now assuming that:

1. the only way that RCU wait can fail is due to pending fatal signal
2. and that we're fine here because it's uninterruptible.
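The two assumptions can be illustrated with a small userspace model of the wait loop, assuming it checks the wake-up condition first and the signal second (this mirrors the ordering of rcuwait_wait_event() as far as is known here; treat it as an assumption). model_wait_for_readers(), enum model_state and struct model_task are hypothetical, and decrementing `readers` stands in for sleeping until a reader drops its reference. Under these assumptions an uninterruptible waiter only ever returns 0, while a killable waiter returns -EINTR only when a fatal signal is pending and readers remain.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical task states for the model (not the kernel's values). */
enum model_state { MODEL_UNINTERRUPTIBLE, MODEL_KILLABLE };

struct model_task {
	bool fatal_signal_pending;
};

/* Model of signal_pending_state(): only a killable sleeper sees the signal. */
static bool model_signal_pending_state(enum model_state state,
				       const struct model_task *t)
{
	return state == MODEL_KILLABLE && t->fatal_signal_pending;
}

/*
 * Model of the writer waiting for readers to go away. Each iteration
 * checks the condition, then the signal, then "sleeps" (here: lets one
 * reader drop out).
 */
static int model_wait_for_readers(int readers, enum model_state state,
				  const struct model_task *t)
{
	for (;;) {
		if (readers == 0)
			return 0;
		if (model_signal_pending_state(state, t))
			return -EINTR;
		readers--;	/* stands in for sleeping until a reader exits */
	}
}
```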

I mean, it's very doubtful we'll ever change that, but it's still gross.

And as Willy says we're paving the road with good intent^Wlandmines.

>
> > Correct me if the above is wrong.

Yeah I was wrong thankfully :)

The TASK_UNINTERRUPTIBLE saves us, but it's all still a bit ugh.

> >
> > I mean is any of this actually possible...?
> >
> > Seems dubious. But I guess right now we assume it _is_ possible. What a mess!
> >
> > (Again I wonder why we made our lives so difficult here)
> >
> > Anyway even if we are midway through a detach, the detach is ostensibly waiting
> > for the readers to go away, and our reader is about to go away anyway, but the
> > process has a fatal signal so do we even care?
>
> Yeah I guess it's for the best to keep vma_mark_detached() use the
> TASK_UNINTERRUPTIBLE variant, maybe document why. Aborting the detaching
> would be counterproductive.
>
> > I actually wonder if a WARN_ON() is warranted to see if this even ever
> > happens...
>
> Not for this path, but for vma_start_write_killable -> __vma_start_write
> -> __vma_enter_locked(... TASK_KILLABLE). I think it still can't

Well, if it's impossible for TASK_UNINTERRUPTIBLE, there's no harm in adding it,
right? We can add a comment.

> trigger, but since we need to check result of the
> refcount_sub_and_test() anyway, we might as well WARN_ON it.

Probably it can't no.
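The killable error path Vlastimil describes (drop VMA_LOCK_OFFSET again, WARN_ON() the result of refcount_sub_and_test(), and return the error rather than 0) can be sketched in userspace as below. Everything prefixed model_ is hypothetical, the offset value is illustrative, a counter stands in for WARN_ON_ONCE(), and returning -EINTR on a failed killable wait is an assumption of the model. The return convention (0 = detached, 1 = locked, negative = error) is chosen for the sketch and may not match the kernel's __vma_enter_locked().

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; the offset value is illustrative only. */
#define MODEL_VMA_LOCK_OFFSET 0x40000000u

struct model_vma {
	/* 0 = detached, 1 = attached, +1 per reader, +offset while write-locking */
	atomic_uint vm_refcnt;
};

/* Model of refcount_add_not_zero(): add only if the count is non-zero. */
static bool model_add_not_zero(unsigned int i, atomic_uint *r)
{
	unsigned int old = atomic_load(r);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(r, &old, old + i));
	return true;
}

/* Model of refcount_sub_and_test(): true if the subtraction hit zero. */
static bool model_sub_and_test(unsigned int i, atomic_uint *r)
{
	return atomic_fetch_sub(r, i) == i;
}

static int warn_count;	/* stands in for WARN_ON_ONCE() firing */

static int model_enter_locked(struct model_vma *vma, bool wait_fails)
{
	if (!model_add_not_zero(MODEL_VMA_LOCK_OFFSET, &vma->vm_refcnt))
		return 0;	/* VMA not attached: nothing to lock */

	if (wait_fails) {
		/* We started attached, so dropping the offset should never hit zero. */
		if (model_sub_and_test(MODEL_VMA_LOCK_OFFSET, &vma->vm_refcnt))
			warn_count++;
		return -EINTR;	/* propagate the error, do not report success */
	}
	return 1;	/* locked; the caller later drops the offset */
}
```

In this model a failed wait on an attached VMA restores the count to its previous value, never trips the warning, and reports the error to the caller.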

>
> > OK just going to reattach... my head which just exploded from the above :P
> >
> > Cheers, Lorenzo
>
>

Thanks, Lorenzo



Thread overview: 18+ messages
2025-11-26  3:42 Matthew Wilcox (Oracle)
2025-11-26  4:28 ` Suren Baghdasaryan
2025-11-26 14:26   ` Suren Baghdasaryan
2025-11-26 14:40     ` Vlastimil Babka
2025-11-26 15:01     ` Matthew Wilcox
2025-11-26 14:36   ` Vlastimil Babka
2025-11-26 15:02     ` Lorenzo Stoakes
2025-11-26 15:05     ` Matthew Wilcox
2025-11-26 15:20       ` Lorenzo Stoakes
2025-11-26 15:49         ` Suren Baghdasaryan
2025-11-26 16:00           ` Lorenzo Stoakes
2025-11-26 16:11             ` Suren Baghdasaryan
2025-11-26 16:04         ` Vlastimil Babka
2025-11-26 16:06           ` Matthew Wilcox
2025-11-26 16:18           ` Lorenzo Stoakes [this message]
2025-11-26 18:06             ` Suren Baghdasaryan
2025-11-26 18:11               ` Lorenzo Stoakes
2025-11-26 15:53       ` Vlastimil Babka
