From: Suren Baghdasaryan <surenb@google.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	Matthew Wilcox <willy@infradead.org>,
	 Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org,
	 syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com,
	 "Liam R. Howlett" <Liam.Howlett@oracle.com>
Subject: Re: [PATCH] mm: fix vma_start_write_killable() signal handling
Date: Wed, 26 Nov 2025 10:06:28 -0800
Message-ID: <CAJuCfpFMt6P2cjgb7xfWVz5fhgK67=TQod8HZfJknLYSvZ5EgA@mail.gmail.com>
In-Reply-To: <a03aa1c8-b54a-4ac6-82f1-40a06dcd7150@lucifer.local>

On Wed, Nov 26, 2025 at 8:18 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Wed, Nov 26, 2025 at 05:04:09PM +0100, Vlastimil Babka wrote:
> >
> >
> > On 11/26/25 4:20 PM, Lorenzo Stoakes wrote:
> > > On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote:
> > >> On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote:
> > >>> On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
> > >>>>> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
> > >>>>
> > >>>> Doh! This is embarrassing...
> > >>>
> > >>> Hand-rolled synchronization primitives are wonderful, aren't they?
> > >>
> > >> That's why I liked the original approach of just using rwsems.  I
> > >> must admit to having not paid attention to this recently so I don't
> > >> know what motivated the change.
> > >>
> > >>>> Wait, why do we consider this as a successful acquisition? The
> > >>>> vm_refcnt is 0, so this is similar situation to an earlier:
> > >>>>
> > >>>> if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> > >>>>         return 0;
> > >>>
> > >>> But this means "vma is not attached" not "we failed to lock it".
> > >>>
> > >>>> IOW, the vma is not referenced, so we failed to lock it. I think the
> > >>>> fix should be:
> > >>>>
> > >>>>         if (err) {
> > >>>> +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> > >>>> +                       /* Oh cobblers.  While we got a fatal signal, we
> > >>>> +                        * raced with the last user.  VMA is not referenced,
> > >>>> +                        * fail to lock it.
> > >>>> +                        */
> > >>>> +                       err = 0;
> > >>>
> > >>> Returning 0 in this situation therefore wouldn't be correct.
> > >>>
> > >>> AFAIU since we started with attached vma above, it's not possible that
> > >>> the refcount_sub_and_test here will drop the refcnt to zero. We could
> > >>> just WARN_ON_ONCE() on the result (in a way to make also the
> > >>> __must_check happy) and then can return err below.
> > >>
> > >> But how do we know that we started with an attached VMA?  Maybe the VMA
> > >> was in the process of being detached and still has readers?
> > >
> > > So we're talking about:
> > >
> > > vma_mark_detached()
> > > -> refcount_dec_and_test() [ ref count is zero ]
> > > -> __vma_enter_locked()
> >
> > I think it's refcount is NOT zero to continue with __vma_enter_locked().
>
> Yup sorry, misread the ! clearly...
>
> Which makes a lot more sense when it comes to picking up spurious reader
> refcount increases :)
>
> >
> > > (meanwhile...)
> > >
> > > -> reader attempts to read
> > >   -> optimistic check doesn't successfully find write locked VMA
> > >   -> __refcount_inc_not_zero_limited_acquire() somehow doesn't notice 0 refcount and increments
> > >     (??? how)
> >
> > That shouldn't be possible, yeah. But per above, it's actually not zero.
>
> Yup so this just makes it more likely to happen...
>
> >
> > > (back to vma_mark_attached() -> __vma_enter_locked())
> >
> > Back to _attached()? but it's _detached() above? You mean _detached()
> > right? Just to be sure
>
> Yup, I typed this all a bit too quick...
>
> >
> > > -> refcount_add_not_zero() returns true
> >
> > Ack.
> >
> > > [ process gets fatal signal ]
> > >
> > > -> rcuwait_wait_event() errors out
> > > -> oopsies need to do something, maybe [VM_]WARN_ON() not right?
> >
> > AFAICS from vma_mark_detached() we use the TASK_UNINTERRUPTIBLE variant
> > so this path can't error due to the fatal signal.
>
> Right good point.
>
> I hate that we make this so 'gosh darned' implicit.
>
> We are now assuming that:
>
> 1. the only way that RCU wait can fail is due to pending fatal signal
> 2. and that we're fine here because it's uninterruptible.
>
> I mean very doubtful we'll ever change that but it's still gross.
>
> And as Willy says we're paving the road with good intent^Wlandmines.
>
> >
> > > Correct me if the above is wrong.
>
> Yeah I was wrong thankfully :)
>
> The TASK_UNINTERRUPTIBLE saves us, but it's all still a bit ugh.

I went through different scenarios and I think the race Lorenzo
described would look something like this:

READER             WRITER
//refcnt=1 (attached, no readers, not write-locked)
vma_start_read()
//vma->vm_lock_seq != mm->mm_lock_seq
                           vma_start_write()
                             __vma_enter_locked(TASK_INTERRUPTIBLE)
                               refcount_add_not_zero(VMA_LOCK_OFFSET)
//refcnt=1+VMA_LOCK_OFFSET
                             WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
//vma->vm_lock_seq == mm->mm_lock_seq
                             __vma_exit_locked()
                                 refcount_sub_and_test(VMA_LOCK_OFFSET)
//refcnt=1
  __refcount_inc_not_zero_limited_acquire()
//refcnt = 2
                           vma_mark_detached()
                             if (!refcount_dec_and_test())
//refcnt=1
                               __vma_enter_locked(TASK_UNINTERRUPTIBLE)
                                 if (refcount_add_not_zero(VMA_LOCK_OFFSET))
//refcnt=1+VMA_LOCK_OFFSET
                                   rcuwait_wait_event(TASK_UNINTERRUPTIBLE)
  if (vma->vm_lock_seq == mm->mm_lock_seq)
    vma_refcount_put(vma);
      __refcount_dec_and_test()
//refcnt=VMA_LOCK_OFFSET
      rcuwait_wake_up()
                           __vma_exit_locked()
                             refcount_sub_and_test(VMA_LOCK_OFFSET)
//refcnt=0 (detached)

This seems to be fine with vma_mark_detached() using
TASK_UNINTERRUPTIBLE. If we decide to change vma_mark_detached() to
use TASK_INTERRUPTIBLE I think we need to handle the possible error
from __vma_enter_locked() inside vma_mark_detached() and allow for the
fact that refcnt can drop to 0 after the wait.
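
To make the refcnt arithmetic above easier to check, here is a minimal
userspace sketch that replays the same interleaving sequentially. It is
only a model under assumptions, not kernel code: model_vma,
model_add_not_zero(), model_sub_and_test(), replay_trace() and
detach_wait_aborted() are hypothetical names, MODEL_LOCK_OFFSET is a
stand-in for VMA_LOCK_OFFSET, and plain ints replace refcount_t, so no
real concurrency or memory ordering is exercised.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for VMA_LOCK_OFFSET; the value is an assumption for the model. */
#define MODEL_LOCK_OFFSET 0x40000000

/* Hypothetical model of a vma: only the reference count is tracked. */
struct model_vma {
	int refcnt;
};

/* Models refcount_add_not_zero(): refuses to touch a zero count. */
static bool model_add_not_zero(struct model_vma *v, int n)
{
	if (v->refcnt == 0)
		return false;
	v->refcnt += n;
	return true;
}

/* Models refcount_sub_and_test(): true when the count reaches zero. */
static bool model_sub_and_test(struct model_vma *v, int n)
{
	v->refcnt -= n;
	return v->refcnt == 0;
}

/* Replays the trace from the mail step by step; returns the final refcnt. */
int replay_trace(void)
{
	struct model_vma v = { .refcnt = 1 }; /* attached, no readers */
	bool ok, zero;

	/* WRITER: vma_start_write() -> __vma_enter_locked() */
	ok = model_add_not_zero(&v, MODEL_LOCK_OFFSET);
	assert(ok && v.refcnt == 1 + MODEL_LOCK_OFFSET);

	/* WRITER: __vma_exit_locked() */
	zero = model_sub_and_test(&v, MODEL_LOCK_OFFSET);
	assert(!zero && v.refcnt == 1);

	/* READER: late increment after the optimistic seq check */
	ok = model_add_not_zero(&v, 1);
	assert(ok && v.refcnt == 2);

	/* WRITER: vma_mark_detached(): the decrement does not reach zero */
	zero = model_sub_and_test(&v, 1);
	assert(!zero && v.refcnt == 1);

	/* WRITER: __vma_enter_locked(TASK_UNINTERRUPTIBLE), then waits */
	ok = model_add_not_zero(&v, MODEL_LOCK_OFFSET);
	assert(ok && v.refcnt == 1 + MODEL_LOCK_OFFSET);

	/* READER: sees matching seq, backs off via vma_refcount_put() */
	zero = model_sub_and_test(&v, 1);
	assert(!zero && v.refcnt == MODEL_LOCK_OFFSET);

	/* WRITER: woken up, __vma_exit_locked() */
	zero = model_sub_and_test(&v, MODEL_LOCK_OFFSET);
	assert(zero);

	return v.refcnt; /* 0 means detached */
}

/*
 * Hypothetical interruptible variant: the writer's wait is aborted and it
 * undoes its MODEL_LOCK_OFFSET. Returns true if that undo hits zero.
 */
bool detach_wait_aborted(bool reader_left_first)
{
	/* State while waiting: writer's offset plus one reader reference. */
	struct model_vma v = { .refcnt = 1 + MODEL_LOCK_OFFSET };

	if (reader_left_first)
		(void)model_sub_and_test(&v, 1);

	return model_sub_and_test(&v, MODEL_LOCK_OFFSET);
}
```

In this model replay_trace() ends with refcnt == 0, matching the
"detached" end state of the trace. detach_wait_aborted() illustrates the
second point: if the wait could be aborted, the subtraction on the error
path reaches zero when the reader has already left and does not when the
reader is still holding its reference, so both outcomes would need
handling.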


>
> > >
> > > I mean is any of this actually possible...?
> > >
> > > Seems dubious. But I guess right now we assume it _is_ possible. What a mess!
> > >
> > > (Again I wonder why we made our lives so difficult here)
> > >
> > > Anyway even if we are midway through a detach, the detach is ostensibly waiting
> > > for the readers to go away, and our reader is about to go away anyway, but the
> > > process has a fatal signal so do we even care?
> >
> > Yeah I guess it's for the best to keep vma_mark_detached() use the
> > TASK_UNINTERRUPTIBLE variant, maybe document why. Aborting the detaching
> > would be counter productive.
> >
> > > I actually wonder if a WARN_ON() is warranted to see if this even ever
> > > happens...
> >
> > Not for this path, but for vma_start_write_killable -> __vma_start_write
> > -> __vma_enter_locked(... TASK_KILLABLE). I think it still can't
>
> Well if it's impossible for TASK_UNINTERRUPTIBLE no harm in adding it right? Can
> add a comment.
>
> > trigger, but since we need to check result of the
> > refcount_sub_and_test() anyway, we might as well WARN_ON it.
>
> Probably it can't no.
>
> >
> > > OK just going to reattach... my head which just exploded from the above :P
> > >
> > > Cheers, Lorenzo
> >
> >
>
> Thanks, Lorenzo

