From: Suren Baghdasaryan <surenb@google.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
Matthew Wilcox <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org,
syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com,
"Liam R. Howlett" <Liam.Howlett@oracle.com>
Subject: Re: [PATCH] mm: fix vma_start_write_killable() signal handling
Date: Wed, 26 Nov 2025 10:06:28 -0800
Message-ID: <CAJuCfpFMt6P2cjgb7xfWVz5fhgK67=TQod8HZfJknLYSvZ5EgA@mail.gmail.com>
In-Reply-To: <a03aa1c8-b54a-4ac6-82f1-40a06dcd7150@lucifer.local>
On Wed, Nov 26, 2025 at 8:18 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Wed, Nov 26, 2025 at 05:04:09PM +0100, Vlastimil Babka wrote:
> >
> >
> > On 11/26/25 4:20 PM, Lorenzo Stoakes wrote:
> > > On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote:
> > >> On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote:
> > >>> On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
> > >>>>> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
> > >>>>
> > >>>> Doh! This is embarrassing...
> > >>>
> > >>> Hand-rolled synchronization primitives are wonderful, aren't they?
> > >>
> > >> That's why I liked the original approach of just using rwsems. I
> > >> must admit to having not paid attention to this recently so I don't
> > >> know what motivated the change.
> > >>
> > >>>> Wait, why do we consider this as a successful acquisition? The
> > >>>> vm_refcnt is 0, so this is similar situation to an earlier:
> > >>>>
> > >>>> 	if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> > >>>> 		return 0;
> > >>>
> > >>> But this means "vma is not attached" not "we failed to lock it".
> > >>>
> > >>>> IOW, the vma is not referenced, so we failed to lock it. I think the
> > >>>> fix should be:
> > >>>>
> > >>>>  	if (err) {
> > >>>> +		if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> > >>>> +			/* Oh cobblers. While we got a fatal signal, we
> > >>>> +			 * raced with the last user. VMA is not referenced,
> > >>>> +			 * fail to lock it.
> > >>>> +			 */
> > >>>> +			err = 0;
> > >>>
> > >>> Returning 0 in this situation therefore wouldn't be correct.
> > >>>
> > >>> AFAIU since we started with attached vma above, it's not possible that
> > >>> the refcount_sub_and_test here will drop the refcnt to zero. We could
> > >>> just WARN_ON_ONCE() on the result (in a way to make also the
> > >>> __must_check happy) and then can return err below.
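A minimal userspace sketch of the error leg Vlastimil describes, assuming (as he argues) that the vma was attached on entry. It is a model, not kernel code: model_wait_failed(), model_warn_on_once() and model_warn_hits are hypothetical names, C11 atomics stand in for refcount_t, MODEL_LOCK_OFFSET is assumed to mirror VMA_LOCK_OFFSET, and -EINTR is assumed as the error the killable wait reports.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's VMA_LOCK_OFFSET; used by this model only. */
#define MODEL_LOCK_OFFSET 0x40000000

static int model_warn_hits;

/* Hypothetical stand-in for WARN_ON_ONCE(): counts hits, returns cond. */
static bool model_warn_on_once(bool cond)
{
	if (cond)
		model_warn_hits++;
	return cond;
}

/*
 * Hypothetical model of the path after the killable wait failed:
 * undo the writer's VMA_LOCK_OFFSET bias and propagate the error.
 * With the vma attached on entry, at least the attach reference should
 * remain, so reaching 0 is warned about rather than reported as success.
 */
static int model_wait_failed(atomic_int *refcnt, int err)
{
	/* atomic_fetch_sub() returns the old value; old == OFFSET means new == 0. */
	bool hit_zero = atomic_fetch_sub(refcnt, MODEL_LOCK_OFFSET) == MODEL_LOCK_OFFSET;

	/* Consuming the result here is what keeps a __must_check helper happy. */
	model_warn_on_once(hit_zero);
	return err;
}
```

Starting from refcnt = 1 + MODEL_LOCK_OFFSET (attached, lock bias held), the call returns the error unchanged, leaves refcnt at 1 and records no warning.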
> > >>
> > >> But how do we know that we started with an attached VMA? Maybe the VMA
> > >> was in the process of being detached and still has readers?
> > >
> > > So we're talking about:
> > >
> > > vma_mark_detached()
> > > -> refcount_dec_and_test() [ ref count is zero ]
> > > -> __vma_enter_locked()
> >
> > I think its refcount is NOT zero to continue with __vma_enter_locked().
>
> Yup sorry, misread the ! clearly...
>
> Which makes a lot more sense when it comes to picking up spurious reader
> refcount increases :)
>
> >
> > > (meanwhile...)
> > >
> > > -> reader attempts to read
> > > -> optimistic check doesn't successfully find write locked VMA
> > > -> __refcount_inc_not_zero_limited_acquire() somehow doesn't notice 0 refcount and increments
> > > (??? how)
> >
> > That shouldn't be possible, yeah. But per above, it's actually not zero.
>
> Yup so this just makes it more likely to happen...
>
> >
> > > (back to vma_mark_attached() -> __vma_enter_locked())
> >
> > Back to _attached()? but it's _detached() above? You mean _detached()
> > right? Just to be sure
>
> Yup, I typed this all a bit too quick...
>
> >
> > > -> refcount_add_not_zero() returns true
> >
> > Ack.
> >
> > > [ process gets fatal signal ]
> > >
> > > -> rcuwait_wait_event() errors out
> > > -> oopsies need to do something, maybe [VM_]WARN_ON() not right?
> >
> > AFAICS from vma_mark_detached() we use the TASK_UNINTERRUPTIBLE variant
> > so this path can't error due to the fatal signal.
>
> Right good point.
>
> I hate that we make this so 'gosh darned' implicit.
>
> We are now assuming that:
>
> 1. the only way that RCU wait can fail is due to pending fatal signal
> 2. and that we're fine here because it's uninterruptible.
>
> I mean very doubtful we'll ever change that but it's still gross.
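To make the two assumptions above concrete, here is a userspace model of the signal_pending_state()-style check that, as we understand it, is the only way rcuwait_wait_event() gives up early. All names are hypothetical, the M_TASK_* values are assumed to mirror include/linux/sched.h, and "pending"/"fatal" are plain booleans rather than real task state.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed task-state bits, mirroring include/linux/sched.h for this model only. */
#define M_TASK_INTERRUPTIBLE	0x0001
#define M_TASK_UNINTERRUPTIBLE	0x0002
#define M_TASK_WAKEKILL		0x0100
#define M_TASK_KILLABLE		(M_TASK_WAKEKILL | M_TASK_UNINTERRUPTIBLE)

/*
 * Returns true when a sleeper in @state should abort its wait:
 * uninterruptible sleeps ignore signals, killable sleeps react only to
 * fatal ones, interruptible sleeps react to any pending signal.
 */
static bool model_signal_pending_state(unsigned int state, bool pending, bool fatal)
{
	if (!(state & (M_TASK_INTERRUPTIBLE | M_TASK_WAKEKILL)))
		return false;
	if (!pending)
		return false;
	return (state & M_TASK_INTERRUPTIBLE) || fatal;
}
```

Under this model a TASK_UNINTERRUPTIBLE waiter, such as vma_mark_detached(), never aborts even with a fatal signal pending, while a TASK_KILLABLE waiter aborts only for a fatal one.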
>
> And as Willy says we're paving the road with good intent^Wlandmines.
>
> >
> > > Correct me if the above is wrong.
>
> Yeah I was wrong thankfully :)
>
> The TASK_UNINTERRUPTIBLE saves us, but it's all still a bit ugh.
I went through different scenarios and I think the race Lorenzo
described would look something like this:
READER                                      WRITER
//refcnt=1 (attached, no readers, not write-locked)
vma_start_read()
  //vma->vm_lock_seq != mm->mm_lock_seq
                                            vma_start_write()
                                              __vma_enter_locked(TASK_INTERRUPTIBLE)
                                                refcount_add_not_zero(VMA_LOCK_OFFSET)
                                                //refcnt=1+VMA_LOCK_OFFSET
                                              WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
                                              //vma->vm_lock_seq == mm->mm_lock_seq
                                              __vma_exit_locked()
                                                refcount_sub_and_test(VMA_LOCK_OFFSET)
                                                //refcnt=1
  __refcount_inc_not_zero_limited_acquire()
  //refcnt = 2
                                            vma_mark_detached()
                                              if (!refcount_dec_and_test())
                                                //refcnt=1
                                                __vma_enter_locked(TASK_UNINTERRUPTIBLE)
                                                  if (refcount_add_not_zero(VMA_LOCK_OFFSET))
                                                    //refcnt=1+VMA_LOCK_OFFSET
                                                    rcuwait_wait_event(TASK_UNINTERRUPTIBLE)
  if (vma->vm_lock_seq == mm->mm_lock_seq)
    vma_refcount_put(vma);
      __refcount_dec_and_test()
      //refcnt=VMA_LOCK_OFFSET
      rcuwait_wake_up()
                                                __vma_exit_locked()
                                                  refcount_sub_and_test(VMA_LOCK_OFFSET)
                                                  //refcnt=0 (detached)
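The refcount arithmetic in the trace above can be replayed with a small single-threaded userspace model. It is only a sketch: model_add_not_zero(), model_sub_and_test() and model_replay_race() are hypothetical stand-ins, C11 atomics replace refcount_t, MODEL_LOCK_OFFSET is assumed to mirror VMA_LOCK_OFFSET, and the vm_lock_seq comparisons are taken as given rather than modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed to mirror the kernel's VMA_LOCK_OFFSET; used by this model only. */
#define MODEL_LOCK_OFFSET 0x40000000

/* Hypothetical stand-in for refcount_add_not_zero(): never revives a 0 count. */
static bool model_add_not_zero(atomic_int *r, int v)
{
	int old = atomic_load(r);

	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(r, &old, old + v));
	return true;
}

/* Hypothetical stand-in for refcount_sub_and_test(): true if the count hit 0. */
static bool model_sub_and_test(atomic_int *r, int v)
{
	return atomic_fetch_sub(r, v) == v;
}

/*
 * Replays the READER/WRITER interleaving step by step on one thread.
 * Returns the final refcnt, or -1 if any step deviates from the trace.
 */
static int model_replay_race(void)
{
	atomic_int refcnt;

	atomic_init(&refcnt, 1);	/* attached, no readers, not write-locked */

	/* WRITER: vma_start_write() -> __vma_enter_locked(); refcnt=1+OFFSET */
	if (!model_add_not_zero(&refcnt, MODEL_LOCK_OFFSET))
		return -1;
	/* WRITER: WRITE_ONCE(vm_lock_seq), __vma_exit_locked(); refcnt=1 */
	if (model_sub_and_test(&refcnt, MODEL_LOCK_OFFSET))
		return -1;
	/* READER: __refcount_inc_not_zero_limited_acquire(); refcnt=2 */
	if (!model_add_not_zero(&refcnt, 1))
		return -1;
	/* WRITER: vma_mark_detached() drops the attach reference; refcnt=1 */
	if (model_sub_and_test(&refcnt, 1))
		return -1;
	/* WRITER: __vma_enter_locked(TASK_UNINTERRUPTIBLE); refcnt=1+OFFSET */
	if (!model_add_not_zero(&refcnt, MODEL_LOCK_OFFSET))
		return -1;
	/* READER: vm_lock_seq matches, vma_refcount_put(); refcnt=OFFSET */
	if (model_sub_and_test(&refcnt, 1))
		return -1;
	/* WRITER: woken, __vma_exit_locked(); refcnt=0 (detached) */
	if (!model_sub_and_test(&refcnt, MODEL_LOCK_OFFSET))
		return -1;
	return atomic_load(&refcnt);
}
```

model_replay_race() returns 0, matching the final detached state in the trace; because the writer's wait is TASK_UNINTERRUPTIBLE, no error leg has to be modeled.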
This seems to be fine with vma_mark_detached() using
TASK_UNINTERRUPTIBLE. If we decide to change vma_mark_detached() to
use TASK_INTERRUPTIBLE, I think we need to handle the possible error
from __vma_enter_locked() inside vma_mark_detached() and allow for the
fact that refcnt can drop to 0 after the wait.
>
> > >
> > > I mean is any of this actually possible...?
> > >
> > > Seems dubious. But I guess right now we assume it _is_ possible. What a mess!
> > >
> > > (Again I wonder why we made our lives so difficult here)
> > >
> > > Anyway even if we are midway through a detach, the detach is ostensibly waiting
> > > for the readers to go away, and our reader is about to go away anyway, but the
> > > process has a fatal signal so do we even care?
> >
> > Yeah I guess it's for the best to keep vma_mark_detached() using the
> > TASK_UNINTERRUPTIBLE variant, maybe document why. Aborting the detaching
> > would be counterproductive.
> >
> > > I actually wonder if a WARN_ON() is warranted to see if this even ever
> > > happens...
> >
> > Not for this path, but for vma_start_write_killable -> __vma_start_write
> > -> __vma_enter_locked(... TASK_KILLABLE). I think it still can't
>
> Well if it's impossible for TASK_UNINTERRUPTIBLE no harm in adding it right? Can
> add a comment.
>
> > trigger, but since we need to check the result of the
> > refcount_sub_and_test() anyway, we might as well WARN_ON it.
>
> Probably it can't no.
>
> >
> > > OK just going to reattach... my head which just exploded from the above :P
> > >
> > > Cheers, Lorenzo
> >
> >
>
> Thanks, Lorenzo