[PATCH] mm: fix vma_start_write_killable() signal handling

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] mm: fix vma_start_write_killable() signal handling
@ 2025-11-26  3:42 Matthew Wilcox (Oracle)
  2025-11-26  4:28 ` Suren Baghdasaryan
  0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-11-26  3:42 UTC (permalink / raw)
  To: Andrew Morton, linux-mm
  Cc: Matthew Wilcox (Oracle),
	syzbot+5b19bad23ac7f44bf8b8, Suren Baghdasaryan, Liam R. Howlett,
	Vlastimil Babka, Lorenzo Stoakes

If we get a signal, we need to restore the vm_refcnt.  The wrinkle in
that is that we might be the last reference.  If that happens, fix the
refcount to look like we weren't interrupted by a fatal signal.

Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com
Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
Andrew, since the vma_start_write_killable() patch is in mm-stable,
I don't think you can put this in as a fixup, right?

Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
Any other stupid thing I've done?  And am I doing the right thing
with refcount_set()?

 mm/mmap_lock.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index e6e5570d1ec7..71af7f0a5fe1 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -74,9 +74,18 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
 		   refcount_read(&vma->vm_refcnt) == tgt_refcnt,
 		   state);
 	if (err) {
+		if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
+			/* Oh cobblers.  While we got a fatal signal, we
+			 * raced with the last user.  Pretend we didn't notice
+			 * the signal
+			 */
+			refcount_set(&vma->vm_refcnt, VMA_LOCK_OFFSET);
+			goto acquired;
+		}
 		rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
 		return err;
 	}
+acquired:
 	lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
 
 	return 1;
-- 
2.47.2



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26  3:42 [PATCH] mm: fix vma_start_write_killable() signal handling Matthew Wilcox (Oracle)
@ 2025-11-26  4:28 ` Suren Baghdasaryan
  2025-11-26 14:26   ` Suren Baghdasaryan
  2025-11-26 14:36   ` Vlastimil Babka
  0 siblings, 2 replies; 18+ messages in thread
From: Suren Baghdasaryan @ 2025-11-26  4:28 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8,
	Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes

On Tue, Nov 25, 2025 at 7:44 PM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> If we get a signal, we need to restore the vm_refcnt.  The wrinkle in
> that is that we might be the last reference.  If that happens, fix the
> refcount to look like we weren't interrupted by a fatal signal.
>
> Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com
> Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()")
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Suren Baghdasaryan <surenb@google.com>
> Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> Andrew, since the vma_start_write_killable() patch is in mm-stable,
> I don't think you can put this in as a fixup, right?
>
> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.

Doh! This is embarassing...

> Any other stupid thing I've done?  And am I doing the right thing
> with refcount_set()?
>
>  mm/mmap_lock.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index e6e5570d1ec7..71af7f0a5fe1 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -74,9 +74,18 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
>                    state);
>         if (err) {
> +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> +                       /* Oh cobblers.  While we got a fatal signal, we
> +                        * raced with the last user.  Pretend we didn't notice
> +                        * the signal
> +                        */
> +                       refcount_set(&vma->vm_refcnt, VMA_LOCK_OFFSET);
> +                       goto acquired;

Wait, why do we consider this as a successful acquisition? The
vm_refcnt is 0, so this is similar situation to an earlier:

if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
        return 0;

IOW, the vma is not referenced, so we failed to lock it. I think the
fix should be:

        if (err) {
+               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
+                       /* Oh cobblers.  While we got a fatal signal, we
+                        * raced with the last user.  VMA is not referenced,
+                        * fail to lock it.
+                        */
+                       err = 0;
+               }
                rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
                return err;
        }


> +               }
>                 rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
>                 return err;
>         }
> +acquired:
>         lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
>
>         return 1;
> --
> 2.47.2
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26  4:28 ` Suren Baghdasaryan
@ 2025-11-26 14:26   ` Suren Baghdasaryan
  2025-11-26 14:40     ` Vlastimil Babka
  2025-11-26 15:01     ` Matthew Wilcox
  2025-11-26 14:36   ` Vlastimil Babka
  1 sibling, 2 replies; 18+ messages in thread
From: Suren Baghdasaryan @ 2025-11-26 14:26 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle)
  Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8,
	Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes

On Tue, Nov 25, 2025 at 8:28 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Tue, Nov 25, 2025 at 7:44 PM Matthew Wilcox (Oracle)
> <willy@infradead.org> wrote:
> >
> > If we get a signal, we need to restore the vm_refcnt.  The wrinkle in
> > that is that we might be the last reference.  If that happens, fix the
> > refcount to look like we weren't interrupted by a fatal signal.
> >
> > Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com
> > Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()")
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > Cc: Suren Baghdasaryan <surenb@google.com>
> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> > Andrew, since the vma_start_write_killable() patch is in mm-stable,
> > I don't think you can put this in as a fixup, right?
> >
> > Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
>
> Doh! This is embarassing...
>
> > Any other stupid thing I've done?  And am I doing the right thing
> > with refcount_set()?
> >
> >  mm/mmap_lock.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > index e6e5570d1ec7..71af7f0a5fe1 100644
> > --- a/mm/mmap_lock.c
> > +++ b/mm/mmap_lock.c
> > @@ -74,9 +74,18 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
> >                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
> >                    state);
> >         if (err) {
> > +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> > +                       /* Oh cobblers.  While we got a fatal signal, we
> > +                        * raced with the last user.  Pretend we didn't notice
> > +                        * the signal
> > +                        */
> > +                       refcount_set(&vma->vm_refcnt, VMA_LOCK_OFFSET);
> > +                       goto acquired;
>
> Wait, why do we consider this as a successful acquisition? The
> vm_refcnt is 0, so this is similar situation to an earlier:
>
> if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
>         return 0;
>
> IOW, the vma is not referenced, so we failed to lock it. I think the
> fix should be:

After sleeping on it, I don't think we should be masking EINTR error.
__vma_enter_locked() result might be the only place where an outer
loop is checking for fatal signals, so returning "failure to lock"
instead of -EINTR might cause the loop to continue. I think this fix
would be better:

          * If vma is detached then only vma_mark_attached() can raise the
          * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
          */
-        if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
+        if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
+                if (fatal_signal_pending(current))
+                        return -EINTR;
                 return 0;
+        }

         rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
         err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
                    state);
         if (err) {
+                if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
+                        /*
+                         * No more users but fatal signal is present,
+                         * still return the error.
+                         */
+                }
                 rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
                 return err;
         }


>
>         if (err) {
> +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> +                       /* Oh cobblers.  While we got a fatal signal, we
> +                        * raced with the last user.  VMA is not referenced,
> +                        * fail to lock it.
> +                        */
> +                       err = 0;
> +               }
>                 rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
>                 return err;
>         }
>
>
> > +               }
> >                 rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> >                 return err;
> >         }
> > +acquired:
> >         lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
> >
> >         return 1;
> > --
> > 2.47.2
> >


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26 14:26   ` Suren Baghdasaryan
@ 2025-11-26 14:40     ` Vlastimil Babka
  2025-11-26 15:01     ` Matthew Wilcox
  1 sibling, 0 replies; 18+ messages in thread
From: Vlastimil Babka @ 2025-11-26 14:40 UTC (permalink / raw)
  To: Suren Baghdasaryan, Matthew Wilcox (Oracle)
  Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8,
	Liam R. Howlett, Lorenzo Stoakes

On 11/26/25 3:26 PM, Suren Baghdasaryan wrote:
> On Tue, Nov 25, 2025 at 8:28 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>
>> On Tue, Nov 25, 2025 at 7:44 PM Matthew Wilcox (Oracle)
>> <willy@infradead.org> wrote:
>>>
>>> If we get a signal, we need to restore the vm_refcnt.  The wrinkle in
>>> that is that we might be the last reference.  If that happens, fix the
>>> refcount to look like we weren't interrupted by a fatal signal.
>>>
>>> Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com
>>> Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()")
>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>>> Cc: Suren Baghdasaryan <surenb@google.com>
>>> Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
>>> Cc: Vlastimil Babka <vbabka@suse.cz>
>>> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>>> ---
>>> Andrew, since the vma_start_write_killable() patch is in mm-stable,
>>> I don't think you can put this in as a fixup, right?
>>>
>>> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
>>
>> Doh! This is embarassing...
>>
>>> Any other stupid thing I've done?  And am I doing the right thing
>>> with refcount_set()?
>>>
>>>  mm/mmap_lock.c | 9 +++++++++
>>>  1 file changed, 9 insertions(+)
>>>
>>> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
>>> index e6e5570d1ec7..71af7f0a5fe1 100644
>>> --- a/mm/mmap_lock.c
>>> +++ b/mm/mmap_lock.c
>>> @@ -74,9 +74,18 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>>>                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
>>>                    state);
>>>         if (err) {
>>> +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
>>> +                       /* Oh cobblers.  While we got a fatal signal, we
>>> +                        * raced with the last user.  Pretend we didn't notice
>>> +                        * the signal
>>> +                        */
>>> +                       refcount_set(&vma->vm_refcnt, VMA_LOCK_OFFSET);
>>> +                       goto acquired;
>>
>> Wait, why do we consider this as a successful acquisition? The
>> vm_refcnt is 0, so this is similar situation to an earlier:
>>
>> if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
>>         return 0;
>>
>> IOW, the vma is not referenced, so we failed to lock it. I think the
>> fix should be:
> 
> After sleeping on it, I don't think we should be masking EINTR error.
> __vma_enter_locked() result might be the only place where an outer
> loop is checking for fatal signals, so returning "failure to lock"
> instead of -EINTR might cause the loop to continue. I think this fix
> would be better:
> 
>           * If vma is detached then only vma_mark_attached() can raise the
>           * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
>           */
> -        if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> +        if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> +                if (fatal_signal_pending(current))
> +                        return -EINTR;

Not necessary IMHO. It could become pending right after we return 0, no
point in testing it here. We should only care about the signals in the
part where we sleep, i.e. rcuwait_wait_event().

>                  return 0;
> +        }
> 
>          rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
>          err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
>                     refcount_read(&vma->vm_refcnt) == tgt_refcnt,
>                     state);
>          if (err) {
> +                if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> +                        /*
> +                         * No more users but fatal signal is present,
> +                         * still return the error.
> +                         */
> +                }

Like I said in the other reply, the "no more users" should not happen,
so just WARN_ON_ONCE() on the result, but otherwise agreed.

>                  rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
>                  return err;
>          }
> 
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26 14:26   ` Suren Baghdasaryan
  2025-11-26 14:40     ` Vlastimil Babka
@ 2025-11-26 15:01     ` Matthew Wilcox
  1 sibling, 0 replies; 18+ messages in thread
From: Matthew Wilcox @ 2025-11-26 15:01 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8,
	Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes

On Wed, Nov 26, 2025 at 06:26:26AM -0800, Suren Baghdasaryan wrote:
> > >         if (err) {
> > > +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> > > +                       /* Oh cobblers.  While we got a fatal signal, we
> > > +                        * raced with the last user.  Pretend we didn't notice
> > > +                        * the signal
> > > +                        */
> > > +                       refcount_set(&vma->vm_refcnt, VMA_LOCK_OFFSET);
> > > +                       goto acquired;
> >
> > Wait, why do we consider this as a successful acquisition? The
> > vm_refcnt is 0, so this is similar situation to an earlier:
> 
> > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> >         return 0;
> >
> > IOW, the vma is not referenced, so we failed to lock it. I think the
> > fix should be:

Yes, I also wondered about doing it that way.  Of course, we hold the
write lock, and we found the VMA, so I don't think this can actually
occur?

> After sleeping on it, I don't think we should be masking EINTR error.
> __vma_enter_locked() result might be the only place where an outer
> loop is checking for fatal signals, so returning "failure to lock"
> instead of -EINTR might cause the loop to continue. I think this fix
> would be better:
> 
>           * If vma is detached then only vma_mark_attached() can raise the
>           * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
>           */
> -        if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> +        if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> +                if (fatal_signal_pending(current))
> +                        return -EINTR;

This part seems too much to me.  We don't need to check for signals
often, just when we'd otherwise sleep.

>          rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
>          err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
>                     refcount_read(&vma->vm_refcnt) == tgt_refcnt,
>                     state);
>          if (err) {
> +                if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> +                        /*
> +                         * No more users but fatal signal is present,
> +                         * still return the error.
> +                         */
> +                }

Umm.  Does the last owner of a vm_refcnt not need to do anything like
free the vma?

>                  rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
>                  return err;
>          }


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26  4:28 ` Suren Baghdasaryan
  2025-11-26 14:26   ` Suren Baghdasaryan
@ 2025-11-26 14:36   ` Vlastimil Babka
  2025-11-26 15:02     ` Lorenzo Stoakes
  2025-11-26 15:05     ` Matthew Wilcox
  1 sibling, 2 replies; 18+ messages in thread
From: Vlastimil Babka @ 2025-11-26 14:36 UTC (permalink / raw)
  To: Suren Baghdasaryan, Matthew Wilcox (Oracle)
  Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8,
	Liam R. Howlett, Lorenzo Stoakes

On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
> On Tue, Nov 25, 2025 at 7:44 PM Matthew Wilcox (Oracle)
> <willy@infradead.org> wrote:
>>
>> If we get a signal, we need to restore the vm_refcnt.  The wrinkle in
>> that is that we might be the last reference.  If that happens, fix the
>> refcount to look like we weren't interrupted by a fatal signal.
>>
>> Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com
>> Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()")
>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>> Cc: Suren Baghdasaryan <surenb@google.com>
>> Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> ---
>> Andrew, since the vma_start_write_killable() patch is in mm-stable,
>> I don't think you can put this in as a fixup, right?
>>
>> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
> 
> Doh! This is embarassing...

Hand-rolled synchronization primitives are wonderful, aren't they?

>> Any other stupid thing I've done?  And am I doing the right thing
>> with refcount_set()?

I think it's not wrong, but because it's dead code anyway. So it's
unnecessary.

>>  mm/mmap_lock.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
>> index e6e5570d1ec7..71af7f0a5fe1 100644
>> --- a/mm/mmap_lock.c
>> +++ b/mm/mmap_lock.c
>> @@ -74,9 +74,18 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>>                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
>>                    state);
>>         if (err) {
>> +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
>> +                       /* Oh cobblers.  While we got a fatal signal, we
>> +                        * raced with the last user.  Pretend we didn't notice
>> +                        * the signal
>> +                        */
>> +                       refcount_set(&vma->vm_refcnt, VMA_LOCK_OFFSET);
>> +                       goto acquired;
> 
> Wait, why do we consider this as a successful acquisition? The
> vm_refcnt is 0, so this is similar situation to an earlier:
> 
> if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
>         return 0;

But this means "vma is not attached" not "we failed to lock it".

> IOW, the vma is not referenced, so we failed to lock it. I think the
> fix should be:
> 
>         if (err) {
> +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> +                       /* Oh cobblers.  While we got a fatal signal, we
> +                        * raced with the last user.  VMA is not referenced,
> +                        * fail to lock it.
> +                        */
> +                       err = 0;

Returning 0 in this situation therefore wouldn't be correct.

AFAIU since we started with attached vma above, it's not possible that
the refcount_sub_and_test here will drop the refcnt to zero. We could
just WARN_ON_ONCE() on the result (in a way to make also the
__must_check happy) and then can return err below.

> +               }
>                 rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
>                 return err;
>         }
> 
> 
>> +               }
>>                 rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
>>                 return err;
>>         }
>> +acquired:
>>         lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
>>
>>         return 1;
>> --
>> 2.47.2
>>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26 14:36   ` Vlastimil Babka
@ 2025-11-26 15:02     ` Lorenzo Stoakes
  2025-11-26 15:05     ` Matthew Wilcox
  1 sibling, 0 replies; 18+ messages in thread
From: Lorenzo Stoakes @ 2025-11-26 15:02 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Suren Baghdasaryan, Matthew Wilcox (Oracle),
	Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8,
	Liam R. Howlett

On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote:
> On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
> > On Tue, Nov 25, 2025 at 7:44 PM Matthew Wilcox (Oracle)
> > <willy@infradead.org> wrote:
> >>
> >> If we get a signal, we need to restore the vm_refcnt.  The wrinkle in
> >> that is that we might be the last reference.  If that happens, fix the
> >> refcount to look like we weren't interrupted by a fatal signal.
> >>
> >> Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com
> >> Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()")
> >> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> >> Cc: Suren Baghdasaryan <surenb@google.com>
> >> Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
> >> Cc: Vlastimil Babka <vbabka@suse.cz>
> >> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >> ---
> >> Andrew, since the vma_start_write_killable() patch is in mm-stable,
> >> I don't think you can put this in as a fixup, right?
> >>
> >> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
> >
> > Doh! This is embarassing...
>
> Hand-rolled synchronization primitives are wonderful, aren't they?

I do wonder why we reworked the VMA locks again, I understood the previous
iteration :)

Not sure we fully justified that. But it's water under the bridge now.

>
> >> Any other stupid thing I've done?  And am I doing the right thing
> >> with refcount_set()?
>
> I think it's not wrong, but because it's dead code anyway. So it's
> unnecessary.
>
> >>  mm/mmap_lock.c | 9 +++++++++
> >>  1 file changed, 9 insertions(+)
> >>
> >> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> >> index e6e5570d1ec7..71af7f0a5fe1 100644
> >> --- a/mm/mmap_lock.c
> >> +++ b/mm/mmap_lock.c
> >> @@ -74,9 +74,18 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
> >>                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
> >>                    state);
> >>         if (err) {

OK so the sequence of events as I undersatnd it are:

- VMA_LOCK_OFFSET added to vma->vm_refcnt (assuming not zero)
- we try to wait on vma->vm_mm->vma_writer_wait

- this fails because ___rcuwait_wait_event() calls signal_pending_state()
  which (if task is in interruptible mode) indicates a fatal signal, so
  -EINTR is returned.
- Oopsies we need to subtract VMA_LOCK_OFFSET before we're done.
- If the resulting reference count is zero, i.e. we held the last
  reference, we then... take the lock anyway?

> >> +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> >> +                       /* Oh cobblers.  While we got a fatal signal, we

Well for one this comment is broken ;)

Plus 'cobblers'... when 'Gosh darn it' was there for the taking...

> >> +                        * raced with the last user.  Pretend we didn't notice
> >> +                        * the signal
> >> +                        */

I don't love that we just assume that rcu_wait_event() can and forever more
only error out on fatal signal.

I mean I guess probably it can but... maybe worth rewording to say we
failed to wait so have to subtract but raced and probably if -EINTR this
means...

> >> +                       refcount_set(&vma->vm_refcnt, VMA_LOCK_OFFSET);
> >> +                       goto acquired;
> >
> > Wait, why do we consider this as a successful acquisition? The
> > vm_refcnt is 0, so this is similar situation to an earlier:

Right yeah, same question?

> >
> > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> >         return 0;
>
> But this means "vma is not attached" not "we failed to lock it".

Yup as vma_mark_detached() always takes the last refcount afaict.

>
> > IOW, the vma is not referenced, so we failed to lock it. I think the
> > fix should be:
> >
> >         if (err) {
> > +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> > +                       /* Oh cobblers.  While we got a fatal signal, we
> > +                        * raced with the last user.  VMA is not referenced,
> > +                        * fail to lock it.
> > +                        */
> > +                       err = 0;
>
> Returning 0 in this situation therefore wouldn't be correct.
>
> AFAIU since we started with attached vma above, it's not possible that
> the refcount_sub_and_test here will drop the refcnt to zero. We could
> just WARN_ON_ONCE() on the result (in a way to make also the
> __must_check happy) and then can return err below.

Yeah we are only detaching in the case that we _know_ we are the only
writer right? So nobody else should be trying to get the write lock and
failing like this?

So maybe the 'wrinkle' as you say should be a WARN_ON()... or a
VM_WARN_ON() since this 'should never happen' (famous last words TM)?

>
> > +               }
> >                 rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> >                 return err;
> >         }
> >
> >
> >> +               }
> >>                 rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> >>                 return err;
> >>         }
> >> +acquired:
> >>         lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
> >>
> >>         return 1;
> >> --
> >> 2.47.2
> >>
>
>

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26 14:36   ` Vlastimil Babka
  2025-11-26 15:02     ` Lorenzo Stoakes
@ 2025-11-26 15:05     ` Matthew Wilcox
  2025-11-26 15:20       ` Lorenzo Stoakes
  2025-11-26 15:53       ` Vlastimil Babka
  1 sibling, 2 replies; 18+ messages in thread
From: Matthew Wilcox @ 2025-11-26 15:05 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Suren Baghdasaryan, Andrew Morton, linux-mm,
	syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett, Lorenzo Stoakes

On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote:
> On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
> >> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
> > 
> > Doh! This is embarassing...
> 
> Hand-rolled synchronization primitives are wonderful, aren't they?

That's why I liked the original approach of just using rwsems.  I
mst admit to having not paid attention to this recently so I don't
know what motivated the change.

> > Wait, why do we consider this as a successful acquisition? The
> > vm_refcnt is 0, so this is similar situation to an earlier:
> > 
> > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> >         return 0;
> 
> But this means "vma is not attached" not "we failed to lock it".
> 
> > IOW, the vma is not referenced, so we failed to lock it. I think the
> > fix should be:
> > 
> >         if (err) {
> > +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> > +                       /* Oh cobblers.  While we got a fatal signal, we
> > +                        * raced with the last user.  VMA is not referenced,
> > +                        * fail to lock it.
> > +                        */
> > +                       err = 0;
> 
> Returning 0 in this situation therefore wouldn't be correct.
> 
> AFAIU since we started with attached vma above, it's not possible that
> the refcount_sub_and_test here will drop the refcnt to zero. We could
> just WARN_ON_ONCE() on the result (in a way to make also the
> __must_check happy) and then can return err below.

But how do we know that we started with an attached VMA?  Maybe the VMA
was in the process of being detached and still has readers?



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26 15:05     ` Matthew Wilcox
@ 2025-11-26 15:20       ` Lorenzo Stoakes
  2025-11-26 15:49         ` Suren Baghdasaryan
  2025-11-26 16:04         ` Vlastimil Babka
  2025-11-26 15:53       ` Vlastimil Babka
  1 sibling, 2 replies; 18+ messages in thread
From: Lorenzo Stoakes @ 2025-11-26 15:20 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Vlastimil Babka, Suren Baghdasaryan, Andrew Morton, linux-mm,
	syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett

On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote:
> On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote:
> > On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
> > >> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
> > >
> > > Doh! This is embarassing...
> >
> > Hand-rolled synchronization primitives are wonderful, aren't they?
>
> That's why I liked the original approach of just using rwsems.  I
> mst admit to having not paid attention to this recently so I don't
> know what motivated the change.
>
> > > Wait, why do we consider this as a successful acquisition? The
> > > vm_refcnt is 0, so this is similar situation to an earlier:
> > >
> > > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> > >         return 0;
> >
> > But this means "vma is not attached" not "we failed to lock it".
> >
> > > IOW, the vma is not referenced, so we failed to lock it. I think the
> > > fix should be:
> > >
> > >         if (err) {
> > > +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> > > +                       /* Oh cobblers.  While we got a fatal signal, we
> > > +                        * raced with the last user.  VMA is not referenced,
> > > +                        * fail to lock it.
> > > +                        */
> > > +                       err = 0;
> >
> > Returning 0 in this situation therefore wouldn't be correct.
> >
> > AFAIU since we started with attached vma above, it's not possible that
> > the refcount_sub_and_test here will drop the refcnt to zero. We could
> > just WARN_ON_ONCE() on the result (in a way to make also the
> > __must_check happy) and then can return err below.
>
> But how do we know that we started with an attached VMA?  Maybe the VMA
> was in the process of being detached and still has readers?

So we're talking about:

vma_mark_deteched()
-> refcount_dec_and_test() [ ref count is zero ]
-> __vma_enter_locked()

(meanwhile...)

-> reader attempts to read
  -> optimistic check doesn't successfully find write locked VMA
  -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments
    (??? how)

(back to vma_mark_attached() -> __vma_enter_locked())

-> refcount_add_not_zero() returns true

[ process gets fatal signal ]

-> rcuwait_wait_event() errors out
-> oopsies need to do something, maybe [VM_]WARN_ON() not right?

Correct me if the above is wrong.

I mean is any of this actually possible...?

Seems dubious. But I guess right now we assume it _is_ possible. What a mess!

(Again I wonder why we made our lives so difficult here)

Anyway even if we are midway through a detach, the detach is ostensibly waiting
for the readers to go away, and our reader is about to go away anyway, but the
process has a fatal signal so do we even care?

I actually wonder if a WARN_ON() is warranted to see if this even ever
happens...

OK just going to reattach... my head which just exploded from the above :P

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26 15:20       ` Lorenzo Stoakes
@ 2025-11-26 15:49         ` Suren Baghdasaryan
  2025-11-26 16:00           ` Lorenzo Stoakes
  2025-11-26 16:04         ` Vlastimil Babka
  1 sibling, 1 reply; 18+ messages in thread
From: Suren Baghdasaryan @ 2025-11-26 15:49 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Matthew Wilcox, Vlastimil Babka, Andrew Morton, linux-mm,
	syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett

On Wed, Nov 26, 2025 at 7:20 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote:
> > On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote:
> > > On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
> > > >> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
> > > >
> > > > Doh! This is embarassing...
> > >
> > > Hand-rolled synchronization primitives are wonderful, aren't they?
> >
> > That's why I liked the original approach of just using rwsems.  I
> > mst admit to having not paid attention to this recently so I don't
> > know what motivated the change.

That made it simpler to add SLAB_TYPESAFE_BY_RCU for vm_area_structs
which improved the performance of allocating these structs during
calls like mmap and fork.

> >
> > > > Wait, why do we consider this as a successful acquisition? The
> > > > vm_refcnt is 0, so this is similar situation to an earlier:
> > > >
> > > > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> > > >         return 0;
> > >
> > > But this means "vma is not attached" not "we failed to lock it".
> > >
> > > > IOW, the vma is not referenced, so we failed to lock it. I think the
> > > > fix should be:
> > > >
> > > >         if (err) {
> > > > +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> > > > +                       /* Oh cobblers.  While we got a fatal signal, we
> > > > +                        * raced with the last user.  VMA is not referenced,
> > > > +                        * fail to lock it.
> > > > +                        */
> > > > +                       err = 0;
> > >
> > > Returning 0 in this situation therefore wouldn't be correct.
> > >
> > > AFAIU since we started with attached vma above, it's not possible that
> > > the refcount_sub_and_test here will drop the refcnt to zero. We could
> > > just WARN_ON_ONCE() on the result (in a way to make also the
> > > __must_check happy) and then can return err below.
> >
> > But how do we know that we started with an attached VMA?  Maybe the VMA
> > was in the process of being detached and still has readers?
>
> So we're talking about:
>
> vma_mark_deteched()
> -> refcount_dec_and_test() [ ref count is zero ]
> -> __vma_enter_locked()
>
> (meanwhile...)
>
> -> reader attempts to read
>   -> optimistic check doesn't successfully find write locked VMA
>   -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments
>     (??? how)
>
> (back to vma_mark_attached() -> __vma_enter_locked())
>
> -> refcount_add_not_zero() returns true
>
> [ process gets fatal signal ]
>
> -> rcuwait_wait_event() errors out
> -> oopsies need to do something, maybe [VM_]WARN_ON() not right?
>
> Correct me if the above is wrong.
>
> I mean is any of this actually possible...?
>
> Seems dubious. But I guess right now we assume it _is_ possible. What a mess!
>
> (Again I wonder why we made our lives so difficult here)
>
> Anyway even if we are midway through a detach, the detach is ostensibly waiting
> for the readers to go away, and our reader is about to go away anyway, but the
> process has a fatal signal so do we even care?
>
> I actually wonder if a WARN_ON() is warranted to see if this even ever
> happens...
>
> OK just going to reattach... my head which just exploded from the above :P

We are overthinking this. Vlastimil is right. If we entered with an
attached VMA (which is the case due to this check:
https://elixir.bootlin.com/linux/v6.18-rc7/source/mm/mmap_lock.c#L60)
then the only function that can detach the VMA from under us is
vma_mark_detached() but that function can't race with
__vma_enter_locked() because both functions are required to hold
mmap_write_lock. vma_mark_detached() has an explicit
vma_assert_write_locked(vma) check (which implies mmap_write_lock) and
vma_start_write() calls __is_vma_write_locked() which does
mmap_assert_write_locked(vma->vm_mm). We can add a comment here or an
mmap_assert_write_locked(vma->vm_mm); at the beginning of the
__vma_enter_locked() to make that obvious.

> Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26 15:49         ` Suren Baghdasaryan
@ 2025-11-26 16:00           ` Lorenzo Stoakes
  2025-11-26 16:11             ` Suren Baghdasaryan
  0 siblings, 1 reply; 18+ messages in thread
From: Lorenzo Stoakes @ 2025-11-26 16:00 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Matthew Wilcox, Vlastimil Babka, Andrew Morton, linux-mm,
	syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett

On Wed, Nov 26, 2025 at 07:49:13AM -0800, Suren Baghdasaryan wrote:
> On Wed, Nov 26, 2025 at 7:20 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote:
> > > On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote:
> > > > On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
> > > > >> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
> > > > >
> > > > > Doh! This is embarassing...
> > > >
> > > > Hand-rolled synchronization primitives are wonderful, aren't they?
> > >
> > > That's why I liked the original approach of just using rwsems.  I
> > > mst admit to having not paid attention to this recently so I don't
> > > know what motivated the change.
>
> That made it simpler to add SLAB_TYPESAFE_BY_RCU for vm_area_structs
> which improved the performance of allocating these structs during
> calls like mmap and fork.

Hmm, we doubled down :)

Hope the wins were worth the complexity, but obviously the horse has bolted
and I should have asked more about it at the time...

>
> > >
> > > > > Wait, why do we consider this as a successful acquisition? The
> > > > > vm_refcnt is 0, so this is similar situation to an earlier:
> > > > >
> > > > > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> > > > >         return 0;
> > > >
> > > > But this means "vma is not attached" not "we failed to lock it".
> > > >
> > > > > IOW, the vma is not referenced, so we failed to lock it. I think the
> > > > > fix should be:
> > > > >
> > > > >         if (err) {
> > > > > +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> > > > > +                       /* Oh cobblers.  While we got a fatal signal, we
> > > > > +                        * raced with the last user.  VMA is not referenced,
> > > > > +                        * fail to lock it.
> > > > > +                        */
> > > > > +                       err = 0;
> > > >
> > > > Returning 0 in this situation therefore wouldn't be correct.
> > > >
> > > > AFAIU since we started with attached vma above, it's not possible that
> > > > the refcount_sub_and_test here will drop the refcnt to zero. We could
> > > > just WARN_ON_ONCE() on the result (in a way to make also the
> > > > __must_check happy) and then can return err below.
> > >
> > > But how do we know that we started with an attached VMA?  Maybe the VMA
> > > was in the process of being detached and still has readers?
> >
> > So we're talking about:
> >
> > vma_mark_deteched()
> > -> refcount_dec_and_test() [ ref count is zero ]
> > -> __vma_enter_locked()
> >
> > (meanwhile...)
> >
> > -> reader attempts to read
> >   -> optimistic check doesn't successfully find write locked VMA
> >   -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments
> >     (??? how)
> >
> > (back to vma_mark_attached() -> __vma_enter_locked())
> >
> > -> refcount_add_not_zero() returns true
> >
> > [ process gets fatal signal ]
> >
> > -> rcuwait_wait_event() errors out
> > -> oopsies need to do something, maybe [VM_]WARN_ON() not right?
> >
> > Correct me if the above is wrong.
> >
> > I mean is any of this actually possible...?
> >
> > Seems dubious. But I guess right now we assume it _is_ possible. What a mess!
> >
> > (Again I wonder why we made our lives so difficult here)
> >
> > Anyway even if we are midway through a detach, the detach is ostensibly waiting
> > for the readers to go away, and our reader is about to go away anyway, but the
> > process has a fatal signal so do we even care?
> >
> > I actually wonder if a WARN_ON() is warranted to see if this even ever
> > happens...
> >
> > OK just going to reattach... my head which just exploded from the above :P
>
> We are overthinking this. Vlastimil is right. If we entered with an

Watch Vlastimil frame this... ;)

> attached VMA (which is the case due to this check:
> https://elixir.bootlin.com/linux/v6.18-rc7/source/mm/mmap_lock.c#L60)
> then the only function that can detach the VMA from under us is
> vma_mark_detached() but that function can't race with

We hold the mmap/VMA write lock yes, but as per the comment in
vma_mark_detached() + above list readers can spuriously increment the
refcount (ostensibly) as per:

		/* Wait until vma is detached with no readers. */

So doesn't that mean we can hit the issue Matthew raised?

Otherwise there would be no need to do this dance in vma_mark_detached()
anyway?

Maybe I'm missing something.

> __vma_enter_locked() because both functions are required to hold
> mmap_write_lock. vma_mark_detached() has an explicit
> vma_assert_write_locked(vma) check (which implies mmap_write_lock) and
> vma_start_write() calls __is_vma_write_locked() which does
> mmap_assert_write_locked(vma->vm_mm). We can add a comment here or an
> mmap_assert_write_locked(vma->vm_mm); at the beginning of the
> __vma_enter_locked() to make that obvious.
>
> > Cheers, Lorenzo
>

Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26 16:00           ` Lorenzo Stoakes
@ 2025-11-26 16:11             ` Suren Baghdasaryan
  0 siblings, 0 replies; 18+ messages in thread
From: Suren Baghdasaryan @ 2025-11-26 16:11 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Matthew Wilcox, Vlastimil Babka, Andrew Morton, linux-mm,
	syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett

On Wed, Nov 26, 2025 at 8:01 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Wed, Nov 26, 2025 at 07:49:13AM -0800, Suren Baghdasaryan wrote:
> > On Wed, Nov 26, 2025 at 7:20 AM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote:
> > > > On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote:
> > > > > On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
> > > > > >> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
> > > > > >
> > > > > > Doh! This is embarassing...
> > > > >
> > > > > Hand-rolled synchronization primitives are wonderful, aren't they?
> > > >
> > > > That's why I liked the original approach of just using rwsems.  I
> > > > mst admit to having not paid attention to this recently so I don't
> > > > know what motivated the change.
> >
> > That made it simpler to add SLAB_TYPESAFE_BY_RCU for vm_area_structs
> > which improved the performance of allocating these structs during
> > calls like mmap and fork.
>
> Hmm, we doubled down :)
>
> Hope the wins were worth the complexity, but obviously the horse has bolted
> and I should have asked more about it at the time...
>
> >
> > > >
> > > > > > Wait, why do we consider this as a successful acquisition? The
> > > > > > vm_refcnt is 0, so this is similar situation to an earlier:
> > > > > >
> > > > > > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> > > > > >         return 0;
> > > > >
> > > > > But this means "vma is not attached" not "we failed to lock it".
> > > > >
> > > > > > IOW, the vma is not referenced, so we failed to lock it. I think the
> > > > > > fix should be:
> > > > > >
> > > > > >         if (err) {
> > > > > > +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> > > > > > +                       /* Oh cobblers.  While we got a fatal signal, we
> > > > > > +                        * raced with the last user.  VMA is not referenced,
> > > > > > +                        * fail to lock it.
> > > > > > +                        */
> > > > > > +                       err = 0;
> > > > >
> > > > > Returning 0 in this situation therefore wouldn't be correct.
> > > > >
> > > > > AFAIU since we started with attached vma above, it's not possible that
> > > > > the refcount_sub_and_test here will drop the refcnt to zero. We could
> > > > > just WARN_ON_ONCE() on the result (in a way to make also the
> > > > > __must_check happy) and then can return err below.
> > > >
> > > > But how do we know that we started with an attached VMA?  Maybe the VMA
> > > > was in the process of being detached and still has readers?
> > >
> > > So we're talking about:
> > >
> > > vma_mark_deteched()
> > > -> refcount_dec_and_test() [ ref count is zero ]
> > > -> __vma_enter_locked()
> > >
> > > (meanwhile...)
> > >
> > > -> reader attempts to read
> > >   -> optimistic check doesn't successfully find write locked VMA
> > >   -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments
> > >     (??? how)
> > >
> > > (back to vma_mark_attached() -> __vma_enter_locked())
> > >
> > > -> refcount_add_not_zero() returns true
> > >
> > > [ process gets fatal signal ]
> > >
> > > -> rcuwait_wait_event() errors out
> > > -> oopsies need to do something, maybe [VM_]WARN_ON() not right?
> > >
> > > Correct me if the above is wrong.
> > >
> > > I mean is any of this actually possible...?
> > >
> > > Seems dubious. But I guess right now we assume it _is_ possible. What a mess!
> > >
> > > (Again I wonder why we made our lives so difficult here)
> > >
> > > Anyway even if we are midway through a detach, the detach is ostensibly waiting
> > > for the readers to go away, and our reader is about to go away anyway, but the
> > > process has a fatal signal so do we even care?
> > >
> > > I actually wonder if a WARN_ON() is warranted to see if this even ever
> > > happens...
> > >
> > > OK just going to reattach... my head which just exploded from the above :P
> >
> > We are overthinking this. Vlastimil is right. If we entered with an
>
> Watch Vlastimil frame this... ;)
>
> > attached VMA (which is the case due to this check:
> > https://elixir.bootlin.com/linux/v6.18-rc7/source/mm/mmap_lock.c#L60)
> > then the only function that can detach the VMA from under us is
> > vma_mark_detached() but that function can't race with
>
> We hold the mmap/VMA write lock yes, but as per the comment in
> vma_mark_detached() + above list readers can spuriously increment the
> refcount (ostensibly) as per:
>
>                 /* Wait until vma is detached with no readers. */
>
> So doesn't that mean we can hit the issue Matthew raised?
>
> Otherwise there would be no need to do this dance in vma_mark_detached()
> anyway?
>
> Maybe I'm missing something.

No, you are right. I realized after sending the reply this temporary
refcount from the reader can be dropped from under the writer. Was in
the process of analyzing these paths when Vlastimil's reply came in.
I'll spend some more time stairing at it to see if we are missing
anything else.

>
> > __vma_enter_locked() because both functions are required to hold
> > mmap_write_lock. vma_mark_detached() has an explicit
> > vma_assert_write_locked(vma) check (which implies mmap_write_lock) and
> > vma_start_write() calls __is_vma_write_locked() which does
> > mmap_assert_write_locked(vma->vm_mm). We can add a comment here or an
> > mmap_assert_write_locked(vma->vm_mm); at the beginning of the
> > __vma_enter_locked() to make that obvious.
> >
> > > Cheers, Lorenzo
> >
>
> Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26 15:20       ` Lorenzo Stoakes
  2025-11-26 15:49         ` Suren Baghdasaryan
@ 2025-11-26 16:04         ` Vlastimil Babka
  2025-11-26 16:06           ` Matthew Wilcox
  2025-11-26 16:18           ` Lorenzo Stoakes
  1 sibling, 2 replies; 18+ messages in thread
From: Vlastimil Babka @ 2025-11-26 16:04 UTC (permalink / raw)
  To: Lorenzo Stoakes, Matthew Wilcox
  Cc: Suren Baghdasaryan, Andrew Morton, linux-mm,
	syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett



On 11/26/25 4:20 PM, Lorenzo Stoakes wrote:
> On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote:
>> On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote:
>>> On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
>>>>> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
>>>>
>>>> Doh! This is embarassing...
>>>
>>> Hand-rolled synchronization primitives are wonderful, aren't they?
>>
>> That's why I liked the original approach of just using rwsems.  I
>> mst admit to having not paid attention to this recently so I don't
>> know what motivated the change.
>>
>>>> Wait, why do we consider this as a successful acquisition? The
>>>> vm_refcnt is 0, so this is similar situation to an earlier:
>>>>
>>>> if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
>>>>         return 0;
>>>
>>> But this means "vma is not attached" not "we failed to lock it".
>>>
>>>> IOW, the vma is not referenced, so we failed to lock it. I think the
>>>> fix should be:
>>>>
>>>>         if (err) {
>>>> +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
>>>> +                       /* Oh cobblers.  While we got a fatal signal, we
>>>> +                        * raced with the last user.  VMA is not referenced,
>>>> +                        * fail to lock it.
>>>> +                        */
>>>> +                       err = 0;
>>>
>>> Returning 0 in this situation therefore wouldn't be correct.
>>>
>>> AFAIU since we started with attached vma above, it's not possible that
>>> the refcount_sub_and_test here will drop the refcnt to zero. We could
>>> just WARN_ON_ONCE() on the result (in a way to make also the
>>> __must_check happy) and then can return err below.
>>
>> But how do we know that we started with an attached VMA?  Maybe the VMA
>> was in the process of being detached and still has readers?
> 
> So we're talking about:
> 
> vma_mark_deteched()
> -> refcount_dec_and_test() [ ref count is zero ]
> -> __vma_enter_locked()

I think it's refcount is NOT zero to continue with __vma_enter_locked().

> (meanwhile...)
> 
> -> reader attempts to read
>   -> optimistic check doesn't successfully find write locked VMA
>   -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments
>     (??? how)

That shouldn't be possible, yeah. But per above, it's actually not zero.

> (back to vma_mark_attached() -> __vma_enter_locked())

Back to _attached()? but it's _detached() above? You mean _detached()
right? Just to be sure

> -> refcount_add_not_zero() returns true

Ack.

> [ process gets fatal signal ]
> 
> -> rcuwait_wait_event() errors out
> -> oopsies need to do something, maybe [VM_]WARN_ON() not right?

AFAICS from vma_mark_detached() we use the TASK_UNINTERRUPTIBLE variant
so this path can't error due to the fatal signal.

> Correct me if the above is wrong.
> 
> I mean is any of this actually possible...?
> 
> Seems dubious. But I guess right now we assume it _is_ possible. What a mess!
> 
> (Again I wonder why we made our lives so difficult here)
> 
> Anyway even if we are midway through a detach, the detach is ostensibly waiting
> for the readers to go away, and our reader is about to go away anyway, but the
> process has a fatal signal so do we even care?

Yeah I guess it's for the best to keep vma_mark_detached() use the
TASK_UNINTERRUPTIBLE variant, maybe document why. Aborting the detaching
would be counter productive.

> I actually wonder if a WARN_ON() is warranted to see if this even ever
> happens...

Not for this path, but for vma_start_write_killable -> __vma_start_write
-> __vma_enter_locked(... TASK_KILLABLE). I think it still can't
trigger, but since we need to check result of the
refcount_sub_and_test() anyway, we might as well WARN_ON it.

> OK just going to reattach... my head which just exploded from the above :P
> 
> Cheers, Lorenzo



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26 16:04         ` Vlastimil Babka
@ 2025-11-26 16:06           ` Matthew Wilcox
  2025-11-26 16:18           ` Lorenzo Stoakes
  1 sibling, 0 replies; 18+ messages in thread
From: Matthew Wilcox @ 2025-11-26 16:06 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Lorenzo Stoakes, Suren Baghdasaryan, Andrew Morton, linux-mm,
	syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett

On Wed, Nov 26, 2025 at 05:04:09PM +0100, Vlastimil Babka wrote:
> > [ process gets fatal signal ]
> > 
> > -> rcuwait_wait_event() errors out
> > -> oopsies need to do something, maybe [VM_]WARN_ON() not right?
> 
> AFAICS from vma_mark_detached() we use the TASK_UNINTERRUPTIBLE variant
> so this path can't error due to the fatal signal.

Well, today.  That's a nasty little landmine to leave for someone else
to stumble over in the future.



^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26 16:04         ` Vlastimil Babka
  2025-11-26 16:06           ` Matthew Wilcox
@ 2025-11-26 16:18           ` Lorenzo Stoakes
  2025-11-26 18:06             ` Suren Baghdasaryan
  1 sibling, 1 reply; 18+ messages in thread
From: Lorenzo Stoakes @ 2025-11-26 16:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Matthew Wilcox, Suren Baghdasaryan, Andrew Morton, linux-mm,
	syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett

On Wed, Nov 26, 2025 at 05:04:09PM +0100, Vlastimil Babka wrote:
>
>
> On 11/26/25 4:20 PM, Lorenzo Stoakes wrote:
> > On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote:
> >> On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote:
> >>> On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
> >>>>> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
> >>>>
> >>>> Doh! This is embarassing...
> >>>
> >>> Hand-rolled synchronization primitives are wonderful, aren't they?
> >>
> >> That's why I liked the original approach of just using rwsems.  I
> >> mst admit to having not paid attention to this recently so I don't
> >> know what motivated the change.
> >>
> >>>> Wait, why do we consider this as a successful acquisition? The
> >>>> vm_refcnt is 0, so this is similar situation to an earlier:
> >>>>
> >>>> if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> >>>>         return 0;
> >>>
> >>> But this means "vma is not attached" not "we failed to lock it".
> >>>
> >>>> IOW, the vma is not referenced, so we failed to lock it. I think the
> >>>> fix should be:
> >>>>
> >>>>         if (err) {
> >>>> +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> >>>> +                       /* Oh cobblers.  While we got a fatal signal, we
> >>>> +                        * raced with the last user.  VMA is not referenced,
> >>>> +                        * fail to lock it.
> >>>> +                        */
> >>>> +                       err = 0;
> >>>
> >>> Returning 0 in this situation therefore wouldn't be correct.
> >>>
> >>> AFAIU since we started with attached vma above, it's not possible that
> >>> the refcount_sub_and_test here will drop the refcnt to zero. We could
> >>> just WARN_ON_ONCE() on the result (in a way to make also the
> >>> __must_check happy) and then can return err below.
> >>
> >> But how do we know that we started with an attached VMA?  Maybe the VMA
> >> was in the process of being detached and still has readers?
> >
> > So we're talking about:
> >
> > vma_mark_deteched()
> > -> refcount_dec_and_test() [ ref count is zero ]
> > -> __vma_enter_locked()
>
> I think it's refcount is NOT zero to continue with __vma_enter_locked().

Yup sorry, misread the ! clearly...

Which makes a lot more sense when it comes to picking up spurious reader
refcount increases :)

>
> > (meanwhile...)
> >
> > -> reader attempts to read
> >   -> optimistic check doesn't successfully find write locked VMA
> >   -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments
> >     (??? how)
>
> That shouldn't be possible, yeah. But per above, it's actually not zero.

Yup so this just makes it more likely to happen...

>
> > (back to vma_mark_attached() -> __vma_enter_locked())
>
> Back to _attached()? but it's _detached() above? You mean _detached()
> right? Just to be sure

Yup, I typed this all a bit too quick...

>
> > -> refcount_add_not_zero() returns true
>
> Ack.
>
> > [ process gets fatal signal ]
> >
> > -> rcuwait_wait_event() errors out
> > -> oopsies need to do something, maybe [VM_]WARN_ON() not right?
>
> AFAICS from vma_mark_detached() we use the TASK_UNINTERRUPTIBLE variant
> so this path can't error due to the fatal signal.

Right good point.

I hate that we make this so 'gosh darned' implicit.

We are now assuming that:

1. the only way that RCU wait can fail is due to pending fatal signal
2. and that we're fine here because it's uninterruptible.

I mean very doubtful we'll ever change that but it's still gross.

And as Willy says we're paving the road with good intent^Wlandmines.

>
> > Correct me if the above is wrong.

Yeah I was wrong thankfully :)

The TASK_UNINTERRUPTIBLE saves us, but it's all still a bit ugh.

> >
> > I mean is any of this actually possible...?
> >
> > Seems dubious. But I guess right now we assume it _is_ possible. What a mess!
> >
> > (Again I wonder why we made our lives so difficult here)
> >
> > Anyway even if we are midway through a detach, the detach is ostensibly waiting
> > for the readers to go away, and our reader is about to go away anyway, but the
> > process has a fatal signal so do we even care?
>
> Yeah I guess it's for the best to keep vma_mark_detached() use the
> TASK_UNINTERRUPTIBLE variant, maybe document why. Aborting the detaching
> would be counter productive.
>
> > I actually wonder if a WARN_ON() is warranted to see if this even ever
> > happens...
>
> Not for this path, but for vma_start_write_killable -> __vma_start_write
> -> __vma_enter_locked(... TASK_KILLABLE). I think it still can't

Well if it's impossible for TASK_UNINTERRUPTIBLE no harm in adding it right? Can
add a comment.

> trigger, but since we need to check result of the
> refcount_sub_and_test() anyway, we might as well WARN_ON it.

Probably it can't no.

>
> > OK just going to reattach... my head which just exploded from the above :P
> >
> > Cheers, Lorenzo
>
>

Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26 16:18           ` Lorenzo Stoakes
@ 2025-11-26 18:06             ` Suren Baghdasaryan
  2025-11-26 18:11               ` Lorenzo Stoakes
  0 siblings, 1 reply; 18+ messages in thread
From: Suren Baghdasaryan @ 2025-11-26 18:06 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Vlastimil Babka, Matthew Wilcox, Andrew Morton, linux-mm,
	syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett

On Wed, Nov 26, 2025 at 8:18 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Wed, Nov 26, 2025 at 05:04:09PM +0100, Vlastimil Babka wrote:
> >
> >
> > On 11/26/25 4:20 PM, Lorenzo Stoakes wrote:
> > > On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote:
> > >> On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote:
> > >>> On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
> > >>>>> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
> > >>>>
> > >>>> Doh! This is embarassing...
> > >>>
> > >>> Hand-rolled synchronization primitives are wonderful, aren't they?
> > >>
> > >> That's why I liked the original approach of just using rwsems.  I
> > >> mst admit to having not paid attention to this recently so I don't
> > >> know what motivated the change.
> > >>
> > >>>> Wait, why do we consider this as a successful acquisition? The
> > >>>> vm_refcnt is 0, so this is similar situation to an earlier:
> > >>>>
> > >>>> if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> > >>>>         return 0;
> > >>>
> > >>> But this means "vma is not attached" not "we failed to lock it".
> > >>>
> > >>>> IOW, the vma is not referenced, so we failed to lock it. I think the
> > >>>> fix should be:
> > >>>>
> > >>>>         if (err) {
> > >>>> +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> > >>>> +                       /* Oh cobblers.  While we got a fatal signal, we
> > >>>> +                        * raced with the last user.  VMA is not referenced,
> > >>>> +                        * fail to lock it.
> > >>>> +                        */
> > >>>> +                       err = 0;
> > >>>
> > >>> Returning 0 in this situation therefore wouldn't be correct.
> > >>>
> > >>> AFAIU since we started with attached vma above, it's not possible that
> > >>> the refcount_sub_and_test here will drop the refcnt to zero. We could
> > >>> just WARN_ON_ONCE() on the result (in a way to make also the
> > >>> __must_check happy) and then can return err below.
> > >>
> > >> But how do we know that we started with an attached VMA?  Maybe the VMA
> > >> was in the process of being detached and still has readers?
> > >
> > > So we're talking about:
> > >
> > > vma_mark_deteched()
> > > -> refcount_dec_and_test() [ ref count is zero ]
> > > -> __vma_enter_locked()
> >
> > I think it's refcount is NOT zero to continue with __vma_enter_locked().
>
> Yup sorry, misread the ! clearly...
>
> Which makes a lot more sense when it comes to picking up spurious reader
> refcount increases :)
>
> >
> > > (meanwhile...)
> > >
> > > -> reader attempts to read
> > >   -> optimistic check doesn't successfully find write locked VMA
> > >   -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments
> > >     (??? how)
> >
> > That shouldn't be possible, yeah. But per above, it's actually not zero.
>
> Yup so this just makes it more likely to happen...
>
> >
> > > (back to vma_mark_attached() -> __vma_enter_locked())
> >
> > Back to _attached()? but it's _detached() above? You mean _detached()
> > right? Just to be sure
>
> Yup, I typed this all a bit too quick...
>
> >
> > > -> refcount_add_not_zero() returns true
> >
> > Ack.
> >
> > > [ process gets fatal signal ]
> > >
> > > -> rcuwait_wait_event() errors out
> > > -> oopsies need to do something, maybe [VM_]WARN_ON() not right?
> >
> > AFAICS from vma_mark_detached() we use the TASK_UNINTERRUPTIBLE variant
> > so this path can't error due to the fatal signal.
>
> Right good point.
>
> I hate that we make this so 'gosh darned' implicit.
>
> We are now assuming that:
>
> 1. the only way that RCU wait can fail is due to pending fatal signal
> 2. and that we're fine here because it's uninterruptible.
>
> I mean very doubtful we'll ever change that but it's still gross.
>
> And as Willy says we're paving the road with good intent^Wlandmines.
>
> >
> > > Correct me if the above is wrong.
>
> Yeah I was wrong thankfully :)
>
> The TASK_UNINTERRUPTIBLE saves us, but it's all still a bit ugh.

I went through different scenaros and I think the race Lorenzo
described would look something like this:

READER             WRITER
//recnt=1 (attached, no readers, not write-locked)
vma_start_read()
//vma->vm_lock_seq != mm->mm_lock_seq
                           vma_start_write()
                             __vma_enter_locked(TASK_INTERRUPTIBLE)
                               refcount_add_not_zero(VMA_LOCK_OFFSET)
//refcnt=1+VMA_LOCK_OFFSET
                             WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
//vma->vm_lock_seq == mm->mm_lock_seq
                             __vma_exit_locked()
                                 refcount_sub_and_test(VMA_LOCK_OFFSET)
//refcnt=1
  __refcount_inc_not_zero_limited_acquire()
//refcnt = 2
                           vma_mark_detached()
                             if (!refcount_dec_and_test())
//refcnt=1
                               __vma_enter_locked(TASK_UNINTERRUPTIBLE)
                                 if (refcount_add_not_zero(VMA_LOCK_OFFSET))
//refcnt=1+VMA_LOCK_OFFSET
                                   rcuwait_wait_event(TASK_UNINTERRUPTIBLE)
  if (vma->vm_lock_seq == mm->mm_lock_seq)
    vma_refcount_put(vma);
      __refcount_dec_and_test()
//refcnt=VMA_LOCK_OFFSET
      rcuwait_wake_up()
                           __vma_exit_locked()
                             refcount_sub_and_test(VMA_LOCK_OFFSET)
//refcnt=0 (detached)

This seems to be fine with vma_mark_detached() using
TASK_UNINTERRUPTIBLE. If we decide to change vma_mark_detached() to
use TASK_INTERRUPTIBLE I think we need to handle the possible error
from __vma_enter_locked() inside vma_mark_detached() and allow for the
fact that refcnt can drop to 0 after the wait.


>
> > >
> > > I mean is any of this actually possible...?
> > >
> > > Seems dubious. But I guess right now we assume it _is_ possible. What a mess!
> > >
> > > (Again I wonder why we made our lives so difficult here)
> > >
> > > Anyway even if we are midway through a detach, the detach is ostensibly waiting
> > > for the readers to go away, and our reader is about to go away anyway, but the
> > > process has a fatal signal so do we even care?
> >
> > Yeah I guess it's for the best to keep vma_mark_detached() use the
> > TASK_UNINTERRUPTIBLE variant, maybe document why. Aborting the detaching
> > would be counter productive.
> >
> > > I actually wonder if a WARN_ON() is warranted to see if this even ever
> > > happens...
> >
> > Not for this path, but for vma_start_write_killable -> __vma_start_write
> > -> __vma_enter_locked(... TASK_KILLABLE). I think it still can't
>
> Well if it's impossible for TASK_UNINTERRUPTIBLE no harm in adding it right? Can
> add a comment.
>
> > trigger, but since we need to check result of the
> > refcount_sub_and_test() anyway, we might as well WARN_ON it.
>
> Probably it can't no.
>
> >
> > > OK just going to reattach... my head which just exploded from the above :P
> > >
> > > Cheers, Lorenzo
> >
> >
>
> Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26 18:06             ` Suren Baghdasaryan
@ 2025-11-26 18:11               ` Lorenzo Stoakes
  0 siblings, 0 replies; 18+ messages in thread
From: Lorenzo Stoakes @ 2025-11-26 18:11 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka, Matthew Wilcox, Andrew Morton, linux-mm,
	syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett

On Wed, Nov 26, 2025 at 10:06:28AM -0800, Suren Baghdasaryan wrote:
> On Wed, Nov 26, 2025 at 8:18 AM Lorenzo Stoakes
> >
> > The TASK_UNINTERRUPTIBLE saves us, but it's all still a bit ugh.
>
> I went through different scenaros and I think the race Lorenzo
> described would look something like this:
>
> READER             WRITER
> //recnt=1 (attached, no readers, not write-locked)
> vma_start_read()
> //vma->vm_lock_seq != mm->mm_lock_seq
>                            vma_start_write()
>                              __vma_enter_locked(TASK_INTERRUPTIBLE)
>                                refcount_add_not_zero(VMA_LOCK_OFFSET)
> //refcnt=1+VMA_LOCK_OFFSET
>                              WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
> //vma->vm_lock_seq == mm->mm_lock_seq
>                              __vma_exit_locked()
>                                  refcount_sub_and_test(VMA_LOCK_OFFSET)
> //refcnt=1
>   __refcount_inc_not_zero_limited_acquire()
> //refcnt = 2
>                            vma_mark_detached()
>                              if (!refcount_dec_and_test())
> //refcnt=1
>                                __vma_enter_locked(TASK_UNINTERRUPTIBLE)
>                                  if (refcount_add_not_zero(VMA_LOCK_OFFSET))
> //refcnt=1+VMA_LOCK_OFFSET
>                                    rcuwait_wait_event(TASK_UNINTERRUPTIBLE)
>   if (vma->vm_lock_seq == mm->mm_lock_seq)
>     vma_refcount_put(vma);
>       __refcount_dec_and_test()
> //refcnt=VMA_LOCK_OFFSET
>       rcuwait_wake_up()
>                            __vma_exit_locked()
>                              refcount_sub_and_test(VMA_LOCK_OFFSET)
> //refcnt=0 (detached)
>
> This seems to be fine with vma_mark_detached() using
> TASK_UNINTERRUPTIBLE. If we decide to change vma_mark_detached() to
> use TASK_INTERRUPTIBLE I think we need to handle the possible error
> from __vma_enter_locked() inside vma_mark_detached() and allow for the
> fact that refcnt can drop to 0 after the wait.

Thanks for going deeper into this :)

Vlasta also pointed out the TASK_UNINTERRUPTIBLE saves us, but definitely
something to think about in the future...

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] mm: fix vma_start_write_killable() signal handling
  2025-11-26 15:05     ` Matthew Wilcox
  2025-11-26 15:20       ` Lorenzo Stoakes
@ 2025-11-26 15:53       ` Vlastimil Babka
  1 sibling, 0 replies; 18+ messages in thread
From: Vlastimil Babka @ 2025-11-26 15:53 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Suren Baghdasaryan, Andrew Morton, linux-mm,
	syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett, Lorenzo Stoakes

On 11/26/25 4:05 PM, Matthew Wilcox wrote:
> On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote:
>> On 11/26/25 5:28 AM, Suren Baghdasaryan wrote:
>>>> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
>>>
>>> Doh! This is embarassing...
>>
>> Hand-rolled synchronization primitives are wonderful, aren't they?
> 
> That's why I liked the original approach of just using rwsems.  I
> mst admit to having not paid attention to this recently so I don't
> know what motivated the change.
> 
>>> Wait, why do we consider this as a successful acquisition? The
>>> vm_refcnt is 0, so this is similar situation to an earlier:
>>>
>>> if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
>>>         return 0;
>>
>> But this means "vma is not attached" not "we failed to lock it".
>>
>>> IOW, the vma is not referenced, so we failed to lock it. I think the
>>> fix should be:
>>>
>>>         if (err) {
>>> +               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
>>> +                       /* Oh cobblers.  While we got a fatal signal, we
>>> +                        * raced with the last user.  VMA is not referenced,
>>> +                        * fail to lock it.
>>> +                        */
>>> +                       err = 0;
>>
>> Returning 0 in this situation therefore wouldn't be correct.
>>
>> AFAIU since we started with attached vma above, it's not possible that
>> the refcount_sub_and_test here will drop the refcnt to zero. We could
>> just WARN_ON_ONCE() on the result (in a way to make also the
>> __must_check happy) and then can return err below.
> 
> But how do we know that we started with an attached VMA?  Maybe the VMA
> was in the process of being detached and still has readers?

Yeah I didn't write it properly. What I mean is we passed the
"if (!refcount_add_not_zero()" check above, which AFAIU is a "is the vma
attached?" test. If it's not attached, then also no reader could have
incremented the refcount, and we can be the only writer thanks to mmap sem.

And only a writer can also detach it, so I believe we can't see the
refcount being decremented to zero by our refcount_sub_and_test().

But I might be wrong. I'll go through Lorenzo's scenario...


^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2025-11-26 18:12 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-11-26  3:42 [PATCH] mm: fix vma_start_write_killable() signal handling Matthew Wilcox (Oracle)
2025-11-26  4:28 ` Suren Baghdasaryan
2025-11-26 14:26   ` Suren Baghdasaryan
2025-11-26 14:40     ` Vlastimil Babka
2025-11-26 15:01     ` Matthew Wilcox
2025-11-26 14:36   ` Vlastimil Babka
2025-11-26 15:02     ` Lorenzo Stoakes
2025-11-26 15:05     ` Matthew Wilcox
2025-11-26 15:20       ` Lorenzo Stoakes
2025-11-26 15:49         ` Suren Baghdasaryan
2025-11-26 16:00           ` Lorenzo Stoakes
2025-11-26 16:11             ` Suren Baghdasaryan
2025-11-26 16:04         ` Vlastimil Babka
2025-11-26 16:06           ` Matthew Wilcox
2025-11-26 16:18           ` Lorenzo Stoakes
2025-11-26 18:06             ` Suren Baghdasaryan
2025-11-26 18:11               ` Lorenzo Stoakes
2025-11-26 15:53       ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox