* [PATCH] mm: fix vma_start_write_killable() signal handling
@ 2025-11-26 3:42 Matthew Wilcox (Oracle)
2025-11-26 4:28 ` Suren Baghdasaryan
0 siblings, 1 reply; 18+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-11-26 3:42 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: Matthew Wilcox (Oracle),
syzbot+5b19bad23ac7f44bf8b8, Suren Baghdasaryan, Liam R. Howlett,
Vlastimil Babka, Lorenzo Stoakes
If we get a signal, we need to restore the vm_refcnt. The wrinkle in
that is that we might be the last reference. If that happens, fix the
refcount to look like we weren't interrupted by a fatal signal.
Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com
Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
Andrew, since the vma_start_write_killable() patch is in mm-stable,
I don't think you can put this in as a fixup, right?
Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug.
Any other stupid thing I've done? And am I doing the right thing
with refcount_set()?
mm/mmap_lock.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index e6e5570d1ec7..71af7f0a5fe1 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -74,9 +74,18 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
refcount_read(&vma->vm_refcnt) == tgt_refcnt,
state);
if (err) {
+ if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
+ /* Oh cobblers. While we got a fatal signal, we
+ * raced with the last user. Pretend we didn't notice
+ * the signal
+ */
+ refcount_set(&vma->vm_refcnt, VMA_LOCK_OFFSET);
+ goto acquired;
+ }
rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
return err;
}
+acquired:
lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
return 1;
--
2.47.2
^ permalink raw reply [flat|nested] 18+ messages in thread* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 3:42 [PATCH] mm: fix vma_start_write_killable() signal handling Matthew Wilcox (Oracle) @ 2025-11-26 4:28 ` Suren Baghdasaryan 2025-11-26 14:26 ` Suren Baghdasaryan 2025-11-26 14:36 ` Vlastimil Babka 0 siblings, 2 replies; 18+ messages in thread From: Suren Baghdasaryan @ 2025-11-26 4:28 UTC (permalink / raw) To: Matthew Wilcox (Oracle) Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes On Tue, Nov 25, 2025 at 7:44 PM Matthew Wilcox (Oracle) <willy@infradead.org> wrote: > > If we get a signal, we need to restore the vm_refcnt. The wrinkle in > that is that we might be the last reference. If that happens, fix the > refcount to look like we weren't interrupted by a fatal signal. > > Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com > Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()") > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > --- > Andrew, since the vma_start_write_killable() patch is in mm-stable, > I don't think you can put this in as a fixup, right? > > Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug. Doh! This is embarassing... > Any other stupid thing I've done? And am I doing the right thing > with refcount_set()? > > mm/mmap_lock.c | 9 +++++++++ > 1 file changed, 9 insertions(+) > > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c > index e6e5570d1ec7..71af7f0a5fe1 100644 > --- a/mm/mmap_lock.c > +++ b/mm/mmap_lock.c > @@ -74,9 +74,18 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma, > refcount_read(&vma->vm_refcnt) == tgt_refcnt, > state); > if (err) { > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > + /* Oh cobblers. While we got a fatal signal, we > + * raced with the last user. Pretend we didn't notice > + * the signal > + */ > + refcount_set(&vma->vm_refcnt, VMA_LOCK_OFFSET); > + goto acquired; Wait, why do we consider this as a successful acquisition? The vm_refcnt is 0, so this is similar situation to an earlier: if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) return 0; IOW, the vma is not referenced, so we failed to lock it. I think the fix should be: if (err) { + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { + /* Oh cobblers. While we got a fatal signal, we + * raced with the last user. VMA is not referenced, + * fail to lock it. + */ + err = 0; + } rwsem_release(&vma->vmlock_dep_map, _RET_IP_); return err; } > + } > rwsem_release(&vma->vmlock_dep_map, _RET_IP_); > return err; > } > +acquired: > lock_acquired(&vma->vmlock_dep_map, _RET_IP_); > > return 1; > -- > 2.47.2 > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 4:28 ` Suren Baghdasaryan @ 2025-11-26 14:26 ` Suren Baghdasaryan 2025-11-26 14:40 ` Vlastimil Babka 2025-11-26 15:01 ` Matthew Wilcox 2025-11-26 14:36 ` Vlastimil Babka 1 sibling, 2 replies; 18+ messages in thread From: Suren Baghdasaryan @ 2025-11-26 14:26 UTC (permalink / raw) To: Matthew Wilcox (Oracle) Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes On Tue, Nov 25, 2025 at 8:28 PM Suren Baghdasaryan <surenb@google.com> wrote: > > On Tue, Nov 25, 2025 at 7:44 PM Matthew Wilcox (Oracle) > <willy@infradead.org> wrote: > > > > If we get a signal, we need to restore the vm_refcnt. The wrinkle in > > that is that we might be the last reference. If that happens, fix the > > refcount to look like we weren't interrupted by a fatal signal. > > > > Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com > > Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()") > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > > Cc: Suren Baghdasaryan <surenb@google.com> > > Cc: Liam R. Howlett <Liam.Howlett@oracle.com> > > Cc: Vlastimil Babka <vbabka@suse.cz> > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > --- > > Andrew, since the vma_start_write_killable() patch is in mm-stable, > > I don't think you can put this in as a fixup, right? > > > > Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug. > > Doh! This is embarassing... > > > Any other stupid thing I've done? And am I doing the right thing > > with refcount_set()? > > > > mm/mmap_lock.c | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c > > index e6e5570d1ec7..71af7f0a5fe1 100644 > > --- a/mm/mmap_lock.c > > +++ b/mm/mmap_lock.c > > @@ -74,9 +74,18 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma, > > refcount_read(&vma->vm_refcnt) == tgt_refcnt, > > state); > > if (err) { > > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > > + /* Oh cobblers. While we got a fatal signal, we > > + * raced with the last user. Pretend we didn't notice > > + * the signal > > + */ > > + refcount_set(&vma->vm_refcnt, VMA_LOCK_OFFSET); > > + goto acquired; > > Wait, why do we consider this as a successful acquisition? The > vm_refcnt is 0, so this is similar situation to an earlier: > > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) > return 0; > > IOW, the vma is not referenced, so we failed to lock it. I think the > fix should be: After sleeping on it, I don't think we should be masking EINTR error. __vma_enter_locked() result might be the only place where an outer loop is checking for fatal signals, so returning "failure to lock" instead of -EINTR might cause the loop to continue. I think this fix would be better: * If vma is detached then only vma_mark_attached() can raise the * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached(). */ - if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) + if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { + if (fatal_signal_pending(current)) + return -EINTR; return 0; + } rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_); err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait, refcount_read(&vma->vm_refcnt) == tgt_refcnt, state); if (err) { + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { + /* + * No more users but fatal signal is present, + * still return the error. + */ + } rwsem_release(&vma->vmlock_dep_map, _RET_IP_); return err; } > > if (err) { > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > + /* Oh cobblers. While we got a fatal signal, we > + * raced with the last user. VMA is not referenced, > + * fail to lock it. > + */ > + err = 0; > + } > rwsem_release(&vma->vmlock_dep_map, _RET_IP_); > return err; > } > > > > + } > > rwsem_release(&vma->vmlock_dep_map, _RET_IP_); > > return err; > > } > > +acquired: > > lock_acquired(&vma->vmlock_dep_map, _RET_IP_); > > > > return 1; > > -- > > 2.47.2 > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 14:26 ` Suren Baghdasaryan @ 2025-11-26 14:40 ` Vlastimil Babka 2025-11-26 15:01 ` Matthew Wilcox 1 sibling, 0 replies; 18+ messages in thread From: Vlastimil Babka @ 2025-11-26 14:40 UTC (permalink / raw) To: Suren Baghdasaryan, Matthew Wilcox (Oracle) Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett, Lorenzo Stoakes On 11/26/25 3:26 PM, Suren Baghdasaryan wrote: > On Tue, Nov 25, 2025 at 8:28 PM Suren Baghdasaryan <surenb@google.com> wrote: >> >> On Tue, Nov 25, 2025 at 7:44 PM Matthew Wilcox (Oracle) >> <willy@infradead.org> wrote: >>> >>> If we get a signal, we need to restore the vm_refcnt. The wrinkle in >>> that is that we might be the last reference. If that happens, fix the >>> refcount to look like we weren't interrupted by a fatal signal. >>> >>> Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com >>> Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()") >>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> >>> Cc: Suren Baghdasaryan <surenb@google.com> >>> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> >>> Cc: Vlastimil Babka <vbabka@suse.cz> >>> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> >>> --- >>> Andrew, since the vma_start_write_killable() patch is in mm-stable, >>> I don't think you can put this in as a fixup, right? >>> >>> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug. >> >> Doh! This is embarassing... >> >>> Any other stupid thing I've done? And am I doing the right thing >>> with refcount_set()? >>> >>> mm/mmap_lock.c | 9 +++++++++ >>> 1 file changed, 9 insertions(+) >>> >>> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c >>> index e6e5570d1ec7..71af7f0a5fe1 100644 >>> --- a/mm/mmap_lock.c >>> +++ b/mm/mmap_lock.c >>> @@ -74,9 +74,18 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma, >>> refcount_read(&vma->vm_refcnt) == tgt_refcnt, >>> state); >>> if (err) { >>> + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { >>> + /* Oh cobblers. While we got a fatal signal, we >>> + * raced with the last user. Pretend we didn't notice >>> + * the signal >>> + */ >>> + refcount_set(&vma->vm_refcnt, VMA_LOCK_OFFSET); >>> + goto acquired; >> >> Wait, why do we consider this as a successful acquisition? The >> vm_refcnt is 0, so this is similar situation to an earlier: >> >> if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) >> return 0; >> >> IOW, the vma is not referenced, so we failed to lock it. I think the >> fix should be: > > After sleeping on it, I don't think we should be masking EINTR error. > __vma_enter_locked() result might be the only place where an outer > loop is checking for fatal signals, so returning "failure to lock" > instead of -EINTR might cause the loop to continue. I think this fix > would be better: > > * If vma is detached then only vma_mark_attached() can raise the > * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached(). > */ > - if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) > + if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > + if (fatal_signal_pending(current)) > + return -EINTR; Not necessary IMHO. It could become pending right after we return 0, no point in testing it here. We should only care about the signals in the part where we sleep, i.e. rcuwait_wait_event(). > return 0; > + } > > rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_); > err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait, > refcount_read(&vma->vm_refcnt) == tgt_refcnt, > state); > if (err) { > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > + /* > + * No more users but fatal signal is present, > + * still return the error. > + */ > + } Like I said in the other reply, the "no more users" should not happen, so just WARN_ON_ONCE() on the result, but otherwise agreed. > rwsem_release(&vma->vmlock_dep_map, _RET_IP_); > return err; > } > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 14:26 ` Suren Baghdasaryan 2025-11-26 14:40 ` Vlastimil Babka @ 2025-11-26 15:01 ` Matthew Wilcox 1 sibling, 0 replies; 18+ messages in thread From: Matthew Wilcox @ 2025-11-26 15:01 UTC (permalink / raw) To: Suren Baghdasaryan Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes On Wed, Nov 26, 2025 at 06:26:26AM -0800, Suren Baghdasaryan wrote: > > > if (err) { > > > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > > > + /* Oh cobblers. While we got a fatal signal, we > > > + * raced with the last user. Pretend we didn't notice > > > + * the signal > > > + */ > > > + refcount_set(&vma->vm_refcnt, VMA_LOCK_OFFSET); > > > + goto acquired; > > > > Wait, why do we consider this as a successful acquisition? The > > vm_refcnt is 0, so this is similar situation to an earlier: > > > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) > > return 0; > > > > IOW, the vma is not referenced, so we failed to lock it. I think the > > fix should be: Yes, I also wondered about doing it that way. Of course, we hold the write lock, and we found the VMA, so I don't think this can actually occur? > After sleeping on it, I don't think we should be masking EINTR error. > __vma_enter_locked() result might be the only place where an outer > loop is checking for fatal signals, so returning "failure to lock" > instead of -EINTR might cause the loop to continue. I think this fix > would be better: > > * If vma is detached then only vma_mark_attached() can raise the > * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached(). > */ > - if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) > + if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > + if (fatal_signal_pending(current)) > + return -EINTR; This part seems too much to me. We don't need to check for signals often, just when we'd otherwise sleep. > rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_); > err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait, > refcount_read(&vma->vm_refcnt) == tgt_refcnt, > state); > if (err) { > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > + /* > + * No more users but fatal signal is present, > + * still return the error. > + */ > + } Umm. Does the last owner of a vm_refcnt not need to do anything like free the vma? > rwsem_release(&vma->vmlock_dep_map, _RET_IP_); > return err; > } ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 4:28 ` Suren Baghdasaryan 2025-11-26 14:26 ` Suren Baghdasaryan @ 2025-11-26 14:36 ` Vlastimil Babka 2025-11-26 15:02 ` Lorenzo Stoakes 2025-11-26 15:05 ` Matthew Wilcox 1 sibling, 2 replies; 18+ messages in thread From: Vlastimil Babka @ 2025-11-26 14:36 UTC (permalink / raw) To: Suren Baghdasaryan, Matthew Wilcox (Oracle) Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett, Lorenzo Stoakes On 11/26/25 5:28 AM, Suren Baghdasaryan wrote: > On Tue, Nov 25, 2025 at 7:44 PM Matthew Wilcox (Oracle) > <willy@infradead.org> wrote: >> >> If we get a signal, we need to restore the vm_refcnt. The wrinkle in >> that is that we might be the last reference. If that happens, fix the >> refcount to look like we weren't interrupted by a fatal signal. >> >> Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com >> Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()") >> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> >> Cc: Suren Baghdasaryan <surenb@google.com> >> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> >> Cc: Vlastimil Babka <vbabka@suse.cz> >> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> >> --- >> Andrew, since the vma_start_write_killable() patch is in mm-stable, >> I don't think you can put this in as a fixup, right? >> >> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug. > > Doh! This is embarassing... Hand-rolled synchronization primitives are wonderful, aren't they? >> Any other stupid thing I've done? And am I doing the right thing >> with refcount_set()? I think it's not wrong, but because it's dead code anyway. So it's unnecessary. >> mm/mmap_lock.c | 9 +++++++++ >> 1 file changed, 9 insertions(+) >> >> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c >> index e6e5570d1ec7..71af7f0a5fe1 100644 >> --- a/mm/mmap_lock.c >> +++ b/mm/mmap_lock.c >> @@ -74,9 +74,18 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma, >> refcount_read(&vma->vm_refcnt) == tgt_refcnt, >> state); >> if (err) { >> + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { >> + /* Oh cobblers. While we got a fatal signal, we >> + * raced with the last user. Pretend we didn't notice >> + * the signal >> + */ >> + refcount_set(&vma->vm_refcnt, VMA_LOCK_OFFSET); >> + goto acquired; > > Wait, why do we consider this as a successful acquisition? The > vm_refcnt is 0, so this is similar situation to an earlier: > > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) > return 0; But this means "vma is not attached" not "we failed to lock it". > IOW, the vma is not referenced, so we failed to lock it. I think the > fix should be: > > if (err) { > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > + /* Oh cobblers. While we got a fatal signal, we > + * raced with the last user. VMA is not referenced, > + * fail to lock it. > + */ > + err = 0; Returning 0 in this situation therefore wouldn't be correct. AFAIU since we started with attached vma above, it's not possible that the refcount_sub_and_test here will drop the refcnt to zero. We could just WARN_ON_ONCE() on the result (in a way to make also the __must_check happy) and then can return err below. > + } > rwsem_release(&vma->vmlock_dep_map, _RET_IP_); > return err; > } > > >> + } >> rwsem_release(&vma->vmlock_dep_map, _RET_IP_); >> return err; >> } >> +acquired: >> lock_acquired(&vma->vmlock_dep_map, _RET_IP_); >> >> return 1; >> -- >> 2.47.2 >> ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 14:36 ` Vlastimil Babka @ 2025-11-26 15:02 ` Lorenzo Stoakes 2025-11-26 15:05 ` Matthew Wilcox 1 sibling, 0 replies; 18+ messages in thread From: Lorenzo Stoakes @ 2025-11-26 15:02 UTC (permalink / raw) To: Vlastimil Babka Cc: Suren Baghdasaryan, Matthew Wilcox (Oracle), Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote: > On 11/26/25 5:28 AM, Suren Baghdasaryan wrote: > > On Tue, Nov 25, 2025 at 7:44 PM Matthew Wilcox (Oracle) > > <willy@infradead.org> wrote: > >> > >> If we get a signal, we need to restore the vm_refcnt. The wrinkle in > >> that is that we might be the last reference. If that happens, fix the > >> refcount to look like we weren't interrupted by a fatal signal. > >> > >> Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com > >> Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()") > >> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > >> Cc: Suren Baghdasaryan <surenb@google.com> > >> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> > >> Cc: Vlastimil Babka <vbabka@suse.cz> > >> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > >> --- > >> Andrew, since the vma_start_write_killable() patch is in mm-stable, > >> I don't think you can put this in as a fixup, right? > >> > >> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug. > > > > Doh! This is embarassing... > > Hand-rolled synchronization primitives are wonderful, aren't they? I do wonder why we reworked the VMA locks again, I understood the previous iteration :) Not sure we fully justified that. But it's water under the bridge now. > > >> Any other stupid thing I've done? And am I doing the right thing > >> with refcount_set()? > > I think it's not wrong, but because it's dead code anyway. So it's > unnecessary. > > >> mm/mmap_lock.c | 9 +++++++++ > >> 1 file changed, 9 insertions(+) > >> > >> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c > >> index e6e5570d1ec7..71af7f0a5fe1 100644 > >> --- a/mm/mmap_lock.c > >> +++ b/mm/mmap_lock.c > >> @@ -74,9 +74,18 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma, > >> refcount_read(&vma->vm_refcnt) == tgt_refcnt, > >> state); > >> if (err) { OK so the sequence of events as I undersatnd it are: - VMA_LOCK_OFFSET added to vma->vm_refcnt (assuming not zero) - we try to wait on vma->vm_mm->vma_writer_wait - this fails because ___rcuwait_wait_event() calls signal_pending_state() which (if task is in interruptible mode) indicates a fatal signal, so -EINTR is returned. - Oopsies we need to subtract VMA_LOCK_OFFSET before we're done. - If the resulting reference count is zero, i.e. we held the last reference, we then... take the lock anyway? > >> + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > >> + /* Oh cobblers. While we got a fatal signal, we Well for one this comment is broken ;) Plus 'cobblers'... when 'Gosh darn it' was there for the taking... > >> + * raced with the last user. Pretend we didn't notice > >> + * the signal > >> + */ I don't love that we just assume that rcu_wait_event() can and forever more only error out on fatal signal. I mean I guess probably it can but... maybe worth rewording to say we failed to wait so have to subtract but raced and probably if -EINTR this means... > >> + refcount_set(&vma->vm_refcnt, VMA_LOCK_OFFSET); > >> + goto acquired; > > > > Wait, why do we consider this as a successful acquisition? The > > vm_refcnt is 0, so this is similar situation to an earlier: Right yeah, same question? > > > > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) > > return 0; > > But this means "vma is not attached" not "we failed to lock it". Yup as vma_mark_detached() always takes the last refcount afaict. > > > IOW, the vma is not referenced, so we failed to lock it. I think the > > fix should be: > > > > if (err) { > > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > > + /* Oh cobblers. While we got a fatal signal, we > > + * raced with the last user. VMA is not referenced, > > + * fail to lock it. > > + */ > > + err = 0; > > Returning 0 in this situation therefore wouldn't be correct. > > AFAIU since we started with attached vma above, it's not possible that > the refcount_sub_and_test here will drop the refcnt to zero. We could > just WARN_ON_ONCE() on the result (in a way to make also the > __must_check happy) and then can return err below. Yeah we are only detaching in the case that we _know_ we are the only writer right? So nobody else should be trying to get the write lock and failing like this? So maybe the 'wrinkle' as you say should be a WARN_ON()... or a VM_WARN_ON() since this 'should never happen' (famous last words TM)? > > > + } > > rwsem_release(&vma->vmlock_dep_map, _RET_IP_); > > return err; > > } > > > > > >> + } > >> rwsem_release(&vma->vmlock_dep_map, _RET_IP_); > >> return err; > >> } > >> +acquired: > >> lock_acquired(&vma->vmlock_dep_map, _RET_IP_); > >> > >> return 1; > >> -- > >> 2.47.2 > >> > > Cheers, Lorenzo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 14:36 ` Vlastimil Babka 2025-11-26 15:02 ` Lorenzo Stoakes @ 2025-11-26 15:05 ` Matthew Wilcox 2025-11-26 15:20 ` Lorenzo Stoakes 2025-11-26 15:53 ` Vlastimil Babka 1 sibling, 2 replies; 18+ messages in thread From: Matthew Wilcox @ 2025-11-26 15:05 UTC (permalink / raw) To: Vlastimil Babka Cc: Suren Baghdasaryan, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett, Lorenzo Stoakes On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote: > On 11/26/25 5:28 AM, Suren Baghdasaryan wrote: > >> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug. > > > > Doh! This is embarassing... > > Hand-rolled synchronization primitives are wonderful, aren't they? That's why I liked the original approach of just using rwsems. I mst admit to having not paid attention to this recently so I don't know what motivated the change. > > Wait, why do we consider this as a successful acquisition? The > > vm_refcnt is 0, so this is similar situation to an earlier: > > > > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) > > return 0; > > But this means "vma is not attached" not "we failed to lock it". > > > IOW, the vma is not referenced, so we failed to lock it. I think the > > fix should be: > > > > if (err) { > > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > > + /* Oh cobblers. While we got a fatal signal, we > > + * raced with the last user. VMA is not referenced, > > + * fail to lock it. > > + */ > > + err = 0; > > Returning 0 in this situation therefore wouldn't be correct. > > AFAIU since we started with attached vma above, it's not possible that > the refcount_sub_and_test here will drop the refcnt to zero. We could > just WARN_ON_ONCE() on the result (in a way to make also the > __must_check happy) and then can return err below. But how do we know that we started with an attached VMA? Maybe the VMA was in the process of being detached and still has readers? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 15:05 ` Matthew Wilcox @ 2025-11-26 15:20 ` Lorenzo Stoakes 2025-11-26 15:49 ` Suren Baghdasaryan 2025-11-26 16:04 ` Vlastimil Babka 2025-11-26 15:53 ` Vlastimil Babka 1 sibling, 2 replies; 18+ messages in thread From: Lorenzo Stoakes @ 2025-11-26 15:20 UTC (permalink / raw) To: Matthew Wilcox Cc: Vlastimil Babka, Suren Baghdasaryan, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote: > On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote: > > On 11/26/25 5:28 AM, Suren Baghdasaryan wrote: > > >> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug. > > > > > > Doh! This is embarassing... > > > > Hand-rolled synchronization primitives are wonderful, aren't they? > > That's why I liked the original approach of just using rwsems. I > mst admit to having not paid attention to this recently so I don't > know what motivated the change. > > > > Wait, why do we consider this as a successful acquisition? The > > > vm_refcnt is 0, so this is similar situation to an earlier: > > > > > > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) > > > return 0; > > > > But this means "vma is not attached" not "we failed to lock it". > > > > > IOW, the vma is not referenced, so we failed to lock it. I think the > > > fix should be: > > > > > > if (err) { > > > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > > > + /* Oh cobblers. While we got a fatal signal, we > > > + * raced with the last user. VMA is not referenced, > > > + * fail to lock it. > > > + */ > > > + err = 0; > > > > Returning 0 in this situation therefore wouldn't be correct. > > > > AFAIU since we started with attached vma above, it's not possible that > > the refcount_sub_and_test here will drop the refcnt to zero. We could > > just WARN_ON_ONCE() on the result (in a way to make also the > > __must_check happy) and then can return err below. > > But how do we know that we started with an attached VMA? Maybe the VMA > was in the process of being detached and still has readers? So we're talking about: vma_mark_deteched() -> refcount_dec_and_test() [ ref count is zero ] -> __vma_enter_locked() (meanwhile...) -> reader attempts to read -> optimistic check doesn't successfully find write locked VMA -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments (??? how) (back to vma_mark_attached() -> __vma_enter_locked()) -> refcount_add_not_zero() returns true [ process gets fatal signal ] -> rcuwait_wait_event() errors out -> oopsies need to do something, maybe [VM_]WARN_ON() not right? Correct me if the above is wrong. I mean is any of this actually possible...? Seems dubious. But I guess right now we assume it _is_ possible. What a mess! (Again I wonder why we made our lives so difficult here) Anyway even if we are midway through a detach, the detach is ostensibly waiting for the readers to go away, and our reader is about to go away anyway, but the process has a fatal signal so do we even care? I actually wonder if a WARN_ON() is warranted to see if this even ever happens... OK just going to reattach... my head which just exploded from the above :P Cheers, Lorenzo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 15:20 ` Lorenzo Stoakes @ 2025-11-26 15:49 ` Suren Baghdasaryan 2025-11-26 16:00 ` Lorenzo Stoakes 2025-11-26 16:04 ` Vlastimil Babka 1 sibling, 1 reply; 18+ messages in thread From: Suren Baghdasaryan @ 2025-11-26 15:49 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Matthew Wilcox, Vlastimil Babka, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett On Wed, Nov 26, 2025 at 7:20 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote: > > On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote: > > > On 11/26/25 5:28 AM, Suren Baghdasaryan wrote: > > > >> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug. > > > > > > > > Doh! This is embarassing... > > > > > > Hand-rolled synchronization primitives are wonderful, aren't they? > > > > That's why I liked the original approach of just using rwsems. I > > mst admit to having not paid attention to this recently so I don't > > know what motivated the change. That made it simpler to add SLAB_TYPESAFE_BY_RCU for vm_area_structs which improved the performance of allocating these structs during calls like mmap and fork. > > > > > > Wait, why do we consider this as a successful acquisition? The > > > > vm_refcnt is 0, so this is similar situation to an earlier: > > > > > > > > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) > > > > return 0; > > > > > > But this means "vma is not attached" not "we failed to lock it". > > > > > > > IOW, the vma is not referenced, so we failed to lock it. I think the > > > > fix should be: > > > > > > > > if (err) { > > > > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > > > > + /* Oh cobblers. While we got a fatal signal, we > > > > + * raced with the last user. VMA is not referenced, > > > > + * fail to lock it. > > > > + */ > > > > + err = 0; > > > > > > Returning 0 in this situation therefore wouldn't be correct. > > > > > > AFAIU since we started with attached vma above, it's not possible that > > > the refcount_sub_and_test here will drop the refcnt to zero. We could > > > just WARN_ON_ONCE() on the result (in a way to make also the > > > __must_check happy) and then can return err below. > > > > But how do we know that we started with an attached VMA? Maybe the VMA > > was in the process of being detached and still has readers? > > So we're talking about: > > vma_mark_deteched() > -> refcount_dec_and_test() [ ref count is zero ] > -> __vma_enter_locked() > > (meanwhile...) > > -> reader attempts to read > -> optimistic check doesn't successfully find write locked VMA > -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments > (??? how) > > (back to vma_mark_attached() -> __vma_enter_locked()) > > -> refcount_add_not_zero() returns true > > [ process gets fatal signal ] > > -> rcuwait_wait_event() errors out > -> oopsies need to do something, maybe [VM_]WARN_ON() not right? > > Correct me if the above is wrong. > > I mean is any of this actually possible...? > > Seems dubious. But I guess right now we assume it _is_ possible. What a mess! > > (Again I wonder why we made our lives so difficult here) > > Anyway even if we are midway through a detach, the detach is ostensibly waiting > for the readers to go away, and our reader is about to go away anyway, but the > process has a fatal signal so do we even care? > > I actually wonder if a WARN_ON() is warranted to see if this even ever > happens... > > OK just going to reattach... my head which just exploded from the above :P We are overthinking this. Vlastimil is right. If we entered with an attached VMA (which is the case due to this check: https://elixir.bootlin.com/linux/v6.18-rc7/source/mm/mmap_lock.c#L60) then the only function that can detach the VMA from under us is vma_mark_detached() but that function can't race with __vma_enter_locked() because both functions are required to hold mmap_write_lock. vma_mark_detached() has an explicit vma_assert_write_locked(vma) check (which implies mmap_write_lock) and vma_start_write() calls __is_vma_write_locked() which does mmap_assert_write_locked(vma->vm_mm). We can add a comment here or an mmap_assert_write_locked(vma->vm_mm); at the beginning of the __vma_enter_locked() to make that obvious. > Cheers, Lorenzo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 15:49 ` Suren Baghdasaryan @ 2025-11-26 16:00 ` Lorenzo Stoakes 2025-11-26 16:11 ` Suren Baghdasaryan 0 siblings, 1 reply; 18+ messages in thread From: Lorenzo Stoakes @ 2025-11-26 16:00 UTC (permalink / raw) To: Suren Baghdasaryan Cc: Matthew Wilcox, Vlastimil Babka, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett On Wed, Nov 26, 2025 at 07:49:13AM -0800, Suren Baghdasaryan wrote: > On Wed, Nov 26, 2025 at 7:20 AM Lorenzo Stoakes > <lorenzo.stoakes@oracle.com> wrote: > > > > On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote: > > > On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote: > > > > On 11/26/25 5:28 AM, Suren Baghdasaryan wrote: > > > > >> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug. > > > > > > > > > > Doh! This is embarassing... > > > > > > > > Hand-rolled synchronization primitives are wonderful, aren't they? > > > > > > That's why I liked the original approach of just using rwsems. I > > > mst admit to having not paid attention to this recently so I don't > > > know what motivated the change. > > That made it simpler to add SLAB_TYPESAFE_BY_RCU for vm_area_structs > which improved the performance of allocating these structs during > calls like mmap and fork. Hmm, we doubled down :) Hope the wins were worth the complexity, but obviously the horse has bolted and I should have asked more about it at the time... > > > > > > > > > Wait, why do we consider this as a successful acquisition? The > > > > > vm_refcnt is 0, so this is similar situation to an earlier: > > > > > > > > > > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) > > > > > return 0; > > > > > > > > But this means "vma is not attached" not "we failed to lock it". > > > > > > > > > IOW, the vma is not referenced, so we failed to lock it. I think the > > > > > fix should be: > > > > > > > > > > if (err) { > > > > > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > > > > > + /* Oh cobblers. While we got a fatal signal, we > > > > > + * raced with the last user. VMA is not referenced, > > > > > + * fail to lock it. > > > > > + */ > > > > > + err = 0; > > > > > > > > Returning 0 in this situation therefore wouldn't be correct. > > > > > > > > AFAIU since we started with attached vma above, it's not possible that > > > > the refcount_sub_and_test here will drop the refcnt to zero. We could > > > > just WARN_ON_ONCE() on the result (in a way to make also the > > > > __must_check happy) and then can return err below. > > > > > > But how do we know that we started with an attached VMA? Maybe the VMA > > > was in the process of being detached and still has readers? > > > > So we're talking about: > > > > vma_mark_deteched() > > -> refcount_dec_and_test() [ ref count is zero ] > > -> __vma_enter_locked() > > > > (meanwhile...) > > > > -> reader attempts to read > > -> optimistic check doesn't successfully find write locked VMA > > -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments > > (??? how) > > > > (back to vma_mark_attached() -> __vma_enter_locked()) > > > > -> refcount_add_not_zero() returns true > > > > [ process gets fatal signal ] > > > > -> rcuwait_wait_event() errors out > > -> oopsies need to do something, maybe [VM_]WARN_ON() not right? > > > > Correct me if the above is wrong. > > > > I mean is any of this actually possible...? > > > > Seems dubious. But I guess right now we assume it _is_ possible. What a mess! > > > > (Again I wonder why we made our lives so difficult here) > > > > Anyway even if we are midway through a detach, the detach is ostensibly waiting > > for the readers to go away, and our reader is about to go away anyway, but the > > process has a fatal signal so do we even care? > > > > I actually wonder if a WARN_ON() is warranted to see if this even ever > > happens... > > > > OK just going to reattach... my head which just exploded from the above :P > > We are overthinking this. Vlastimil is right. If we entered with an Watch Vlastimil frame this... ;) > attached VMA (which is the case due to this check: > https://elixir.bootlin.com/linux/v6.18-rc7/source/mm/mmap_lock.c#L60) > then the only function that can detach the VMA from under us is > vma_mark_detached() but that function can't race with We hold the mmap/VMA write lock yes, but as per the comment in vma_mark_detached() + above list readers can spuriously increment the refcount (ostensibly) as per: /* Wait until vma is detached with no readers. */ So doesn't that mean we can hit the issue Matthew raised? Otherwise there would be no need to do this dance in vma_mark_detached() anyway? Maybe I'm missing something. > __vma_enter_locked() because both functions are required to hold > mmap_write_lock. vma_mark_detached() has an explicit > vma_assert_write_locked(vma) check (which implies mmap_write_lock) and > vma_start_write() calls __is_vma_write_locked() which does > mmap_assert_write_locked(vma->vm_mm). We can add a comment here or an > mmap_assert_write_locked(vma->vm_mm); at the beginning of the > __vma_enter_locked() to make that obvious. > > > Cheers, Lorenzo > Thanks, Lorenzo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 16:00 ` Lorenzo Stoakes @ 2025-11-26 16:11 ` Suren Baghdasaryan 0 siblings, 0 replies; 18+ messages in thread From: Suren Baghdasaryan @ 2025-11-26 16:11 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Matthew Wilcox, Vlastimil Babka, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett On Wed, Nov 26, 2025 at 8:01 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > On Wed, Nov 26, 2025 at 07:49:13AM -0800, Suren Baghdasaryan wrote: > > On Wed, Nov 26, 2025 at 7:20 AM Lorenzo Stoakes > > <lorenzo.stoakes@oracle.com> wrote: > > > > > > On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote: > > > > On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote: > > > > > On 11/26/25 5:28 AM, Suren Baghdasaryan wrote: > > > > > >> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug. > > > > > > > > > > > > Doh! This is embarassing... > > > > > > > > > > Hand-rolled synchronization primitives are wonderful, aren't they? > > > > > > > > That's why I liked the original approach of just using rwsems. I > > > > mst admit to having not paid attention to this recently so I don't > > > > know what motivated the change. > > > > That made it simpler to add SLAB_TYPESAFE_BY_RCU for vm_area_structs > > which improved the performance of allocating these structs during > > calls like mmap and fork. > > Hmm, we doubled down :) > > Hope the wins were worth the complexity, but obviously the horse has bolted > and I should have asked more about it at the time... > > > > > > > > > > > > > Wait, why do we consider this as a successful acquisition? The > > > > > > vm_refcnt is 0, so this is similar situation to an earlier: > > > > > > > > > > > > if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) > > > > > > return 0; > > > > > > > > > > But this means "vma is not attached" not "we failed to lock it". > > > > > > > > > > > IOW, the vma is not referenced, so we failed to lock it. I think the > > > > > > fix should be: > > > > > > > > > > > > if (err) { > > > > > > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > > > > > > + /* Oh cobblers. While we got a fatal signal, we > > > > > > + * raced with the last user. VMA is not referenced, > > > > > > + * fail to lock it. > > > > > > + */ > > > > > > + err = 0; > > > > > > > > > > Returning 0 in this situation therefore wouldn't be correct. > > > > > > > > > > AFAIU since we started with attached vma above, it's not possible that > > > > > the refcount_sub_and_test here will drop the refcnt to zero. We could > > > > > just WARN_ON_ONCE() on the result (in a way to make also the > > > > > __must_check happy) and then can return err below. > > > > > > > > But how do we know that we started with an attached VMA? Maybe the VMA > > > > was in the process of being detached and still has readers? > > > > > > So we're talking about: > > > > > > vma_mark_deteched() > > > -> refcount_dec_and_test() [ ref count is zero ] > > > -> __vma_enter_locked() > > > > > > (meanwhile...) > > > > > > -> reader attempts to read > > > -> optimistic check doesn't successfully find write locked VMA > > > -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments > > > (??? how) > > > > > > (back to vma_mark_attached() -> __vma_enter_locked()) > > > > > > -> refcount_add_not_zero() returns true > > > > > > [ process gets fatal signal ] > > > > > > -> rcuwait_wait_event() errors out > > > -> oopsies need to do something, maybe [VM_]WARN_ON() not right? > > > > > > Correct me if the above is wrong. > > > > > > I mean is any of this actually possible...? > > > > > > Seems dubious. But I guess right now we assume it _is_ possible. What a mess! > > > > > > (Again I wonder why we made our lives so difficult here) > > > > > > Anyway even if we are midway through a detach, the detach is ostensibly waiting > > > for the readers to go away, and our reader is about to go away anyway, but the > > > process has a fatal signal so do we even care? > > > > > > I actually wonder if a WARN_ON() is warranted to see if this even ever > > > happens... > > > > > > OK just going to reattach... my head which just exploded from the above :P > > > > We are overthinking this. Vlastimil is right. If we entered with an > > Watch Vlastimil frame this... ;) > > > attached VMA (which is the case due to this check: > > https://elixir.bootlin.com/linux/v6.18-rc7/source/mm/mmap_lock.c#L60) > > then the only function that can detach the VMA from under us is > > vma_mark_detached() but that function can't race with > > We hold the mmap/VMA write lock yes, but as per the comment in > vma_mark_detached() + above list readers can spuriously increment the > refcount (ostensibly) as per: > > /* Wait until vma is detached with no readers. */ > > So doesn't that mean we can hit the issue Matthew raised? > > Otherwise there would be no need to do this dance in vma_mark_detached() > anyway? > > Maybe I'm missing something. No, you are right. I realized after sending the reply this temporary refcount from the reader can be dropped from under the writer. Was in the process of analyzing these paths when Vlastimil's reply came in. I'll spend some more time stairing at it to see if we are missing anything else. > > > __vma_enter_locked() because both functions are required to hold > > mmap_write_lock. vma_mark_detached() has an explicit > > vma_assert_write_locked(vma) check (which implies mmap_write_lock) and > > vma_start_write() calls __is_vma_write_locked() which does > > mmap_assert_write_locked(vma->vm_mm). We can add a comment here or an > > mmap_assert_write_locked(vma->vm_mm); at the beginning of the > > __vma_enter_locked() to make that obvious. > > > > > Cheers, Lorenzo > > > > Thanks, Lorenzo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 15:20 ` Lorenzo Stoakes 2025-11-26 15:49 ` Suren Baghdasaryan @ 2025-11-26 16:04 ` Vlastimil Babka 2025-11-26 16:06 ` Matthew Wilcox 2025-11-26 16:18 ` Lorenzo Stoakes 1 sibling, 2 replies; 18+ messages in thread From: Vlastimil Babka @ 2025-11-26 16:04 UTC (permalink / raw) To: Lorenzo Stoakes, Matthew Wilcox Cc: Suren Baghdasaryan, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett On 11/26/25 4:20 PM, Lorenzo Stoakes wrote: > On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote: >> On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote: >>> On 11/26/25 5:28 AM, Suren Baghdasaryan wrote: >>>>> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug. >>>> >>>> Doh! This is embarassing... >>> >>> Hand-rolled synchronization primitives are wonderful, aren't they? >> >> That's why I liked the original approach of just using rwsems. I >> mst admit to having not paid attention to this recently so I don't >> know what motivated the change. >> >>>> Wait, why do we consider this as a successful acquisition? The >>>> vm_refcnt is 0, so this is similar situation to an earlier: >>>> >>>> if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) >>>> return 0; >>> >>> But this means "vma is not attached" not "we failed to lock it". >>> >>>> IOW, the vma is not referenced, so we failed to lock it. I think the >>>> fix should be: >>>> >>>> if (err) { >>>> + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { >>>> + /* Oh cobblers. While we got a fatal signal, we >>>> + * raced with the last user. VMA is not referenced, >>>> + * fail to lock it. >>>> + */ >>>> + err = 0; >>> >>> Returning 0 in this situation therefore wouldn't be correct. >>> >>> AFAIU since we started with attached vma above, it's not possible that >>> the refcount_sub_and_test here will drop the refcnt to zero. We could >>> just WARN_ON_ONCE() on the result (in a way to make also the >>> __must_check happy) and then can return err below. >> >> But how do we know that we started with an attached VMA? Maybe the VMA >> was in the process of being detached and still has readers? > > So we're talking about: > > vma_mark_deteched() > -> refcount_dec_and_test() [ ref count is zero ] > -> __vma_enter_locked() I think it's refcount is NOT zero to continue with __vma_enter_locked(). > (meanwhile...) > > -> reader attempts to read > -> optimistic check doesn't successfully find write locked VMA > -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments > (??? how) That shouldn't be possible, yeah. But per above, it's actually not zero. > (back to vma_mark_attached() -> __vma_enter_locked()) Back to _attached()? but it's _detached() above? You mean _detached() right? Just to be sure > -> refcount_add_not_zero() returns true Ack. > [ process gets fatal signal ] > > -> rcuwait_wait_event() errors out > -> oopsies need to do something, maybe [VM_]WARN_ON() not right? AFAICS from vma_mark_detached() we use the TASK_UNINTERRUPTIBLE variant so this path can't error due to the fatal signal. > Correct me if the above is wrong. > > I mean is any of this actually possible...? > > Seems dubious. But I guess right now we assume it _is_ possible. What a mess! > > (Again I wonder why we made our lives so difficult here) > > Anyway even if we are midway through a detach, the detach is ostensibly waiting > for the readers to go away, and our reader is about to go away anyway, but the > process has a fatal signal so do we even care? Yeah I guess it's for the best to keep vma_mark_detached() use the TASK_UNINTERRUPTIBLE variant, maybe document why. Aborting the detaching would be counter productive. > I actually wonder if a WARN_ON() is warranted to see if this even ever > happens... Not for this path, but for vma_start_write_killable -> __vma_start_write -> __vma_enter_locked(... TASK_KILLABLE). I think it still can't trigger, but since we need to check result of the refcount_sub_and_test() anyway, we might as well WARN_ON it. > OK just going to reattach... my head which just exploded from the above :P > > Cheers, Lorenzo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 16:04 ` Vlastimil Babka @ 2025-11-26 16:06 ` Matthew Wilcox 2025-11-26 16:18 ` Lorenzo Stoakes 1 sibling, 0 replies; 18+ messages in thread From: Matthew Wilcox @ 2025-11-26 16:06 UTC (permalink / raw) To: Vlastimil Babka Cc: Lorenzo Stoakes, Suren Baghdasaryan, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett On Wed, Nov 26, 2025 at 05:04:09PM +0100, Vlastimil Babka wrote: > > [ process gets fatal signal ] > > > > -> rcuwait_wait_event() errors out > > -> oopsies need to do something, maybe [VM_]WARN_ON() not right? > > AFAICS from vma_mark_detached() we use the TASK_UNINTERRUPTIBLE variant > so this path can't error due to the fatal signal. Well, today. That's a nasty little landmine to leave for someone else to stumble over in the future. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 16:04 ` Vlastimil Babka 2025-11-26 16:06 ` Matthew Wilcox @ 2025-11-26 16:18 ` Lorenzo Stoakes 2025-11-26 18:06 ` Suren Baghdasaryan 1 sibling, 1 reply; 18+ messages in thread From: Lorenzo Stoakes @ 2025-11-26 16:18 UTC (permalink / raw) To: Vlastimil Babka Cc: Matthew Wilcox, Suren Baghdasaryan, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett On Wed, Nov 26, 2025 at 05:04:09PM +0100, Vlastimil Babka wrote: > > > On 11/26/25 4:20 PM, Lorenzo Stoakes wrote: > > On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote: > >> On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote: > >>> On 11/26/25 5:28 AM, Suren Baghdasaryan wrote: > >>>>> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug. > >>>> > >>>> Doh! This is embarassing... > >>> > >>> Hand-rolled synchronization primitives are wonderful, aren't they? > >> > >> That's why I liked the original approach of just using rwsems. I > >> mst admit to having not paid attention to this recently so I don't > >> know what motivated the change. > >> > >>>> Wait, why do we consider this as a successful acquisition? The > >>>> vm_refcnt is 0, so this is similar situation to an earlier: > >>>> > >>>> if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) > >>>> return 0; > >>> > >>> But this means "vma is not attached" not "we failed to lock it". > >>> > >>>> IOW, the vma is not referenced, so we failed to lock it. I think the > >>>> fix should be: > >>>> > >>>> if (err) { > >>>> + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > >>>> + /* Oh cobblers. While we got a fatal signal, we > >>>> + * raced with the last user. VMA is not referenced, > >>>> + * fail to lock it. > >>>> + */ > >>>> + err = 0; > >>> > >>> Returning 0 in this situation therefore wouldn't be correct. > >>> > >>> AFAIU since we started with attached vma above, it's not possible that > >>> the refcount_sub_and_test here will drop the refcnt to zero. We could > >>> just WARN_ON_ONCE() on the result (in a way to make also the > >>> __must_check happy) and then can return err below. > >> > >> But how do we know that we started with an attached VMA? Maybe the VMA > >> was in the process of being detached and still has readers? > > > > So we're talking about: > > > > vma_mark_deteched() > > -> refcount_dec_and_test() [ ref count is zero ] > > -> __vma_enter_locked() > > I think it's refcount is NOT zero to continue with __vma_enter_locked(). Yup sorry, misread the ! clearly... Which makes a lot more sense when it comes to picking up spurious reader refcount increases :) > > > (meanwhile...) > > > > -> reader attempts to read > > -> optimistic check doesn't successfully find write locked VMA > > -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments > > (??? how) > > That shouldn't be possible, yeah. But per above, it's actually not zero. Yup so this just makes it more likely to happen... > > > (back to vma_mark_attached() -> __vma_enter_locked()) > > Back to _attached()? but it's _detached() above? You mean _detached() > right? Just to be sure Yup, I typed this all a bit too quick... > > > -> refcount_add_not_zero() returns true > > Ack. > > > [ process gets fatal signal ] > > > > -> rcuwait_wait_event() errors out > > -> oopsies need to do something, maybe [VM_]WARN_ON() not right? > > AFAICS from vma_mark_detached() we use the TASK_UNINTERRUPTIBLE variant > so this path can't error due to the fatal signal. Right good point. I hate that we make this so 'gosh darned' implicit. We are now assuming that: 1. the only way that RCU wait can fail is due to pending fatal signal 2. and that we're fine here because it's uninterruptible. I mean very doubtful we'll ever change that but it's still gross. And as Willy says we're paving the road with good intent^Wlandmines. > > > Correct me if the above is wrong. Yeah I was wrong thankfully :) The TASK_UNINTERRUPTIBLE saves us, but it's all still a bit ugh. > > > > I mean is any of this actually possible...? > > > > Seems dubious. But I guess right now we assume it _is_ possible. What a mess! > > > > (Again I wonder why we made our lives so difficult here) > > > > Anyway even if we are midway through a detach, the detach is ostensibly waiting > > for the readers to go away, and our reader is about to go away anyway, but the > > process has a fatal signal so do we even care? > > Yeah I guess it's for the best to keep vma_mark_detached() use the > TASK_UNINTERRUPTIBLE variant, maybe document why. Aborting the detaching > would be counter productive. > > > I actually wonder if a WARN_ON() is warranted to see if this even ever > > happens... > > Not for this path, but for vma_start_write_killable -> __vma_start_write > -> __vma_enter_locked(... TASK_KILLABLE). I think it still can't Well if it's impossible for TASK_UNINTERRUPTIBLE no harm in adding it right? Can add a comment. > trigger, but since we need to check result of the > refcount_sub_and_test() anyway, we might as well WARN_ON it. Probably it can't no. > > > OK just going to reattach... my head which just exploded from the above :P > > > > Cheers, Lorenzo > > Thanks, Lorenzo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 16:18 ` Lorenzo Stoakes @ 2025-11-26 18:06 ` Suren Baghdasaryan 2025-11-26 18:11 ` Lorenzo Stoakes 0 siblings, 1 reply; 18+ messages in thread From: Suren Baghdasaryan @ 2025-11-26 18:06 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Vlastimil Babka, Matthew Wilcox, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett On Wed, Nov 26, 2025 at 8:18 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > > On Wed, Nov 26, 2025 at 05:04:09PM +0100, Vlastimil Babka wrote: > > > > > > On 11/26/25 4:20 PM, Lorenzo Stoakes wrote: > > > On Wed, Nov 26, 2025 at 03:05:44PM +0000, Matthew Wilcox wrote: > > >> On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote: > > >>> On 11/26/25 5:28 AM, Suren Baghdasaryan wrote: > > >>>>> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug. > > >>>> > > >>>> Doh! This is embarassing... > > >>> > > >>> Hand-rolled synchronization primitives are wonderful, aren't they? > > >> > > >> That's why I liked the original approach of just using rwsems. I > > >> mst admit to having not paid attention to this recently so I don't > > >> know what motivated the change. > > >> > > >>>> Wait, why do we consider this as a successful acquisition? The > > >>>> vm_refcnt is 0, so this is similar situation to an earlier: > > >>>> > > >>>> if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) > > >>>> return 0; > > >>> > > >>> But this means "vma is not attached" not "we failed to lock it". > > >>> > > >>>> IOW, the vma is not referenced, so we failed to lock it. I think the > > >>>> fix should be: > > >>>> > > >>>> if (err) { > > >>>> + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > > >>>> + /* Oh cobblers. While we got a fatal signal, we > > >>>> + * raced with the last user. VMA is not referenced, > > >>>> + * fail to lock it. > > >>>> + */ > > >>>> + err = 0; > > >>> > > >>> Returning 0 in this situation therefore wouldn't be correct. > > >>> > > >>> AFAIU since we started with attached vma above, it's not possible that > > >>> the refcount_sub_and_test here will drop the refcnt to zero. We could > > >>> just WARN_ON_ONCE() on the result (in a way to make also the > > >>> __must_check happy) and then can return err below. > > >> > > >> But how do we know that we started with an attached VMA? Maybe the VMA > > >> was in the process of being detached and still has readers? > > > > > > So we're talking about: > > > > > > vma_mark_deteched() > > > -> refcount_dec_and_test() [ ref count is zero ] > > > -> __vma_enter_locked() > > > > I think it's refcount is NOT zero to continue with __vma_enter_locked(). > > Yup sorry, misread the ! clearly... > > Which makes a lot more sense when it comes to picking up spurious reader > refcount increases :) > > > > > > (meanwhile...) > > > > > > -> reader attempts to read > > > -> optimistic check doesn't successfully find write locked VMA > > > -> __refcount_inc_not_zero_limited_acqure() somehow doesn't notice 0 refcount and increments > > > (??? how) > > > > That shouldn't be possible, yeah. But per above, it's actually not zero. > > Yup so this just makes it more likely to happen... > > > > > > (back to vma_mark_attached() -> __vma_enter_locked()) > > > > Back to _attached()? but it's _detached() above? You mean _detached() > > right? Just to be sure > > Yup, I typed this all a bit too quick... > > > > > > -> refcount_add_not_zero() returns true > > > > Ack. > > > > > [ process gets fatal signal ] > > > > > > -> rcuwait_wait_event() errors out > > > -> oopsies need to do something, maybe [VM_]WARN_ON() not right? > > > > AFAICS from vma_mark_detached() we use the TASK_UNINTERRUPTIBLE variant > > so this path can't error due to the fatal signal. > > Right good point. > > I hate that we make this so 'gosh darned' implicit. > > We are now assuming that: > > 1. the only way that RCU wait can fail is due to pending fatal signal > 2. and that we're fine here because it's uninterruptible. > > I mean very doubtful we'll ever change that but it's still gross. > > And as Willy says we're paving the road with good intent^Wlandmines. > > > > > > Correct me if the above is wrong. > > Yeah I was wrong thankfully :) > > The TASK_UNINTERRUPTIBLE saves us, but it's all still a bit ugh. I went through different scenaros and I think the race Lorenzo described would look something like this: READER WRITER //recnt=1 (attached, no readers, not write-locked) vma_start_read() //vma->vm_lock_seq != mm->mm_lock_seq vma_start_write() __vma_enter_locked(TASK_INTERRUPTIBLE) refcount_add_not_zero(VMA_LOCK_OFFSET) //refcnt=1+VMA_LOCK_OFFSET WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq); //vma->vm_lock_seq == mm->mm_lock_seq __vma_exit_locked() refcount_sub_and_test(VMA_LOCK_OFFSET) //refcnt=1 __refcount_inc_not_zero_limited_acquire() //refcnt = 2 vma_mark_detached() if (!refcount_dec_and_test()) //refcnt=1 __vma_enter_locked(TASK_UNINTERRUPTIBLE) if (refcount_add_not_zero(VMA_LOCK_OFFSET)) //refcnt=1+VMA_LOCK_OFFSET rcuwait_wait_event(TASK_UNINTERRUPTIBLE) if (vma->vm_lock_seq == mm->mm_lock_seq) vma_refcount_put(vma); __refcount_dec_and_test() //refcnt=VMA_LOCK_OFFSET rcuwait_wake_up() __vma_exit_locked() refcount_sub_and_test(VMA_LOCK_OFFSET) //refcnt=0 (detached) This seems to be fine with vma_mark_detached() using TASK_UNINTERRUPTIBLE. If we decide to change vma_mark_detached() to use TASK_INTERRUPTIBLE I think we need to handle the possible error from __vma_enter_locked() inside vma_mark_detached() and allow for the fact that refcnt can drop to 0 after the wait. > > > > > > > I mean is any of this actually possible...? > > > > > > Seems dubious. But I guess right now we assume it _is_ possible. What a mess! > > > > > > (Again I wonder why we made our lives so difficult here) > > > > > > Anyway even if we are midway through a detach, the detach is ostensibly waiting > > > for the readers to go away, and our reader is about to go away anyway, but the > > > process has a fatal signal so do we even care? > > > > Yeah I guess it's for the best to keep vma_mark_detached() use the > > TASK_UNINTERRUPTIBLE variant, maybe document why. Aborting the detaching > > would be counter productive. > > > > > I actually wonder if a WARN_ON() is warranted to see if this even ever > > > happens... > > > > Not for this path, but for vma_start_write_killable -> __vma_start_write > > -> __vma_enter_locked(... TASK_KILLABLE). I think it still can't > > Well if it's impossible for TASK_UNINTERRUPTIBLE no harm in adding it right? Can > add a comment. > > > trigger, but since we need to check result of the > > refcount_sub_and_test() anyway, we might as well WARN_ON it. > > Probably it can't no. > > > > > > OK just going to reattach... my head which just exploded from the above :P > > > > > > Cheers, Lorenzo > > > > > > Thanks, Lorenzo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 18:06 ` Suren Baghdasaryan @ 2025-11-26 18:11 ` Lorenzo Stoakes 0 siblings, 0 replies; 18+ messages in thread From: Lorenzo Stoakes @ 2025-11-26 18:11 UTC (permalink / raw) To: Suren Baghdasaryan Cc: Vlastimil Babka, Matthew Wilcox, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett On Wed, Nov 26, 2025 at 10:06:28AM -0800, Suren Baghdasaryan wrote: > On Wed, Nov 26, 2025 at 8:18 AM Lorenzo Stoakes > > > > The TASK_UNINTERRUPTIBLE saves us, but it's all still a bit ugh. > > I went through different scenaros and I think the race Lorenzo > described would look something like this: > > READER WRITER > //recnt=1 (attached, no readers, not write-locked) > vma_start_read() > //vma->vm_lock_seq != mm->mm_lock_seq > vma_start_write() > __vma_enter_locked(TASK_INTERRUPTIBLE) > refcount_add_not_zero(VMA_LOCK_OFFSET) > //refcnt=1+VMA_LOCK_OFFSET > WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq); > //vma->vm_lock_seq == mm->mm_lock_seq > __vma_exit_locked() > refcount_sub_and_test(VMA_LOCK_OFFSET) > //refcnt=1 > __refcount_inc_not_zero_limited_acquire() > //refcnt = 2 > vma_mark_detached() > if (!refcount_dec_and_test()) > //refcnt=1 > __vma_enter_locked(TASK_UNINTERRUPTIBLE) > if (refcount_add_not_zero(VMA_LOCK_OFFSET)) > //refcnt=1+VMA_LOCK_OFFSET > rcuwait_wait_event(TASK_UNINTERRUPTIBLE) > if (vma->vm_lock_seq == mm->mm_lock_seq) > vma_refcount_put(vma); > __refcount_dec_and_test() > //refcnt=VMA_LOCK_OFFSET > rcuwait_wake_up() > __vma_exit_locked() > refcount_sub_and_test(VMA_LOCK_OFFSET) > //refcnt=0 (detached) > > This seems to be fine with vma_mark_detached() using > TASK_UNINTERRUPTIBLE. If we decide to change vma_mark_detached() to > use TASK_INTERRUPTIBLE I think we need to handle the possible error > from __vma_enter_locked() inside vma_mark_detached() and allow for the > fact that refcnt can drop to 0 after the wait. Thanks for going deeper into this :) Vlasta also pointed out the TASK_UNINTERRUPTIBLE saves us, but definitely something to think about in the future... Cheers, Lorenzo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH] mm: fix vma_start_write_killable() signal handling 2025-11-26 15:05 ` Matthew Wilcox 2025-11-26 15:20 ` Lorenzo Stoakes @ 2025-11-26 15:53 ` Vlastimil Babka 1 sibling, 0 replies; 18+ messages in thread From: Vlastimil Babka @ 2025-11-26 15:53 UTC (permalink / raw) To: Matthew Wilcox Cc: Suren Baghdasaryan, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett, Lorenzo Stoakes On 11/26/25 4:05 PM, Matthew Wilcox wrote: > On Wed, Nov 26, 2025 at 03:36:46PM +0100, Vlastimil Babka wrote: >> On 11/26/25 5:28 AM, Suren Baghdasaryan wrote: >>>> Suren, Liam, Vlastimil, Lorenzo ... none of you spotted this bug. >>> >>> Doh! This is embarassing... >> >> Hand-rolled synchronization primitives are wonderful, aren't they? > > That's why I liked the original approach of just using rwsems. I > mst admit to having not paid attention to this recently so I don't > know what motivated the change. > >>> Wait, why do we consider this as a successful acquisition? The >>> vm_refcnt is 0, so this is similar situation to an earlier: >>> >>> if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt)) >>> return 0; >> >> But this means "vma is not attached" not "we failed to lock it". >> >>> IOW, the vma is not referenced, so we failed to lock it. I think the >>> fix should be: >>> >>> if (err) { >>> + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { >>> + /* Oh cobblers. While we got a fatal signal, we >>> + * raced with the last user. VMA is not referenced, >>> + * fail to lock it. >>> + */ >>> + err = 0; >> >> Returning 0 in this situation therefore wouldn't be correct. >> >> AFAIU since we started with attached vma above, it's not possible that >> the refcount_sub_and_test here will drop the refcnt to zero. We could >> just WARN_ON_ONCE() on the result (in a way to make also the >> __must_check happy) and then can return err below. > > But how do we know that we started with an attached VMA? Maybe the VMA > was in the process of being detached and still has readers? Yeah I didn't write it properly. What I mean is we passed the "if (!refcount_add_not_zero()" check above, which AFAIU is a "is the vma attached?" test. If it's not attached, then also no reader could have incremented the refcount, and we can be the only writer thanks to mmap sem. And only a writer can also detach it, so I believe we can't see the refcount being decremented to zero by our refcount_sub_and_test(). But I might be wrong. I'll go through Lorenzo's scenario... ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2025-11-26 18:12 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2025-11-26 3:42 [PATCH] mm: fix vma_start_write_killable() signal handling Matthew Wilcox (Oracle) 2025-11-26 4:28 ` Suren Baghdasaryan 2025-11-26 14:26 ` Suren Baghdasaryan 2025-11-26 14:40 ` Vlastimil Babka 2025-11-26 15:01 ` Matthew Wilcox 2025-11-26 14:36 ` Vlastimil Babka 2025-11-26 15:02 ` Lorenzo Stoakes 2025-11-26 15:05 ` Matthew Wilcox 2025-11-26 15:20 ` Lorenzo Stoakes 2025-11-26 15:49 ` Suren Baghdasaryan 2025-11-26 16:00 ` Lorenzo Stoakes 2025-11-26 16:11 ` Suren Baghdasaryan 2025-11-26 16:04 ` Vlastimil Babka 2025-11-26 16:06 ` Matthew Wilcox 2025-11-26 16:18 ` Lorenzo Stoakes 2025-11-26 18:06 ` Suren Baghdasaryan 2025-11-26 18:11 ` Lorenzo Stoakes 2025-11-26 15:53 ` Vlastimil Babka
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox