* [PATCH v2] mm: fix vma_start_write_killable() signal handling
@ 2025-11-26 17:44 Matthew Wilcox (Oracle)
2025-11-26 18:06 ` Lorenzo Stoakes
0 siblings, 1 reply; 14+ messages in thread
From: Matthew Wilcox (Oracle) @ 2025-11-26 17:44 UTC (permalink / raw)
To: Andrew Morton
Cc: Matthew Wilcox (Oracle),
linux-mm, syzbot+5b19bad23ac7f44bf8b8, Suren Baghdasaryan,
Liam R. Howlett, Vlastimil Babka, Lorenzo Stoakes
If we get a signal, we need to restore the vm_refcnt. We don't think
that the refcount can actually be decremented to zero here as it
requires the VMA to be detached, and the vma_mark_detached() uses
TASK_UNINTERRUPTIBLE. However, that's a bit subtle, so handle it
as if the refcount was zero at the start of this function.
Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com
Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mmap_lock.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index e6e5570d1ec7..3c9bf2f96280 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -74,6 +74,14 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
refcount_read(&vma->vm_refcnt) == tgt_refcnt,
state);
if (err) {
+ if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
+ /*
+ * We got a fatal signal, but the last reader went
+ * away as well. Resolve the race in favour of
+ * the vma being detached.
+ */
+ err = 0;
+ }
rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
return err;
}
--
2.47.2
^ permalink raw reply [flat|nested] 14+ messages in thread* Re: [PATCH v2] mm: fix vma_start_write_killable() signal handling 2025-11-26 17:44 [PATCH v2] mm: fix vma_start_write_killable() signal handling Matthew Wilcox (Oracle) @ 2025-11-26 18:06 ` Lorenzo Stoakes 2025-11-26 18:28 ` Matthew Wilcox 0 siblings, 1 reply; 14+ messages in thread From: Lorenzo Stoakes @ 2025-11-26 18:06 UTC (permalink / raw) To: Matthew Wilcox (Oracle) Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Suren Baghdasaryan, Liam R. Howlett, Vlastimil Babka On Wed, Nov 26, 2025 at 05:44:58PM +0000, Matthew Wilcox (Oracle) wrote: > If we get a signal, we need to restore the vm_refcnt. We don't think > that the refcount can actually be decremented to zero here as it > requires the VMA to be detached, and the vma_mark_detached() uses > TASK_UNINTERRUPTIBLE. However, that's a bit subtle, so handle it > as if the refcount was zero at the start of this function. > > Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com > Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()") > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Liam R. Howlett <Liam.Howlett@oracle.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > --- > mm/mmap_lock.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c > index e6e5570d1ec7..3c9bf2f96280 100644 > --- a/mm/mmap_lock.c > +++ b/mm/mmap_lock.c > @@ -74,6 +74,14 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma, > refcount_read(&vma->vm_refcnt) == tgt_refcnt, > state); > if (err) { > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { Really think we should WARN_ON_ONCE() as Vlasta suggested. It's an 'impossible' situation so we should make that clear. And we should find out about it if the impossible happens... :) > + /* > + * We got a fatal signal, but the last reader went > + * away as well. Resolve the race in favour of This is very subtle, I don't think this really explains this clearly enough. Maybe put something like: /* Couldn't wait on readers probably due to a fatal signal, so unlock. */ Before the refcount_sub_and_test() And: /* Shouldn't be possible - VMA entirely detached, so treat it as such. */ Before err = 0? > + * the vma being detached. > + */ > + err = 0; < pre-existing issue > As discussed off-list, this name is horrible. It's a value that returns 1 if attached and now locked, 0 if detached, or an error code if the RCU waiter fails. It's not an error code and this actually tripped me up reading this code. We can address it elsewhere however. > + } > rwsem_release(&vma->vmlock_dep_map, _RET_IP_); > return err; > } > -- > 2.47.2 > > Thanks, Lorenzo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm: fix vma_start_write_killable() signal handling 2025-11-26 18:06 ` Lorenzo Stoakes @ 2025-11-26 18:28 ` Matthew Wilcox 2025-11-26 18:43 ` Suren Baghdasaryan 2025-11-26 18:55 ` Lorenzo Stoakes 0 siblings, 2 replies; 14+ messages in thread From: Matthew Wilcox @ 2025-11-26 18:28 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Suren Baghdasaryan, Liam R. Howlett, Vlastimil Babka On Wed, Nov 26, 2025 at 06:06:53PM +0000, Lorenzo Stoakes wrote: > On Wed, Nov 26, 2025 at 05:44:58PM +0000, Matthew Wilcox (Oracle) wrote: > > If we get a signal, we need to restore the vm_refcnt. We don't think > > that the refcount can actually be decremented to zero here as it > > requires the VMA to be detached, and the vma_mark_detached() uses > > TASK_UNINTERRUPTIBLE. However, that's a bit subtle, so handle it > > as if the refcount was zero at the start of this function. > > > > Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com > > Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()") > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > > Cc: Suren Baghdasaryan <surenb@google.com> > > Cc: Liam R. Howlett <Liam.Howlett@oracle.com> > > Cc: Vlastimil Babka <vbabka@suse.cz> > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > --- > > mm/mmap_lock.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c > > index e6e5570d1ec7..3c9bf2f96280 100644 > > --- a/mm/mmap_lock.c > > +++ b/mm/mmap_lock.c > > @@ -74,6 +74,14 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma, > > refcount_read(&vma->vm_refcnt) == tgt_refcnt, > > state); > > if (err) { > > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > > Really think we should WARN_ON_ONCE() as Vlasta suggested. > > It's an 'impossible' situation so we should make that clear. And we should > find out about it if the impossible happens... :) It's only "impossible" currently due to some fairly esoteric reasoning. As far as _this_ function is concerned, it's entirely possible. I don't want to leave this trap for the next person who calls __vma_enter_locked(TASK_KILLABLE). > > + /* > > + * We got a fatal signal, but the last reader went > > + * away as well. Resolve the race in favour of > > This is very subtle, I don't think this really explains this clearly > enough. > > Maybe put something like: > > /* Couldn't wait on readers probably due to a fatal signal, so unlock. */ > > Before the refcount_sub_and_test() I think this falls into the "saying what you're doing, not why you're doing it" trap. Whereas my comment is at a higher level -- there's a race where both exit conditions are true at the same time. The rcuwait_wait_event() picked one option, but we would rather resolve the race in the opposite direction. > And: > > /* Shouldn't be possible - VMA entirely detached, so treat it as such. */ > > Before err = 0? Again though, saying it's "not possible" relies on knowing all the callers of this function behave a particular way, and there's no guarantee they'll continue to do so. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm: fix vma_start_write_killable() signal handling 2025-11-26 18:28 ` Matthew Wilcox @ 2025-11-26 18:43 ` Suren Baghdasaryan 2025-11-26 18:53 ` Vlastimil Babka 2025-11-26 19:00 ` Lorenzo Stoakes 2025-11-26 18:55 ` Lorenzo Stoakes 1 sibling, 2 replies; 14+ messages in thread From: Suren Baghdasaryan @ 2025-11-26 18:43 UTC (permalink / raw) To: Matthew Wilcox Cc: Lorenzo Stoakes, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett, Vlastimil Babka On Wed, Nov 26, 2025 at 6:28 PM Matthew Wilcox <willy@infradead.org> wrote: > > On Wed, Nov 26, 2025 at 06:06:53PM +0000, Lorenzo Stoakes wrote: > > On Wed, Nov 26, 2025 at 05:44:58PM +0000, Matthew Wilcox (Oracle) wrote: > > > If we get a signal, we need to restore the vm_refcnt. We don't think > > > that the refcount can actually be decremented to zero here as it > > > requires the VMA to be detached, and the vma_mark_detached() uses > > > TASK_UNINTERRUPTIBLE. However, that's a bit subtle, so handle it > > > as if the refcount was zero at the start of this function. > > > > > > Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com > > > Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()") > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > > > Cc: Suren Baghdasaryan <surenb@google.com> > > > Cc: Liam R. Howlett <Liam.Howlett@oracle.com> > > > Cc: Vlastimil Babka <vbabka@suse.cz> > > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by: Suren Baghdasaryan <surenb@google.com> > > > --- > > > mm/mmap_lock.c | 8 ++++++++ > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c > > > index e6e5570d1ec7..3c9bf2f96280 100644 > > > --- a/mm/mmap_lock.c > > > +++ b/mm/mmap_lock.c > > > @@ -74,6 +74,14 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma, > > > refcount_read(&vma->vm_refcnt) == tgt_refcnt, > > > state); > > > if (err) { > > > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > > > > Really think we should WARN_ON_ONCE() as Vlasta suggested. > > > > It's an 'impossible' situation so we should make that clear. And we should > > find out about it if the impossible happens... :) > > It's only "impossible" currently due to some fairly esoteric reasoning. > As far as _this_ function is concerned, it's entirely possible. > I don't want to leave this trap for the next person who calls > __vma_enter_locked(TASK_KILLABLE). > > > > + /* > > > + * We got a fatal signal, but the last reader went > > > + * away as well. Resolve the race in favour of > > > > This is very subtle, I don't think this really explains this clearly > > enough. > > > > Maybe put something like: > > > > /* Couldn't wait on readers probably due to a fatal signal, so unlock. */ > > > > Before the refcount_sub_and_test() > > I think this falls into the "saying what you're doing, not why > you're doing it" trap. Whereas my comment is at a higher level -- > there's a race where both exit conditions are true at the same time. > The rcuwait_wait_event() picked one option, but we would rather resolve > the race in the opposite direction. > > > And: > > > > /* Shouldn't be possible - VMA entirely detached, so treat it as such. */ > > > > Before err = 0? > > Again though, saying it's "not possible" relies on knowing all the > callers of this function behave a particular way, and there's no > guarantee they'll continue to do so. I agree with Matthew. Returning 0 here would work correctly even if vma_mark_detached() starts using TASK_INTERRUPTIBLE and 0 becomes possible. The comment also seems appropriate to me. Thanks! > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm: fix vma_start_write_killable() signal handling 2025-11-26 18:43 ` Suren Baghdasaryan @ 2025-11-26 18:53 ` Vlastimil Babka 2025-11-26 19:34 ` Matthew Wilcox 2025-11-26 19:00 ` Lorenzo Stoakes 1 sibling, 1 reply; 14+ messages in thread From: Vlastimil Babka @ 2025-11-26 18:53 UTC (permalink / raw) To: Suren Baghdasaryan, Matthew Wilcox Cc: Lorenzo Stoakes, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett On 11/26/25 7:43 PM, Suren Baghdasaryan wrote: > On Wed, Nov 26, 2025 at 6:28 PM Matthew Wilcox <willy@infradead.org> wrote: >> >> On Wed, Nov 26, 2025 at 06:06:53PM +0000, Lorenzo Stoakes wrote: >>> On Wed, Nov 26, 2025 at 05:44:58PM +0000, Matthew Wilcox (Oracle) wrote: >>>> If we get a signal, we need to restore the vm_refcnt. We don't think >>>> that the refcount can actually be decremented to zero here as it >>>> requires the VMA to be detached, and the vma_mark_detached() uses >>>> TASK_UNINTERRUPTIBLE. However, that's a bit subtle, so handle it >>>> as if the refcount was zero at the start of this function. >>>> >>>> Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69252076.a70a0220.d98e3.009b.GAE@google.com/ Unfortunately the reported-by code doesn't work as message id >>>> Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()") >>>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> >>>> Cc: Suren Baghdasaryan <surenb@google.com> >>>> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> >>>> Cc: Vlastimil Babka <vbabka@suse.cz> >>>> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > Reviewed-by: Suren Baghdasaryan <surenb@google.com> > >>>> --- >>>> mm/mmap_lock.c | 8 ++++++++ >>>> 1 file changed, 8 insertions(+) >>>> >>>> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c >>>> index e6e5570d1ec7..3c9bf2f96280 100644 >>>> --- a/mm/mmap_lock.c >>>> +++ b/mm/mmap_lock.c >>>> @@ -74,6 +74,14 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma, >>>> refcount_read(&vma->vm_refcnt) == tgt_refcnt, >>>> state); >>>> if (err) { >>>> + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { >>> >>> Really think we should WARN_ON_ONCE() as Vlasta suggested. Thanks Lorenzo! Can we at least add a "WARN_ON_ONCE(!detaching);" before "err = 0;". >>> It's an 'impossible' situation so we should make that clear. And we should >>> find out about it if the impossible happens... :) >> >> It's only "impossible" currently due to some fairly esoteric reasoning. >> As far as _this_ function is concerned, it's entirely possible. >> I don't want to leave this trap for the next person who calls >> __vma_enter_locked(TASK_KILLABLE). >> >>>> + /* >>>> + * We got a fatal signal, but the last reader went >>>> + * away as well. Resolve the race in favour of >>> >>> This is very subtle, I don't think this really explains this clearly >>> enough. >>> >>> Maybe put something like: >>> >>> /* Couldn't wait on readers probably due to a fatal signal, so unlock. */ >>> >>> Before the refcount_sub_and_test() >> >> I think this falls into the "saying what you're doing, not why >> you're doing it" trap. Whereas my comment is at a higher level -- >> there's a race where both exit conditions are true at the same time. >> The rcuwait_wait_event() picked one option, but we would rather resolve >> the race in the opposite direction. >> >>> And: >>> >>> /* Shouldn't be possible - VMA entirely detached, so treat it as such. */ >>> >>> Before err = 0? >> >> Again though, saying it's "not possible" relies on knowing all the >> callers of this function behave a particular way, and there's no >> guarantee they'll continue to do so. > > I agree with Matthew. Returning 0 here would work correctly even if > vma_mark_detached() starts using TASK_INTERRUPTIBLE and 0 becomes > possible. The comment also seems appropriate to me. > Thanks! > >> ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm: fix vma_start_write_killable() signal handling 2025-11-26 18:53 ` Vlastimil Babka @ 2025-11-26 19:34 ` Matthew Wilcox 0 siblings, 0 replies; 14+ messages in thread From: Matthew Wilcox @ 2025-11-26 19:34 UTC (permalink / raw) To: Vlastimil Babka Cc: Suren Baghdasaryan, Lorenzo Stoakes, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett On Wed, Nov 26, 2025 at 07:53:13PM +0100, Vlastimil Babka wrote: > On 11/26/25 7:43 PM, Suren Baghdasaryan wrote: > > On Wed, Nov 26, 2025 at 6:28 PM Matthew Wilcox <willy@infradead.org> wrote: > >> > >> On Wed, Nov 26, 2025 at 06:06:53PM +0000, Lorenzo Stoakes wrote: > >>> On Wed, Nov 26, 2025 at 05:44:58PM +0000, Matthew Wilcox (Oracle) wrote: > >>>> If we get a signal, we need to restore the vm_refcnt. We don't think > >>>> that the refcount can actually be decremented to zero here as it > >>>> requires the VMA to be detached, and the vma_mark_detached() uses > >>>> TASK_UNINTERRUPTIBLE. However, that's a bit subtle, so handle it > >>>> as if the refcount was zero at the start of this function. > >>>> > >>>> Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com > > Closes: https://lore.kernel.org/all/69252076.a70a0220.d98e3.009b.GAE@google.com/ > > Unfortunately the reported-by code doesn't work as message id Hm, that seems like a bug in syzbot. The mail it sends doesn't use the word "Closes:" anywhere; it just asks for Reported-by: and Fixes: If that's not enough, then it should also ask for Closes. > >>> Really think we should WARN_ON_ONCE() as Vlasta suggested. > > Thanks Lorenzo! > > Can we at least add a "WARN_ON_ONCE(!detaching);" before "err = 0;". That makes sense; it mirrors the similar calls in the callers of __vma_exit_locked(). ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm: fix vma_start_write_killable() signal handling 2025-11-26 18:43 ` Suren Baghdasaryan 2025-11-26 18:53 ` Vlastimil Babka @ 2025-11-26 19:00 ` Lorenzo Stoakes 1 sibling, 0 replies; 14+ messages in thread From: Lorenzo Stoakes @ 2025-11-26 19:00 UTC (permalink / raw) To: Suren Baghdasaryan Cc: Matthew Wilcox, Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Liam R. Howlett, Vlastimil Babka On Wed, Nov 26, 2025 at 06:43:15PM +0000, Suren Baghdasaryan wrote: > I agree with Matthew. Returning 0 here would work correctly even if > vma_mark_detached() starts using TASK_INTERRUPTIBLE and 0 becomes > possible. The comment also seems appropriate to me. See my reply to Matthew. There's a lot of implicit knowledge that you perhaps aren't aware you are assuming _everybody_ will just happen to be aware of. This is natural as the author-of-the-thing ;) I know I've missed things I've happened to assume in my own series before - wood-for-the-trees etc. The fact that 3 (or was it 4?) people missed this bug is a hint as to the subtlety here. Let's just make life easier for ourselves by spelling things out. Thanks, Lorenzo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm: fix vma_start_write_killable() signal handling 2025-11-26 18:28 ` Matthew Wilcox 2025-11-26 18:43 ` Suren Baghdasaryan @ 2025-11-26 18:55 ` Lorenzo Stoakes 2025-11-26 19:44 ` Matthew Wilcox 1 sibling, 1 reply; 14+ messages in thread From: Lorenzo Stoakes @ 2025-11-26 18:55 UTC (permalink / raw) To: Matthew Wilcox Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Suren Baghdasaryan, Liam R. Howlett, Vlastimil Babka On Wed, Nov 26, 2025 at 06:28:13PM +0000, Matthew Wilcox wrote: > On Wed, Nov 26, 2025 at 06:06:53PM +0000, Lorenzo Stoakes wrote: > > On Wed, Nov 26, 2025 at 05:44:58PM +0000, Matthew Wilcox (Oracle) wrote: > > > If we get a signal, we need to restore the vm_refcnt. We don't think > > > that the refcount can actually be decremented to zero here as it > > > requires the VMA to be detached, and the vma_mark_detached() uses > > > TASK_UNINTERRUPTIBLE. However, that's a bit subtle, so handle it > > > as if the refcount was zero at the start of this function. > > > > > > Reported-by: syzbot+5b19bad23ac7f44bf8b8@syzkaller.appspotmail.com > > > Fixes: 2197bb60f890 ("mm: add vma_start_write_killable()") > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> > > > Cc: Suren Baghdasaryan <surenb@google.com> > > > Cc: Liam R. Howlett <Liam.Howlett@oracle.com> > > > Cc: Vlastimil Babka <vbabka@suse.cz> > > > Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > > > --- > > > mm/mmap_lock.c | 8 ++++++++ > > > 1 file changed, 8 insertions(+) > > > > > > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c > > > index e6e5570d1ec7..3c9bf2f96280 100644 > > > --- a/mm/mmap_lock.c > > > +++ b/mm/mmap_lock.c > > > @@ -74,6 +74,14 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma, > > > refcount_read(&vma->vm_refcnt) == tgt_refcnt, > > > state); > > > if (err) { > > > + if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > > > > Really think we should WARN_ON_ONCE() as Vlasta suggested. > > > > It's an 'impossible' situation so we should make that clear. And we should > > find out about it if the impossible happens... :) > > It's only "impossible" currently due to some fairly esoteric reasoning. > As far as _this_ function is concerned, it's entirely possible. > I don't want to leave this trap for the next person who calls > __vma_enter_locked(TASK_KILLABLE). Calls __vma_enter_locked(TASK_KILLABLE) _when detaching_, otherwise refcount will always be >0. So we're only looking at us changing vma_mark_detached() to use TASK_KILLABLE. As this is such a subtle corner case I still think it warrants a warning. Or at least a VM_WARN_ON_ONCE(1). A killable detacher is, as Vlasta points out, kind of an unwise thing to do anyway right? > > > > + /* > > > + * We got a fatal signal, but the last reader went > > > + * away as well. Resolve the race in favour of > > > > This is very subtle, I don't think this really explains this clearly > > enough. > > > > Maybe put something like: > > > > /* Couldn't wait on readers probably due to a fatal signal, so unlock. */ > > > > Before the refcount_sub_and_test() > > I think this falls into the "saying what you're doing, not why > you're doing it" trap. Whereas my comment is at a higher level -- > there's a race where both exit conditions are true at the same time. > The rcuwait_wait_event() picked one option, but we would rather resolve > the race in the opposite direction. I find your comment unclear, and I think it's too succinct. I was trying to provide the most succinct-yet-still-clear example, but if you prefer higher level you're going to need more detail here. It assumes you 'just know' that: - refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt) means unlock - err can only be set due to a fatal signal in a non-uninterruptible task mode - spurious readers can cause an incremented reference count - that a race can exist between a spuriously raised reference count and the previous reference count check between read above and refcount subtract here - a reference count of 0 means detached - err = 0 means we are treating this VMA as detached resolving this race 'in favour of' the VMA being detached. Let's get some of this information in here please. > > > And: > > > > /* Shouldn't be possible - VMA entirely detached, so treat it as such. */ > > > > Before err = 0? > > Again though, saying it's "not possible" relies on knowing all the > callers of this function behave a particular way, and there's no > guarantee they'll continue to do so. Again I think we'd be better off with at least a VM_WARN_ON_ONCE() given this is a rather obscure corner case. Thanks, Lorenzo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm: fix vma_start_write_killable() signal handling 2025-11-26 18:55 ` Lorenzo Stoakes @ 2025-11-26 19:44 ` Matthew Wilcox 2025-11-26 20:33 ` Lorenzo Stoakes 0 siblings, 1 reply; 14+ messages in thread From: Matthew Wilcox @ 2025-11-26 19:44 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Suren Baghdasaryan, Liam R. Howlett, Vlastimil Babka On Wed, Nov 26, 2025 at 06:55:52PM +0000, Lorenzo Stoakes wrote: > > It's only "impossible" currently due to some fairly esoteric reasoning. > > As far as _this_ function is concerned, it's entirely possible. > > I don't want to leave this trap for the next person who calls > > __vma_enter_locked(TASK_KILLABLE). > > Calls __vma_enter_locked(TASK_KILLABLE) _when detaching_, otherwise > refcount will always be >0. > > So we're only looking at us changing vma_mark_detached() to use > TASK_KILLABLE. > > As this is such a subtle corner case I still think it warrants a > warning. Or at least a VM_WARN_ON_ONCE(1). > > A killable detacher is, as Vlasta points out, kind of an unwise thing to do > anyway right? I missed where that was said? > > > > > > + /* > > > > + * We got a fatal signal, but the last reader went > > > > + * away as well. Resolve the race in favour of > > > > > > This is very subtle, I don't think this really explains this clearly > > > enough. > > > > > > Maybe put something like: > > > > > > /* Couldn't wait on readers probably due to a fatal signal, so unlock. */ > > > > > > Before the refcount_sub_and_test() > > > > I think this falls into the "saying what you're doing, not why > > you're doing it" trap. Whereas my comment is at a higher level -- > > there's a race where both exit conditions are true at the same time. > > The rcuwait_wait_event() picked one option, but we would rather resolve > > the race in the opposite direction. > > I find your comment unclear, and I think it's too succinct. I was trying to > provide the most succinct-yet-still-clear example, but if you prefer higher > level you're going to need more detail here. > > It assumes you 'just know' that: > > - refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt) means unlock Actually, I don't know that. All I know is local to this function -- that's the value we added earlier before waiting; now we need to subtract it since we're no longer waiting. > - err can only be set due to a fatal signal in a non-uninterruptible task > mode The comment says that in the first five words! > - spurious readers can cause an incremented reference count I don't know what a "spurious reader" is. There was a reader when we started waiting. Now there isn't one. > - that a race can exist between a spuriously raised reference count and the > previous reference count check between read above and refcount subtract here > > - a reference count of 0 means detached > > - err = 0 means we are treating this VMA as detached resolving this race > 'in favour of' the VMA being detached. > > Let's get some of this information in here please. I don't think that here is the place to document these things! And certainly not in a patch that we're trying to get applied five days before the merge window opens. There's plenty of time to get the comments and the variable names sorted out; can we focus on the right way to fix this bug? > Again I think we'd be better off with at least a VM_WARN_ON_ONCE() given > this is a rather obscure corner case. Are you satisfied with the WARN_ON(!detaching)? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm: fix vma_start_write_killable() signal handling 2025-11-26 19:44 ` Matthew Wilcox @ 2025-11-26 20:33 ` Lorenzo Stoakes 2025-11-26 20:35 ` Lorenzo Stoakes 2025-11-26 22:09 ` Matthew Wilcox 0 siblings, 2 replies; 14+ messages in thread From: Lorenzo Stoakes @ 2025-11-26 20:33 UTC (permalink / raw) To: Matthew Wilcox Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Suren Baghdasaryan, Liam R. Howlett, Vlastimil Babka On Wed, Nov 26, 2025 at 07:44:17PM +0000, Matthew Wilcox wrote: > On Wed, Nov 26, 2025 at 06:55:52PM +0000, Lorenzo Stoakes wrote: > > > It's only "impossible" currently due to some fairly esoteric reasoning. > > > As far as _this_ function is concerned, it's entirely possible. > > > I don't want to leave this trap for the next person who calls > > > __vma_enter_locked(TASK_KILLABLE). > > > > Calls __vma_enter_locked(TASK_KILLABLE) _when detaching_, otherwise > > refcount will always be >0. > > > > So we're only looking at us changing vma_mark_detached() to use > > TASK_KILLABLE. > > > > As this is such a subtle corner case I still think it warrants a > > warning. Or at least a VM_WARN_ON_ONCE(1). > > > > A killable detacher is, as Vlasta points out, kind of an unwise thing to do > > anyway right? > > I missed where that was said? "Yeah I guess it's for the best to keep vma_mark_detached() use the TASK_UNINTERRUPTIBLE variant, maybe document why. Aborting the detaching would be counter productive." https://lore.kernel.org/all/058f5858-f508-40f8-adfe-e5de78621d64@suse.cz/ > > > > > > > > > + /* > > > > > + * We got a fatal signal, but the last reader went > > > > > + * away as well. Resolve the race in favour of > > > > > > > > This is very subtle, I don't think this really explains this clearly > > > > enough. > > > > > > > > Maybe put something like: > > > > > > > > /* Couldn't wait on readers probably due to a fatal signal, so unlock. */ > > > > > > > > Before the refcount_sub_and_test() > > > > > > I think this falls into the "saying what you're doing, not why > > > you're doing it" trap. Whereas my comment is at a higher level -- > > > there's a race where both exit conditions are true at the same time. > > > The rcuwait_wait_event() picked one option, but we would rather resolve > > > the race in the opposite direction. > > > > I find your comment unclear, and I think it's too succinct. I was trying to > > provide the most succinct-yet-still-clear example, but if you prefer higher > > level you're going to need more detail here. > > > > It assumes you 'just know' that: > > > > - refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt) means unlock > > Actually, I don't know that. All I know is local to this function -- > that's the value we added earlier before waiting; now we need to > subtract it since we're no longer waiting. But why would you add/subtract VMA_LOCK_OFFSET? To, as the name suggests, lock/unlock the VMA. If we want to go with 'why' instead of 'what' that's useful information. I guess you can cutely surmise that 'yes we're undoing what we did'. I don't think it's going to hurt to explain what that is. > > > - err can only be set due to a fatal signal in a non-uninterruptible task > > mode > > The comment says that in the first five words! You didn't say that err can only be non-zero because of a fatal signal in rcuwait_wait_event()? You said 'we got a fatal signal'. I had to go dig into that code to see where that came from... It's super succinct. That's cute, sure. But it's not clear. > > > - spurious readers can cause an incremented reference count > > I don't know what a "spurious reader" is. There was a reader when we > started waiting. Now there isn't one. Hmm actually there are two routes here... one with real + spurious reader refcount increments + one with only spurious. You see this is why I think clarity is needed, there's _so much going on_. Anyway, there are two routes to __vma_enter_locked(): __vma_start_write(): [ mmap write lock held ] -> __vma_enter_locked() In this case, you can be waiting on actual readers. vma_mark_detached(): [ mmap write lock, vma write lock held ] -> __vma_enter_locked() In this case, as per the comment: /* * We are the only writer, so no need to use vma_refcount_put(). * The condition below is unlikely because the vma has been already * write-locked and readers can increment vm_refcnt only temporarily * before they check vm_lock_seq, realize the vma is locked and drop * back the vm_refcnt. That is a narrow window for observing a raised * vm_refcnt. */ So there's a narrow window in which readers 'spuriously' or you could say temporarily increment vma->vm_refcnt before realising the seqcount indicates a write lock and decrementing again. The _only way_ we encounter the issue you are writing defensive code against here is: - There was 1 spurious or non-spurious VMA reader. - A fatal signal arose (assuming nobody ever goes and changes rcuwait_wait_event() to add more errors - very likely, not entirely certain though, so perhaps 'an error that meant we couldn't wait'.) - Despite the waiter aborting, the readers finished. - The VMA is detached (i.e. vma->vm_refcnt is 0 or refcount_sub_and_test() wouldn't return true) - Setting err = 0 indicates that we are now resolving this by treating the VMA as detached. And sure you're mentioning a couple words in reference to fatal signal, a mention of last reader went away, and detach - but does any of that help clarify what any of this actually does? In practice I read this comment and absolutely understood nothing. I don't think it even provides a hint. Without reverse engineering the whole thing I wouldn't know what this meant, it just assumes too much. > > > - that a race can exist between a spuriously raised reference count and the > > previous reference count check between read above and refcount subtract here > > > > - a reference count of 0 means detached > > > > - err = 0 means we are treating this VMA as detached resolving this race > > 'in favour of' the VMA being detached. > > > > Let's get some of this information in here please. > > I don't think that here is the place to document these things! And > certainly not in a patch that we're trying to get applied five days > before the merge window opens. There's plenty of time to get the I mean I could argue being stubborn about reasonable requests rather contradicts the rush to get this in, so that goes both ways... ...But I propose a compromise below to speed this up. > comments and the variable names sorted out; can we focus on the right > way to fix this bug? Since you're concerned about the urgency, let me suggest a compromise: /* * We tried waiting on readers, but failed, likely due to a fatal * signal arising. Unlock the VMA and check whether the VMA is * detached. */ if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { /* * If the VMA is now detached which means we lost a race. * Let the caller know the VMA is detached. */ err = 0; } That gives a _lot_ more information, keeps it relatively top-level, doesn't make undue assumptions etc. It also puts the broader comment about what you're doing in the right place, and makes the 'weird thing that should never happen' comment more specific. > > > Again I think we'd be better off with at least a VM_WARN_ON_ONCE() given > > this is a rather obscure corner case. > > Are you satisfied with the WARN_ON(!detaching)? > It'd be super weird to reach that code when not detaching so sure, think it should be VM_WARN_ON() though since the code would be horribly broken if that was not the case already no? For the sake of not dragging this out longer this I guess we can do without the broader WARN_ON() or leave it until later. But that comment needs updating. Thanks, Lorenzo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm: fix vma_start_write_killable() signal handling 2025-11-26 20:33 ` Lorenzo Stoakes @ 2025-11-26 20:35 ` Lorenzo Stoakes 2025-11-26 22:09 ` Matthew Wilcox 1 sibling, 0 replies; 14+ messages in thread From: Lorenzo Stoakes @ 2025-11-26 20:35 UTC (permalink / raw) To: Matthew Wilcox Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Suren Baghdasaryan, Liam R. Howlett, Vlastimil Babka On Wed, Nov 26, 2025 at 08:33:47PM +0000, Lorenzo Stoakes wrote: > /* > * We tried waiting on readers, but failed, likely due to a fatal > * signal arising. Unlock the VMA and check whether the VMA is > * detached. > */ > if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > /* > * If the VMA is now detached which means we lost a race. Obviously should read 'The VMA is now detached, ...' :) > * Let the caller know the VMA is detached. > */ > err = 0; > } ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm: fix vma_start_write_killable() signal handling 2025-11-26 20:33 ` Lorenzo Stoakes 2025-11-26 20:35 ` Lorenzo Stoakes @ 2025-11-26 22:09 ` Matthew Wilcox 2025-11-27 6:26 ` Lorenzo Stoakes 2025-11-27 9:05 ` Vlastimil Babka 1 sibling, 2 replies; 14+ messages in thread From: Matthew Wilcox @ 2025-11-26 22:09 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Suren Baghdasaryan, Liam R. Howlett, Vlastimil Babka On Wed, Nov 26, 2025 at 08:33:47PM +0000, Lorenzo Stoakes wrote: > On Wed, Nov 26, 2025 at 07:44:17PM +0000, Matthew Wilcox wrote: > > On Wed, Nov 26, 2025 at 06:55:52PM +0000, Lorenzo Stoakes wrote: > > > > It's only "impossible" currently due to some fairly esoteric reasoning. > > > > As far as _this_ function is concerned, it's entirely possible. > > > > I don't want to leave this trap for the next person who calls > > > > __vma_enter_locked(TASK_KILLABLE). > > > > > > Calls __vma_enter_locked(TASK_KILLABLE) _when detaching_, otherwise > > > refcount will always be >0. > > > > > > So we're only looking at us changing vma_mark_detached() to use > > > TASK_KILLABLE. > > > > > > As this is such a subtle corner case I still think it warrants a > > > warning. Or at least a VM_WARN_ON_ONCE(1). > > > > > > A killable detacher is, as Vlasta points out, kind of an unwise thing to do > > > anyway right? > > > > I missed where that was said? > > "Yeah I guess it's for the best to keep vma_mark_detached() use the > TASK_UNINTERRUPTIBLE variant, maybe document why. Aborting the detaching > would be counter productive." > > https://lore.kernel.org/all/058f5858-f508-40f8-adfe-e5de78621d64@suse.cz/ I'm not entirely clear on why aborting a detach is always a bad idea, but that's part of the MM I don't really understand. > - A fatal signal arose (assuming nobody ever goes and changes > rcuwait_wait_event() to add more errors - very likely, not entirely certain > though, so perhaps 'an error that meant we couldn't wait'.) It actually doesn't matter why we got an error. We got an error. But also the last reader went away. So we're now in a state where we would not have needed to sleep had we got here half a nanosecond later than we did. > Since you're concerned about the urgency, let me suggest a compromise: > > /* > * We tried waiting on readers, but failed, likely due to a fatal > * signal arising. Unlock the VMA and check whether the VMA is > * detached. > */ I think the 'if (err)' is enough to tell the reader that we failed! > if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > /* > * If the VMA is now detached which means we lost a race. > * Let the caller know the VMA is detached. > */ > err = 0; > } > > That gives a _lot_ more information, keeps it relatively top-level, doesn't > make undue assumptions etc. Here's what I now have: if (err) { if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { /* * The wait failed, but the last reader went away * as well. Tell the caller the VMA is detached. */ WARN_ON_ONCE(!detaching); err = 0; } > > Are you satisfied with the WARN_ON(!detaching)? > > > > It'd be super weird to reach that code when not detaching so sure, think it > should be VM_WARN_ON() though since the code would be horribly broken if > that was not the case already no? The other places in this file are WARN_ON_ONCE rather than VM_WARN*, so keep it consistent. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm: fix vma_start_write_killable() signal handling 2025-11-26 22:09 ` Matthew Wilcox @ 2025-11-27 6:26 ` Lorenzo Stoakes 2025-11-27 9:05 ` Vlastimil Babka 1 sibling, 0 replies; 14+ messages in thread From: Lorenzo Stoakes @ 2025-11-27 6:26 UTC (permalink / raw) To: Matthew Wilcox Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Suren Baghdasaryan, Liam R. Howlett, Vlastimil Babka Feel free to send what you have below and you can add: Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> On Wed, Nov 26, 2025 at 10:09:36PM +0000, Matthew Wilcox wrote: > On Wed, Nov 26, 2025 at 08:33:47PM +0000, Lorenzo Stoakes wrote: > > On Wed, Nov 26, 2025 at 07:44:17PM +0000, Matthew Wilcox wrote: > > > On Wed, Nov 26, 2025 at 06:55:52PM +0000, Lorenzo Stoakes wrote: > > > > > It's only "impossible" currently due to some fairly esoteric reasoning. > > > > > As far as _this_ function is concerned, it's entirely possible. > > > > > I don't want to leave this trap for the next person who calls > > > > > __vma_enter_locked(TASK_KILLABLE). > > > > > > > > Calls __vma_enter_locked(TASK_KILLABLE) _when detaching_, otherwise > > > > refcount will always be >0. > > > > > > > > So we're only looking at us changing vma_mark_detached() to use > > > > TASK_KILLABLE. > > > > > > > > As this is such a subtle corner case I still think it warrants a > > > > warning. Or at least a VM_WARN_ON_ONCE(1). > > > > > > > > A killable detacher is, as Vlasta points out, kind of an unwise thing to do > > > > anyway right? > > > > > > I missed where that was said? > > > > "Yeah I guess it's for the best to keep vma_mark_detached() use the > > TASK_UNINTERRUPTIBLE variant, maybe document why. Aborting the detaching > > would be counter productive." > > > > https://lore.kernel.org/all/058f5858-f508-40f8-adfe-e5de78621d64@suse.cz/ > > I'm not entirely clear on why aborting a detach is always a bad idea, > but that's part of the MM I don't really understand. Ack insert moany waffle about this impl. being confusing for _everybody_ here :) > > > - A fatal signal arose (assuming nobody ever goes and changes > > rcuwait_wait_event() to add more errors - very likely, not entirely certain > > though, so perhaps 'an error that meant we couldn't wait'.) > > It actually doesn't matter why we got an error. We got an error. > But also the last reader went away. So we're now in a state where we > would not have needed to sleep had we got here half a nanosecond later > than we did. Right, sure. > > > Since you're concerned about the urgency, let me suggest a compromise: > > > > /* > > * We tried waiting on readers, but failed, likely due to a fatal > > * signal arising. Unlock the VMA and check whether the VMA is > > * detached. > > */ > > I think the 'if (err)' is enough to tell the reader that we failed! Yup, but not the unlocking... I can address that later in the series I'm inevitably going to end up sending to improve this file :) > > > if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > > /* > > * If the VMA is now detached which means we lost a race. > > * Let the caller know the VMA is detached. > > */ > > err = 0; > > } > > > > That gives a _lot_ more information, keeps it relatively top-level, doesn't > > make undue assumptions etc. > > Here's what I now have: > > if (err) { > if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > /* > * The wait failed, but the last reader went away > * as well. Tell the caller the VMA is detached. > */ > WARN_ON_ONCE(!detaching); > err = 0; > } OK this is fine, respin or ask Andrew to fix-patch and have a tag :) > > > > Are you satisfied with the WARN_ON(!detaching)? > > > > > > > It'd be super weird to reach that code when not detaching so sure, think it > > should be VM_WARN_ON() though since the code would be horribly broken if > > that was not the case already no? > > The other places in this file are WARN_ON_ONCE rather than VM_WARN*, so > keep it consistent. > Fine, sure. Cheers, Lorenzo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH v2] mm: fix vma_start_write_killable() signal handling 2025-11-26 22:09 ` Matthew Wilcox 2025-11-27 6:26 ` Lorenzo Stoakes @ 2025-11-27 9:05 ` Vlastimil Babka 1 sibling, 0 replies; 14+ messages in thread From: Vlastimil Babka @ 2025-11-27 9:05 UTC (permalink / raw) To: Matthew Wilcox, Lorenzo Stoakes Cc: Andrew Morton, linux-mm, syzbot+5b19bad23ac7f44bf8b8, Suren Baghdasaryan, Liam R. Howlett On 11/26/25 23:09, Matthew Wilcox wrote: > On Wed, Nov 26, 2025 at 08:33:47PM +0000, Lorenzo Stoakes wrote: >> if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { >> /* >> * If the VMA is now detached which means we lost a race. >> * Let the caller know the VMA is detached. >> */ >> err = 0; >> } >> >> That gives a _lot_ more information, keeps it relatively top-level, doesn't >> make undue assumptions etc. > > Here's what I now have: > > if (err) { > if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) { > /* > * The wait failed, but the last reader went away > * as well. Tell the caller the VMA is detached. > */ > WARN_ON_ONCE(!detaching); > err = 0; > } Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Thanks. > >> > Are you satisfied with the WARN_ON(!detaching)? >> > >> >> It'd be super weird to reach that code when not detaching so sure, think it >> should be VM_WARN_ON() though since the code would be horribly broken if >> that was not the case already no? > > The other places in this file are WARN_ON_ONCE rather than VM_WARN*, so > keep it consistent. ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-11-27 9:05 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2025-11-26 17:44 [PATCH v2] mm: fix vma_start_write_killable() signal handling Matthew Wilcox (Oracle) 2025-11-26 18:06 ` Lorenzo Stoakes 2025-11-26 18:28 ` Matthew Wilcox 2025-11-26 18:43 ` Suren Baghdasaryan 2025-11-26 18:53 ` Vlastimil Babka 2025-11-26 19:34 ` Matthew Wilcox 2025-11-26 19:00 ` Lorenzo Stoakes 2025-11-26 18:55 ` Lorenzo Stoakes 2025-11-26 19:44 ` Matthew Wilcox 2025-11-26 20:33 ` Lorenzo Stoakes 2025-11-26 20:35 ` Lorenzo Stoakes 2025-11-26 22:09 ` Matthew Wilcox 2025-11-27 6:26 ` Lorenzo Stoakes 2025-11-27 9:05 ` Vlastimil Babka
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox