linux-mm.kvack.org archive mirror
* [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper
@ 2026-01-22 13:01 Lorenzo Stoakes
  2026-01-22 13:01 ` [PATCH RESEND v3 01/10] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG Lorenzo Stoakes
                   ` (10 more replies)
  0 siblings, 11 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 13:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot
be changed underneath us. This will be the case if EITHER the VMA lock or
the mmap lock is held.

We already open-code this in two places - anon_vma_name() in mm/madvise.c
and vma_flag_set_atomic() in include/linux/mm.h.

This series adds vma_assert_stabilised(), which abstracts this and can be
used at these call sites instead.

This implementation uses lockdep where possible - that is, for VMA read
locks, which correctly track read lock acquisition/release via:

vma_start_read() ->
rwsem_acquire_read()

vma_start_read_locked() ->
vma_start_read_locked_nested() ->
rwsem_acquire_read()

And:

vma_end_read() ->
vma_refcount_put() ->
rwsem_release()

We don't track VMA write locks using lockdep, however these are predicated
upon the mmap write lock, whose lockdep state we do track. Additionally,
vma_assert_stabilised() falls back to asserting the mmap lock if the VMA
read lock is not held, so we get lockdep coverage in this case also.
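
To illustrate, the helper ends up with roughly the following shape (a
sketch only, using names from this series - see the final patch for the
real implementation and the mmap lock ownership caveats):

	static inline void vma_assert_stabilised(struct vm_area_struct *vma)
	{
	#ifdef CONFIG_LOCKDEP
		/* A held VMA read lock is tracked via vmlock_dep_map. */
		if (lock_is_held(&vma->vmlock_dep_map))
			return;
	#endif
		/* Otherwise the mmap lock (read or write) must stabilise the VMA. */
		mmap_assert_locked(vma->vm_mm);
	}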

We also add extensive comments to describe what we're doing.

There's some tricky stuff around mmap locking and stabilisation races that
we have to be careful of that I describe in the patch introducing
vma_assert_stabilised().

This change also lays the foundation for future series to add this assert
in further places where we wish to make it clear that we rely upon a
stabilised VMA - this was precisely the motivation for the change.

Additionally, refactor the VMA locks logic to be clearer, less confusing,
self-documenting as far as possible and more easily extendable and
debuggable in future.


v3:
* Added 8 patches refactoring the VMA lock implementation :)
* Dropped the vma_is_*locked() predicates as too difficult to get entirely
  right.
* Updated vma_assert_locked() to assert what we sensibly can, use lockdep
  if possible and invoke vma_assert_write_locked() to share code as before.
* Took into account extensive feedback received from Vlastimil (thanks! :)

v2:
* Added lockdep as much as possible to the mix as per Peter and Sebastian.
* Added comments to make clear what we're doing in each case.
* I realise I made a mistake in saying the previous duplicative VMA stable
  asserts were wrong - vma_assert_locked() is not a no-op if
  !CONFIG_PER_VMA_LOCK; instead it degrades to asserting that the mmap lock
  is held, so this is correct, though it means we'd have checked this twice,
  only triggering an assert the second time.
* Accounted for is_vma_writer_only() case in vma_is_read_locked().
* Accounted for two hideous issues - we cannot check VMA lock first,
  because we may be holding a VMA write lock and be raced by VMA readers of
  _other_ VMAs. If we check the mmap lock first and assert, we may hold a
  VMA read lock and race other threads which hold the mmap read lock and
  fail an assert. We resolve this by a precise mmap ownership check if
  lockdep is used, and allowing the check to be approximate if no lockdep.
* Added more comments and updated commit logs.
* Dropped Suren's Suggested-by due to significant changes in this set (this
  was for vma_is_read_locked() as a concept).
https://lore.kernel.org/all/cover.1768855783.git.lorenzo.stoakes@oracle.com/

v1:
https://lore.kernel.org/all/cover.1768569863.git.lorenzo.stoakes@oracle.com/


Lorenzo Stoakes (10):
  mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG
  mm/vma: document possible vma->vm_refcnt values and reference comment
  mm/vma: rename is_vma_write_only(), separate out shared refcount put
  mm/vma: add+use vma lockdep acquire/release defines
  mm/vma: de-duplicate __vma_enter_locked() error path
  mm/vma: clean up __vma_enter/exit_locked()
  mm/vma: introduce helper struct + thread through exclusive lock fns
  mm/vma: improve and document __is_vma_write_locked()
  mm/vma: update vma_assert_locked() to use lockdep
  mm/vma: add and use vma_assert_stabilised()

 include/linux/mm.h        |   5 +-
 include/linux/mm_types.h  |  54 ++++++++-
 include/linux/mmap_lock.h | 223 ++++++++++++++++++++++++++++++++++----
 mm/madvise.c              |   4 +-
 mm/mmap_lock.c            | 180 ++++++++++++++++++++----------
 5 files changed, 373 insertions(+), 93 deletions(-)

--
2.52.0



* [PATCH RESEND v3 01/10] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG
  2026-01-22 13:01 [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
@ 2026-01-22 13:01 ` Lorenzo Stoakes
  2026-01-22 16:26   ` Vlastimil Babka
  2026-01-22 16:37   ` Suren Baghdasaryan
  2026-01-22 13:01 ` [PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment Lorenzo Stoakes
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 13:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

The VMA_LOCK_OFFSET value encodes a flag set in vma->vm_refcnt to indicate
that a VMA is in the process of having VMA read-locks excluded in
__vma_enter_locked() (that is, first checking whether any VMA read locks
are held, and if so, waiting on them to be released).

This happens when a VMA write lock is being established, or when a VMA
being marked detached discovers that its reference count is elevated by
readers which temporarily incremented it, only to then discover that a VMA
write lock is in place.

The naming does not convey any of this, so rename VMA_LOCK_OFFSET to
VM_REFCNT_EXCLUDE_READERS_FLAG (with a sensible new prefix to differentiate
from the newly introduced VMA_*_BIT flags).

Also rename VMA_REF_LIMIT to VM_REFCNT_LIMIT to make this consistent also.
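
For reference, the resulting values (as per the change below) are:

	#define VM_REFCNT_EXCLUDE_READERS_BIT	(30)
	#define VM_REFCNT_EXCLUDE_READERS_FLAG	(1U << VM_REFCNT_EXCLUDE_READERS_BIT)	/* 0x40000000 */
	#define VM_REFCNT_LIMIT			(VM_REFCNT_EXCLUDE_READERS_FLAG - 1)	/* 0x3fffffff */

Since readers take references via an increment capped at VM_REFCNT_LIMIT,
setting the flag guarantees that any further reader increment fails.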

Update comments to reflect this.

No functional change intended.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/mm_types.h  | 17 +++++++++++++----
 include/linux/mmap_lock.h | 14 ++++++++------
 mm/mmap_lock.c            | 17 ++++++++++-------
 3 files changed, 31 insertions(+), 17 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 78950eb8926d..94de392ed3c5 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -752,8 +752,17 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
 }
 #endif

-#define VMA_LOCK_OFFSET	0x40000000
-#define VMA_REF_LIMIT	(VMA_LOCK_OFFSET - 1)
+/*
+ * While __vma_enter_locked() is working to ensure no read-locks are held on a
+ * VMA (either while acquiring a VMA write lock or marking a VMA detached) we
+ * set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indicate to
+ * vma_start_read() that the reference count should be left alone.
+ *
+ * Once the operation is complete, this value is subtracted from vma->vm_refcnt.
+ */
+#define VM_REFCNT_EXCLUDE_READERS_BIT	(30)
+#define VM_REFCNT_EXCLUDE_READERS_FLAG	(1U << VM_REFCNT_EXCLUDE_READERS_BIT)
+#define VM_REFCNT_LIMIT			(VM_REFCNT_EXCLUDE_READERS_FLAG - 1)

 struct vma_numab_state {
 	/*
@@ -935,10 +944,10 @@ struct vm_area_struct {
 	/*
 	 * Can only be written (using WRITE_ONCE()) while holding both:
 	 *  - mmap_lock (in write mode)
-	 *  - vm_refcnt bit at VMA_LOCK_OFFSET is set
+	 *  - vm_refcnt bit at VM_REFCNT_EXCLUDE_READERS_BIT is set
 	 * Can be read reliably while holding one of:
 	 *  - mmap_lock (in read or write mode)
-	 *  - vm_refcnt bit at VMA_LOCK_OFFSET is set or vm_refcnt > 1
+	 *  - vm_refcnt bit at VM_REFCNT_EXCLUDE_READERS_BIT is set or vm_refcnt > 1
 	 * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
 	 * while holding nothing (except RCU to keep the VMA struct allocated).
 	 *
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index b50416fbba20..5acbd4ba1b52 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -125,12 +125,14 @@ static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt)
 static inline bool is_vma_writer_only(int refcnt)
 {
 	/*
-	 * With a writer and no readers, refcnt is VMA_LOCK_OFFSET if the vma
-	 * is detached and (VMA_LOCK_OFFSET + 1) if it is attached. Waiting on
-	 * a detached vma happens only in vma_mark_detached() and is a rare
-	 * case, therefore most of the time there will be no unnecessary wakeup.
+	 * With a writer and no readers, refcnt is VM_REFCNT_EXCLUDE_READERS_FLAG
+	 * if the vma is detached and (VM_REFCNT_EXCLUDE_READERS_FLAG + 1) if it is
+	 * attached. Waiting on a detached vma happens only in
+	 * vma_mark_detached() and is a rare case, therefore most of the time
+	 * there will be no unnecessary wakeup.
 	 */
-	return (refcnt & VMA_LOCK_OFFSET) && refcnt <= VMA_LOCK_OFFSET + 1;
+	return (refcnt & VM_REFCNT_EXCLUDE_READERS_FLAG) &&
+		refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
 }

 static inline void vma_refcount_put(struct vm_area_struct *vma)
@@ -159,7 +161,7 @@ static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int

 	mmap_assert_locked(vma->vm_mm);
 	if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt,
-							      VMA_REF_LIMIT)))
+							      VM_REFCNT_LIMIT)))
 		return false;

 	rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 7421b7ea8001..1d23b48552e9 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -54,7 +54,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
 		bool detaching, int state)
 {
 	int err;
-	unsigned int tgt_refcnt = VMA_LOCK_OFFSET;
+	unsigned int tgt_refcnt = VM_REFCNT_EXCLUDE_READERS_FLAG;

 	mmap_assert_write_locked(vma->vm_mm);

@@ -66,7 +66,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
 	 * If vma is detached then only vma_mark_attached() can raise the
 	 * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
 	 */
-	if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
+	if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
 		return 0;

 	rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
@@ -74,7 +74,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
 		   refcount_read(&vma->vm_refcnt) == tgt_refcnt,
 		   state);
 	if (err) {
-		if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
+		if (refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
 			/*
 			 * The wait failed, but the last reader went away
 			 * as well.  Tell the caller the VMA is detached.
@@ -92,7 +92,8 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,

 static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
 {
-	*detached = refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt);
+	*detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
+					  &vma->vm_refcnt);
 	rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
 }

@@ -180,13 +181,15 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
 	}

 	/*
-	 * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire()
-	 * will fail because VMA_REF_LIMIT is less than VMA_LOCK_OFFSET.
+	 * If VM_REFCNT_EXCLUDE_READERS_FLAG is set,
+	 * __refcount_inc_not_zero_limited_acquire() will fail because
+	 * VM_REFCNT_LIMIT is less than VM_REFCNT_EXCLUDE_READERS_FLAG.
+	 *
 	 * Acquire fence is required here to avoid reordering against later
 	 * vm_lock_seq check and checks inside lock_vma_under_rcu().
 	 */
 	if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt,
-							      VMA_REF_LIMIT))) {
+							      VM_REFCNT_LIMIT))) {
 		/* return EAGAIN if vma got detached from under us */
 		vma = oldcnt ? NULL : ERR_PTR(-EAGAIN);
 		goto err;
--
2.52.0



* [PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment
  2026-01-22 13:01 [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
  2026-01-22 13:01 ` [PATCH RESEND v3 01/10] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG Lorenzo Stoakes
@ 2026-01-22 13:01 ` Lorenzo Stoakes
  2026-01-22 16:48   ` Vlastimil Babka
  2026-01-22 13:01 ` [PATCH RESEND v3 03/10] mm/vma: rename is_vma_write_only(), separate out shared refcount put Lorenzo Stoakes
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 13:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

The possible vma->vm_refcnt values are confusing and vague; explain in
detail what these can be in a comment describing the vma->vm_refcnt field
and reference this comment in various places that read/write this field.
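
As a worked example of the values enumerated in the new comment (an
attached, unlocked VMA acquiring two readers, then a writer excluding
them):

	1		- attached, unlocked
	3		- two VMA read locks taken
	0x40000003	- writer adds VM_REFCNT_EXCLUDE_READERS_FLAG
	0x40000001	- both readers drop out; this is the attached wait target
	1		- writer subtracts the flag again on exit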

No functional change intended.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/mm_types.h  | 39 +++++++++++++++++++++++++++++++++++++--
 include/linux/mmap_lock.h |  7 +++++++
 mm/mmap_lock.c            |  6 ++++++
 3 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 94de392ed3c5..e5ee66f84d9a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -758,7 +758,8 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
  * set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indicate to
  * vma_start_read() that the reference count should be left alone.
  *
- * Once the operation is complete, this value is subtracted from vma->vm_refcnt.
+ * See the comment describing vm_refcnt in vm_area_struct for details of
+ * the values the VMA reference count can take.
  */
 #define VM_REFCNT_EXCLUDE_READERS_BIT	(30)
 #define VM_REFCNT_EXCLUDE_READERS_FLAG	(1U << VM_REFCNT_EXCLUDE_READERS_BIT)
@@ -989,7 +990,41 @@ struct vm_area_struct {
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
 #ifdef CONFIG_PER_VMA_LOCK
-	/* Unstable RCU readers are allowed to read this. */
+	/*
+	 * Used to keep track of the number of references taken by VMA read or
+	 * write locks. May have the VM_REFCNT_EXCLUDE_READERS_FLAG set
+	 * indicating that a thread has entered __vma_enter_locked() and is
+	 * waiting on any outstanding read locks to exit.
+	 *
+	 * This value can be equal to:
+	 *
+	 * 0 - Detached.
+	 *
+	 * 1 - Unlocked or write-locked.
+	 *
+	 * >1, < VM_REFCNT_EXCLUDE_READERS_FLAG - Read-locked or (unlikely)
+	 * write-locked with other threads having temporarily incremented the
+	 * reference count prior to determining it is write-locked and
+	 * decrementing it again.
+	 *
+	 * VM_REFCNT_EXCLUDE_READERS_FLAG - Detached, pending
+	 * __vma_exit_locked() completion which will decrement the reference
+	 * count to zero. IMPORTANT - at this stage no further readers can
+	 * increment the reference count. It can only be reduced.
+	 *
+	 * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - Either an attached VMA pending
+	 * __vma_exit_locked() completion which will decrement the reference
+	 * count to one, OR a detached VMA waiting on a single spurious reader
+	 * to decrement reference count. IMPORTANT - as above, no further
+	 * readers can increment the reference count.
+	 *
+	 * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - VMA is waiting on readers,
+	 * whether it is attempting to acquire a write lock or attempting to
+	 * detach. IMPORTANT - as above, no further readers can increment the
+	 * reference count.
+	 *
+	 * NOTE: Unstable RCU readers are allowed to read this.
+	 */
 	refcount_t vm_refcnt ____cacheline_aligned_in_smp;
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	struct lockdep_map vmlock_dep_map;
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 5acbd4ba1b52..a764439d0276 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -130,6 +130,9 @@ static inline bool is_vma_writer_only(int refcnt)
 	 * attached. Waiting on a detached vma happens only in
 	 * vma_mark_detached() and is a rare case, therefore most of the time
 	 * there will be no unnecessary wakeup.
+	 *
+	 * See the comment describing the vm_area_struct->vm_refcnt field for
+	 * details of possible refcnt values.
 	 */
 	return (refcnt & VM_REFCNT_EXCLUDE_READERS_FLAG) &&
 		refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
@@ -249,6 +252,10 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
 {
 	unsigned int mm_lock_seq;

+	/*
+	 * See the comment describing the vm_area_struct->vm_refcnt field for
+	 * details of possible refcnt values.
+	 */
 	VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt) <= 1 &&
 		      !__is_vma_write_locked(vma, &mm_lock_seq), vma);
 }
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 1d23b48552e9..75dc098aea14 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -65,6 +65,9 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
 	/*
 	 * If vma is detached then only vma_mark_attached() can raise the
 	 * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
+	 *
+	 * See the comment describing the vm_area_struct->vm_refcnt field for
+	 * details of possible refcnt values.
 	 */
 	if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
 		return 0;
@@ -137,6 +140,9 @@ void vma_mark_detached(struct vm_area_struct *vma)
 	 * before they check vm_lock_seq, realize the vma is locked and drop
 	 * back the vm_refcnt. That is a narrow window for observing a raised
 	 * vm_refcnt.
+	 *
+	 * See the comment describing the vm_area_struct->vm_refcnt field for
+	 * details of possible refcnt values.
 	 */
 	if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
 		/* Wait until vma is detached with no readers. */
--
2.52.0



* [PATCH RESEND v3 03/10] mm/vma: rename is_vma_write_only(), separate out shared refcount put
  2026-01-22 13:01 [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
  2026-01-22 13:01 ` [PATCH RESEND v3 01/10] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG Lorenzo Stoakes
  2026-01-22 13:01 ` [PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment Lorenzo Stoakes
@ 2026-01-22 13:01 ` Lorenzo Stoakes
  2026-01-22 17:36   ` Vlastimil Babka
  2026-01-22 13:01 ` [PATCH RESEND v3 04/10] mm/vma: add+use vma lockdep acquire/release defines Lorenzo Stoakes
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 13:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

The is_vma_writer_only() function is misnamed - this isn't determining if
there is only a write lock, as it checks for the presence of the
VM_REFCNT_EXCLUDE_READERS_FLAG.

Really, it is checking to see whether readers are excluded, with a
possibility of a false positive in the case of a detachment (there we
expect the vma->vm_refcnt to eventually be set to
VM_REFCNT_EXCLUDE_READERS_FLAG, whereas for an attached VMA we expect it to
eventually be set to VM_REFCNT_EXCLUDE_READERS_FLAG + 1).
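
Concretely, in the detach case, if one spurious reader still holds a
reference then:

	vm_refcnt == VM_REFCNT_EXCLUDE_READERS_FLAG + 1

and the check nevertheless reports that readers are excluded, since it
cannot distinguish this from an attached VMA whose readers have all gone -
hence the possibility of a spurious wakeup.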

Rename the function accordingly.

Relatedly, we use a finicky __refcount_dec_and_test() primitive directly
in vma_refcount_put(), using the old value to determine what the reference
count ought to be after the operation is complete (ignoring racing
reference count adjustments).

Wrap this into a __vma_refcount_put() function, which we can then utilise
in vma_mark_detached() and thus keep the refcount primitive usage
abstracted.

Also adjust comments, removing duplicative comments covered elsewhere and
adding more to aid understanding.

No functional change intended.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/mmap_lock.h | 62 +++++++++++++++++++++++++++++++--------
 mm/mmap_lock.c            | 18 +++++-------
 2 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index a764439d0276..0b3614aadbb4 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -122,15 +122,27 @@ static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt)
 	vma->vm_lock_seq = UINT_MAX;
 }

-static inline bool is_vma_writer_only(int refcnt)
+/**
+ * are_readers_excluded() - Determine whether @refcnt describes a VMA which has
+ * excluded all VMA read locks.
+ * @refcnt: The VMA reference count obtained from vm_area_struct->vm_refcnt.
+ *
+ * We may be raced by other readers temporarily incrementing the reference
+ * count, though the race window is very small, this might cause spurious
+ * wakeups.
+ *
+ * In the case of a detached VMA, we may incorrectly indicate that readers are
+ * excluded when one remains, because in that scenario we target a refcount of
+ * VM_REFCNT_EXCLUDE_READERS_FLAG, rather than the attached target of
+ * VM_REFCNT_EXCLUDE_READERS_FLAG + 1.
+ *
+ * However, the race window for that is very small so it is unlikely.
+ *
+ * Returns: true if readers are excluded, false otherwise.
+ */
+static inline bool are_readers_excluded(int refcnt)
 {
 	/*
-	 * With a writer and no readers, refcnt is VM_REFCNT_EXCLUDE_READERS_FLAG
-	 * if the vma is detached and (VM_REFCNT_EXCLUDE_READERS_FLAG + 1) if it is
-	 * attached. Waiting on a detached vma happens only in
-	 * vma_mark_detached() and is a rare case, therefore most of the time
-	 * there will be no unnecessary wakeup.
-	 *
 	 * See the comment describing the vm_area_struct->vm_refcnt field for
 	 * details of possible refcnt values.
 	 */
@@ -138,18 +150,42 @@ static inline bool is_vma_writer_only(int refcnt)
 		refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
 }

+static inline bool __vma_refcount_put(struct vm_area_struct *vma, int *refcnt)
+{
+	int oldcnt;
+	bool detached;
+
+	detached = __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
+	if (refcnt)
+		*refcnt = oldcnt - 1;
+	return detached;
+}
+
+/**
+ * vma_refcount_put() - Drop reference count in VMA vm_refcnt field due to a
+ * read-lock being dropped.
+ * @vma: The VMA whose reference count we wish to decrement.
+ *
+ * If we were the last reader, wake up threads waiting to obtain an exclusive
+ * lock.
+ */
 static inline void vma_refcount_put(struct vm_area_struct *vma)
 {
-	/* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
+	/* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt. */
 	struct mm_struct *mm = vma->vm_mm;
-	int oldcnt;
+	int refcnt;
+	bool detached;

 	rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
-	if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) {

-		if (is_vma_writer_only(oldcnt - 1))
-			rcuwait_wake_up(&mm->vma_writer_wait);
-	}
+	detached = __vma_refcount_put(vma, &refcnt);
+	/*
+	 * __vma_enter_locked() may be sleeping waiting for readers to drop
+	 * their reference count, so wake it up if we were the last reader
+	 * blocking it from being acquired.
+	 */
+	if (!detached && are_readers_excluded(refcnt))
+		rcuwait_wake_up(&mm->vma_writer_wait);
 }

 /*
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 75dc098aea14..ebacb57e5f16 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -130,25 +130,23 @@ EXPORT_SYMBOL_GPL(__vma_start_write);

 void vma_mark_detached(struct vm_area_struct *vma)
 {
+	bool detached;
+
 	vma_assert_write_locked(vma);
 	vma_assert_attached(vma);

 	/*
-	 * We are the only writer, so no need to use vma_refcount_put().
-	 * The condition below is unlikely because the vma has been already
-	 * write-locked and readers can increment vm_refcnt only temporarily
-	 * before they check vm_lock_seq, realize the vma is locked and drop
-	 * back the vm_refcnt. That is a narrow window for observing a raised
-	 * vm_refcnt.
-	 *
 	 * See the comment describing the vm_area_struct->vm_refcnt field for
 	 * details of possible refcnt values.
 	 */
-	if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
+	detached = __vma_refcount_put(vma, NULL);
+	if (unlikely(!detached)) {
 		/* Wait until vma is detached with no readers. */
 		if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
-			bool detached;
-
+			/*
+			 * Once this is complete, no readers can increment the
+			 * reference count, and the VMA is marked detached.
+			 */
 			__vma_exit_locked(vma, &detached);
 			WARN_ON_ONCE(!detached);
 		}
--
2.52.0



* [PATCH RESEND v3 04/10] mm/vma: add+use vma lockdep acquire/release defines
  2026-01-22 13:01 [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
                   ` (2 preceding siblings ...)
  2026-01-22 13:01 ` [PATCH RESEND v3 03/10] mm/vma: rename is_vma_write_only(), separate out shared refcount put Lorenzo Stoakes
@ 2026-01-22 13:01 ` Lorenzo Stoakes
  2026-01-22 19:32   ` Suren Baghdasaryan
  2026-01-23  8:48   ` Vlastimil Babka
  2026-01-22 13:01 ` [PATCH RESEND v3 05/10] mm/vma: de-duplicate __vma_enter_locked() error path Lorenzo Stoakes
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 13:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

The code is littered with inscrutable and duplicative lockdep incantations;
replace these with defines which explain what is going on, and add
commentary to explain what we're doing.

If lockdep is disabled these become no-ops. We must use defines so _RET_IP_
remains meaningful.
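
To illustrate why (a hypothetical helper, not part of this patch): were
these out-of-line functions rather than defines, _RET_IP_ would resolve to
the wrong frame:

	/*
	 * _RET_IP_ is __builtin_return_address(0), so here it reports the
	 * location inside e.g. vma_start_read() that called this helper,
	 * rather than vma_start_read()'s own caller as the open-coded
	 * invocations did. A define expands at the call site and so keeps
	 * _RET_IP_ meaningful.
	 */
	static void vma_lockdep_acquire_read_fn(struct vm_area_struct *vma)
	{
		lock_acquire_shared(&vma->vmlock_dep_map, 0, 1, NULL, _RET_IP_);
	}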

These are self-documenting and aid readability of the code.

Additionally, instead of using the confusing rwsem_*() form for something
that is emphatically not an rwsem, we explicitly use the
lock_acquire_shared(), lock_acquire_exclusive(), lock_release() and
lock_acquired() lockdep invocations, since we are doing something rather
custom here and these make more sense to use.
No functional change intended.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/mmap_lock.h | 35 ++++++++++++++++++++++++++++++++---
 mm/mmap_lock.c            | 10 +++++-----
 2 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 0b3614aadbb4..da63b1be6ec0 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -78,6 +78,36 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)

 #ifdef CONFIG_PER_VMA_LOCK

+/*
+ * VMA locks do not behave like most ordinary locks found in the kernel, so we
+ * cannot quite have full lockdep tracking in the way we would ideally prefer.
+ *
+ * Read locks act as shared locks which exclude an exclusive lock being
+ * taken. We therefore mark these accordingly on read lock acquire/release.
+ *
+ * Write locks are acquired exclusively per-VMA, but released in a shared
+ * fashion, that is upon vma_end_write_all(), we update the mmap's seqcount such
+ * that write lock is de-acquired.
+ *
+ * We therefore cannot track write locks per-VMA, nor do we try. Mitigating this
+ * is the fact that, of course, we do lockdep-track the mmap lock rwsem.
+ *
+ * We do, however, want to indicate that during either acquisition of a VMA
+ * write lock or detachment of a VMA that we require the lock held be exclusive,
+ * so we utilise lockdep to do so.
+ */
+#define __vma_lockdep_acquire_read(vma) \
+	lock_acquire_shared(&vma->vmlock_dep_map, 0, 1, NULL, _RET_IP_)
+#define __vma_lockdep_release_read(vma) \
+	lock_release(&vma->vmlock_dep_map, _RET_IP_)
+#define __vma_lockdep_acquire_exclusive(vma) \
+	lock_acquire_exclusive(&vma->vmlock_dep_map, 0, 0, NULL, _RET_IP_)
+#define __vma_lockdep_release_exclusive(vma) \
+	lock_release(&vma->vmlock_dep_map, _RET_IP_)
+/* Only meaningful if CONFIG_LOCK_STAT is defined. */
+#define __vma_lockdep_stat_mark_acquired(vma) \
+	lock_acquired(&vma->vmlock_dep_map, _RET_IP_)
+
 static inline void mm_lock_seqcount_init(struct mm_struct *mm)
 {
 	seqcount_init(&mm->mm_lock_seq);
@@ -176,8 +206,7 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
 	int refcnt;
 	bool detached;

-	rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
-
+	__vma_lockdep_release_read(vma);
 	detached = __vma_refcount_put(vma, &refcnt);
 	/*
 	 * __vma_enter_locked() may be sleeping waiting for readers to drop
@@ -203,7 +232,7 @@ static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int
 							      VM_REFCNT_LIMIT)))
 		return false;

-	rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
+	__vma_lockdep_acquire_read(vma);
 	return true;
 }

diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index ebacb57e5f16..9563bfb051f4 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -72,7 +72,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
 	if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
 		return 0;

-	rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
+	__vma_lockdep_acquire_exclusive(vma);
 	err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
 		   refcount_read(&vma->vm_refcnt) == tgt_refcnt,
 		   state);
@@ -85,10 +85,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
 			WARN_ON_ONCE(!detaching);
 			err = 0;
 		}
-		rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
+		__vma_lockdep_release_exclusive(vma);
 		return err;
 	}
-	lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
+	__vma_lockdep_stat_mark_acquired(vma);

 	return 1;
 }
@@ -97,7 +97,7 @@ static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
 {
 	*detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
 					  &vma->vm_refcnt);
-	rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
+	__vma_lockdep_release_exclusive(vma);
 }

 int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
@@ -199,7 +199,7 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
 		goto err;
 	}

-	rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
+	__vma_lockdep_acquire_read(vma);

 	if (unlikely(vma->vm_mm != mm))
 		goto err_unstable;
--
2.52.0



* [PATCH RESEND v3 05/10] mm/vma: de-duplicate __vma_enter_locked() error path
  2026-01-22 13:01 [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
                   ` (3 preceding siblings ...)
  2026-01-22 13:01 ` [PATCH RESEND v3 04/10] mm/vma: add+use vma lockdep acquire/release defines Lorenzo Stoakes
@ 2026-01-22 13:01 ` Lorenzo Stoakes
  2026-01-22 19:39   ` Suren Baghdasaryan
  2026-01-23  8:54   ` Vlastimil Babka
  2026-01-22 13:01 ` [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked() Lorenzo Stoakes
                   ` (5 subsequent siblings)
  10 siblings, 2 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 13:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

We're doing precisely the same thing that __vma_exit_locked() does, so
de-duplicate this code and keep the refcount primitive in one place.

No functional change intended.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/mmap_lock.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 9563bfb051f4..7a0361cff6db 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -45,6 +45,14 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);

 #ifdef CONFIG_MMU
 #ifdef CONFIG_PER_VMA_LOCK
+
+static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
+{
+	*detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
+					  &vma->vm_refcnt);
+	__vma_lockdep_release_exclusive(vma);
+}
+
 /*
  * __vma_enter_locked() returns 0 immediately if the vma is not
  * attached, otherwise it waits for any current readers to finish and
@@ -77,7 +85,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
 		   refcount_read(&vma->vm_refcnt) == tgt_refcnt,
 		   state);
 	if (err) {
-		if (refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
+		bool detached;
+
+		__vma_exit_locked(vma, &detached);
+		if (detached) {
 			/*
 			 * The wait failed, but the last reader went away
 			 * as well.  Tell the caller the VMA is detached.
@@ -85,7 +96,6 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
 			WARN_ON_ONCE(!detaching);
 			err = 0;
 		}
-		__vma_lockdep_release_exclusive(vma);
 		return err;
 	}
 	__vma_lockdep_stat_mark_acquired(vma);
@@ -93,13 +103,6 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
 	return 1;
 }

-static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
-{
-	*detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
-					  &vma->vm_refcnt);
-	__vma_lockdep_release_exclusive(vma);
-}
-
 int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
 		int state)
 {
--
2.52.0



* [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
  2026-01-22 13:01 [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
                   ` (4 preceding siblings ...)
  2026-01-22 13:01 ` [PATCH RESEND v3 05/10] mm/vma: de-duplicate __vma_enter_locked() error path Lorenzo Stoakes
@ 2026-01-22 13:01 ` Lorenzo Stoakes
  2026-01-22 13:08   ` Lorenzo Stoakes
                     ` (2 more replies)
  2026-01-22 13:01 ` [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns Lorenzo Stoakes
                   ` (4 subsequent siblings)
  10 siblings, 3 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 13:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

These functions are very confusing indeed. 'Entering' a lock could be
interpreted as acquiring it, but this is not what these functions are
doing.

Equally they don't indicate at all what kind of lock we are 'entering' or
'exiting'. Finally they are misleading as we invoke these functions when we
already hold a write lock to detach a VMA.

These functions are explicitly simply 'entering' and 'exiting' a state in
which we hold the EXCLUSIVE lock in order that we can either mark the VMA
as being write-locked, or mark the VMA detached.

Rename the functions accordingly, and also update
__vma_exit_exclusive_locked() to return the detached state with a
__must_check directive, as passing an output pointer for the detached state
here is simply clumsy, and inconsistent with __vma_enter_exclusive_locked().
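
With __must_check in place, a caller which discards the detached state now
triggers a build-time warning (illustrative only):

	__vma_exit_exclusive_locked(vma);		/* warns: result unused */
	detached = __vma_exit_exclusive_locked(vma);	/* fine */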

Finally, remove the unnecessary 'inline' directives.

No functional change intended.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/mmap_lock.h |  4 +--
 mm/mmap_lock.c            | 60 +++++++++++++++++++++++++--------------
 2 files changed, 41 insertions(+), 23 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index da63b1be6ec0..873bc5f3c97c 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -209,8 +209,8 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
 	__vma_lockdep_release_read(vma);
 	detached = __vma_refcount_put(vma, &refcnt);
 	/*
-	 * __vma_enter_locked() may be sleeping waiting for readers to drop
-	 * their reference count, so wake it up if we were the last reader
+	 * __vma_enter_exclusive_locked() may be sleeping waiting for readers to
+	 * drop their reference count, so wake it up if we were the last reader
 	 * blocking it from being acquired.
 	 */
 	if (!detached && are_readers_excluded(refcnt))
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 7a0361cff6db..f73221174a8b 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -46,19 +46,43 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
 #ifdef CONFIG_MMU
 #ifdef CONFIG_PER_VMA_LOCK
 
-static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
+/*
+ * Now that all readers have been evicted, mark the VMA as being out of the
+ * 'exclude readers' state.
+ *
+ * Returns true if the VMA is now detached, otherwise false.
+ */
+static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
 {
-	*detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
-					  &vma->vm_refcnt);
+	bool detached;
+
+	detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
+					 &vma->vm_refcnt);
 	__vma_lockdep_release_exclusive(vma);
+	return detached;
 }
 
 /*
- * __vma_enter_locked() returns 0 immediately if the vma is not
- * attached, otherwise it waits for any current readers to finish and
- * returns 1.  Returns -EINTR if a signal is received while waiting.
+ * Mark the VMA as being in a state of excluding readers, check to see if any
+ * VMA read locks are indeed held, and if so wait for them to be released.
+ *
+ * Note that this function pairs with vma_refcount_put() which will wake up this
+ * thread when it detects that the last reader has released its lock.
+ *
+ * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
+ * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
+ * is permitted to kill it.
+ *
+ * The function will return 0 immediately if the VMA is detached, and 1 once the
+ * VMA has evicted all readers, leaving the VMA exclusively locked.
+ *
+ * If the function returns 1, the caller is required to invoke
+ * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
+ *
+ * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
+ * may also return -EINTR to indicate a fatal signal was received while waiting.
  */
-static inline int __vma_enter_locked(struct vm_area_struct *vma,
+static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
 		bool detaching, int state)
 {
 	int err;
@@ -85,13 +109,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
 		   refcount_read(&vma->vm_refcnt) == tgt_refcnt,
 		   state);
 	if (err) {
-		bool detached;
-
-		__vma_exit_locked(vma, &detached);
-		if (detached) {
+		if (__vma_exit_exclusive_locked(vma)) {
 			/*
 			 * The wait failed, but the last reader went away
-			 * as well.  Tell the caller the VMA is detached.
+			 * as well. Tell the caller the VMA is detached.
 			 */
 			WARN_ON_ONCE(!detaching);
 			err = 0;
@@ -108,7 +129,7 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
 {
 	int locked;
 
-	locked = __vma_enter_locked(vma, false, state);
+	locked = __vma_enter_exclusive_locked(vma, false, state);
 	if (locked < 0)
 		return locked;
 
@@ -120,12 +141,9 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
 
-	if (locked) {
-		bool detached;
-
-		__vma_exit_locked(vma, &detached);
-		WARN_ON_ONCE(detached); /* vma should remain attached */
-	}
+	/* vma should remain attached. */
+	if (locked)
+		WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
 
 	return 0;
 }
@@ -145,12 +163,12 @@ void vma_mark_detached(struct vm_area_struct *vma)
 	detached = __vma_refcount_put(vma, NULL);
 	if (unlikely(!detached)) {
 		/* Wait until vma is detached with no readers. */
-		if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
+		if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
 			/*
 			 * Once this is complete, no readers can increment the
 			 * reference count, and the VMA is marked detached.
 			 */
-			__vma_exit_locked(vma, &detached);
+			detached = __vma_exit_exclusive_locked(vma);
 			WARN_ON_ONCE(!detached);
 		}
 	}
-- 
2.52.0




* [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
  2026-01-22 13:01 [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
                   ` (5 preceding siblings ...)
  2026-01-22 13:01 ` [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked() Lorenzo Stoakes
@ 2026-01-22 13:01 ` Lorenzo Stoakes
  2026-01-22 21:41   ` Suren Baghdasaryan
  2026-01-23 10:02   ` Vlastimil Babka
  2026-01-22 13:02 ` [PATCH RESEND v3 08/10] mm/vma: improve and document __is_vma_write_locked() Lorenzo Stoakes
                   ` (3 subsequent siblings)
  10 siblings, 2 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 13:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

It is confusing to have __vma_enter_exclusive_locked() return 0, 1 or an
error (but only when waiting for readers in TASK_KILLABLE state), and
having the return value stored in a stack variable called 'locked' adds
further confusion.

More generally, we are doing a lot of rather finicky things while acquiring
a state in which readers are excluded and while moving out of this state,
including tracking whether we are detached or not and whether an error
occurred.

We are implementing logic in __vma_enter_exclusive_locked() that
effectively acts as if 'if one caller calls us do X, if another then do Y',
which is very confusing from a control flow perspective.

Introducing a shared helper state object helps us avoid this, as we can
now handle the 'an error arose but we're detached' condition correctly in
both callers - a warning if not detaching, and treating the situation as if
no error arose in the case of a VMA detaching.

This also acts to help document what's going on and allows us to add some
more logical debug asserts.

Also update vma_mark_detached() to add a guard clause for the likely
'already detached' state (given we hold the mmap write lock), and add a
comment about ephemeral VMA read lock reference count increments to clarify
why we are entering/exiting an exclusive locked state here.
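
The resulting calling convention looks like this (shape taken from the
diff below):

	struct vma_exclude_readers_state ves = {
		.vma = vma,
		.state = TASK_UNINTERRUPTIBLE,
		.detaching = true,
	};

	err = __vma_enter_exclusive_locked(&ves);
	if (!err && !ves.detached)
		__vma_exit_exclusive_locked(&ves);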

No functional change intended.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 mm/mmap_lock.c | 144 +++++++++++++++++++++++++++++++------------------
 1 file changed, 91 insertions(+), 53 deletions(-)

diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index f73221174a8b..75166a43ffa4 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -46,20 +46,40 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
 #ifdef CONFIG_MMU
 #ifdef CONFIG_PER_VMA_LOCK

+/* State shared across __vma_[enter, exit]_exclusive_locked(). */
+struct vma_exclude_readers_state {
+	/* Input parameters. */
+	struct vm_area_struct *vma;
+	int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
+	bool detaching;
+
+	bool detached;
+	bool exclusive; /* Are we exclusively locked? */
+};
+
 /*
  * Now that all readers have been evicted, mark the VMA as being out of the
  * 'exclude readers' state.
  *
- * Returns true if the VMA is now detached, otherwise false.
+ * Sets ves->detached to true if the VMA is now detached, false otherwise.
  */
-static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
+static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
 {
-	bool detached;
+	struct vm_area_struct *vma = ves->vma;
+
+	VM_WARN_ON_ONCE(ves->detached);
+	VM_WARN_ON_ONCE(!ves->exclusive);

-	detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
-					 &vma->vm_refcnt);
+	ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
+					      &vma->vm_refcnt);
 	__vma_lockdep_release_exclusive(vma);
-	return detached;
+}
+
+static unsigned int get_target_refcnt(struct vma_exclude_readers_state *ves)
+{
+	const unsigned int tgt = ves->detaching ? 0 : 1;
+
+	return tgt | VM_REFCNT_EXCLUDE_READERS_FLAG;
 }

 /*
@@ -69,30 +89,31 @@ static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
  * Note that this function pairs with vma_refcount_put() which will wake up this
  * thread when it detects that the last reader has released its lock.
  *
- * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
- * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
- * is permitted to kill it.
+ * The ves->state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases
+ * where we wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal
+ * signal is permitted to kill it.
  *
- * The function will return 0 immediately if the VMA is detached, and 1 once the
- * VMA has evicted all readers, leaving the VMA exclusively locked.
+ * The function sets ves->exclusive to true if an exclusive lock was
+ * acquired, or false if the VMA was detached or an error arose on wait.
  *
- * If the function returns 1, the caller is required to invoke
- * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
+ * If the function indicates an exclusive lock was acquired via ves->exclusive
+ * (or equivalently, returning 0 with !ves->detached), the caller is required to
+ * invoke __vma_exit_exclusive_locked() once the exclusive state is no longer
+ * required.
  *
- * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
- * may also return -EINTR to indicate a fatal signal was received while waiting.
+ * If ves->state is set to something other than TASK_UNINTERRUPTIBLE, the
+ * function may also return -EINTR to indicate a fatal signal was received while
+ * waiting.
  */
-static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
-		bool detaching, int state)
+static int __vma_enter_exclusive_locked(struct vma_exclude_readers_state *ves)
 {
-	int err;
-	unsigned int tgt_refcnt = VM_REFCNT_EXCLUDE_READERS_FLAG;
+	struct vm_area_struct *vma = ves->vma;
+	unsigned int tgt_refcnt = get_target_refcnt(ves);
+	int err = 0;

 	mmap_assert_write_locked(vma->vm_mm);
-
-	/* Additional refcnt if the vma is attached. */
-	if (!detaching)
-		tgt_refcnt++;
+	VM_WARN_ON_ONCE(ves->detached);
+	VM_WARN_ON_ONCE(ves->exclusive);

 	/*
 	 * If vma is detached then only vma_mark_attached() can raise the
@@ -101,37 +122,39 @@ static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
 	 * See the comment describing the vm_area_struct->vm_refcnt field for
 	 * details of possible refcnt values.
 	 */
-	if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
+	if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
+		ves->detached = true;
 		return 0;
+	}

 	__vma_lockdep_acquire_exclusive(vma);
 	err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
 		   refcount_read(&vma->vm_refcnt) == tgt_refcnt,
-		   state);
+		   ves->state);
 	if (err) {
-		if (__vma_exit_exclusive_locked(vma)) {
-			/*
-			 * The wait failed, but the last reader went away
-			 * as well. Tell the caller the VMA is detached.
-			 */
-			WARN_ON_ONCE(!detaching);
-			err = 0;
-		}
+		__vma_exit_exclusive_locked(ves);
 		return err;
 	}
-	__vma_lockdep_stat_mark_acquired(vma);

-	return 1;
+	__vma_lockdep_stat_mark_acquired(vma);
+	ves->exclusive = true;
+	return 0;
 }

 int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
 		int state)
 {
-	int locked;
+	int err;
+	struct vma_exclude_readers_state ves = {
+		.vma = vma,
+		.state = state,
+	};

-	locked = __vma_enter_exclusive_locked(vma, false, state);
-	if (locked < 0)
-		return locked;
+	err = __vma_enter_exclusive_locked(&ves);
+	if (err) {
+		WARN_ON_ONCE(ves.detached);
+		return err;
+	}

 	/*
 	 * We should use WRITE_ONCE() here because we can have concurrent reads
@@ -141,9 +164,11 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);

-	/* vma should remain attached. */
-	if (locked)
-		WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
+	if (!ves.detached) {
+		__vma_exit_exclusive_locked(&ves);
+		/* VMA should remain attached. */
+		WARN_ON_ONCE(ves.detached);
+	}

 	return 0;
 }
@@ -151,7 +176,12 @@ EXPORT_SYMBOL_GPL(__vma_start_write);

 void vma_mark_detached(struct vm_area_struct *vma)
 {
-	bool detached;
+	struct vma_exclude_readers_state ves = {
+		.vma = vma,
+		.state = TASK_UNINTERRUPTIBLE,
+		.detaching = true,
+	};
+	int err;

 	vma_assert_write_locked(vma);
 	vma_assert_attached(vma);
@@ -160,18 +190,26 @@ void vma_mark_detached(struct vm_area_struct *vma)
 	 * See the comment describing the vm_area_struct->vm_refcnt field for
 	 * details of possible refcnt values.
 	 */
-	detached = __vma_refcount_put(vma, NULL);
-	if (unlikely(!detached)) {
-		/* Wait until vma is detached with no readers. */
-		if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
-			/*
-			 * Once this is complete, no readers can increment the
-			 * reference count, and the VMA is marked detached.
-			 */
-			detached = __vma_exit_exclusive_locked(vma);
-			WARN_ON_ONCE(!detached);
-		}
+	if (likely(__vma_refcount_put(vma, NULL)))
+		return;
+
+	/*
+	 * Wait until the VMA is detached with no readers. Since we hold the VMA
+	 * write lock, the only read locks that might be present are those from
+	 * threads trying to acquire the read lock and incrementing the
+	 * reference count before realising the write lock is held and
+	 * decrementing it.
+	 */
+	err = __vma_enter_exclusive_locked(&ves);
+	if (!err && !ves.detached) {
+		/*
+		 * Once this is complete, no readers can increment the
+		 * reference count, and the VMA is marked detached.
+		 */
+		__vma_exit_exclusive_locked(&ves);
 	}
+	/* If an error arose but we were detached anyway, we don't care. */
+	WARN_ON_ONCE(!ves.detached);
 }

 /*
--
2.52.0



* [PATCH RESEND v3 08/10] mm/vma: improve and document __is_vma_write_locked()
  2026-01-22 13:01 [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
                   ` (6 preceding siblings ...)
  2026-01-22 13:01 ` [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns Lorenzo Stoakes
@ 2026-01-22 13:02 ` Lorenzo Stoakes
  2026-01-22 21:55   ` Suren Baghdasaryan
  2026-01-22 13:02 ` [PATCH RESEND v3 09/10] mm/vma: update vma_assert_locked() to use lockdep Lorenzo Stoakes
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 13:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

The function is a little confusing; clean it up a little, then add a
descriptive comment.

No functional change intended.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/mmap_lock.h | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 873bc5f3c97c..b00d34b5ad10 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -252,17 +252,30 @@ static inline void vma_end_read(struct vm_area_struct *vma)
 	vma_refcount_put(vma);
 }

-/* WARNING! Can only be used if mmap_lock is expected to be write-locked */
-static inline bool __is_vma_write_locked(struct vm_area_struct *vma, unsigned int *mm_lock_seq)
+/*
+ * Determine whether a VMA is write-locked. Must be invoked ONLY if the mmap
+ * write lock is held.
+ *
+ * Returns true if write-locked, otherwise false.
+ *
+ * Note that mm_lock_seq is updated only if the VMA is NOT write-locked.
+ */
+static inline bool __is_vma_write_locked(struct vm_area_struct *vma,
+					 unsigned int *mm_lock_seq)
 {
-	mmap_assert_write_locked(vma->vm_mm);
+	struct mm_struct *mm = vma->vm_mm;
+	const unsigned int seq = mm->mm_lock_seq.sequence;
+
+	mmap_assert_write_locked(mm);

 	/*
 	 * current task is holding mmap_write_lock, both vma->vm_lock_seq and
 	 * mm->mm_lock_seq can't be concurrently modified.
 	 */
-	*mm_lock_seq = vma->vm_mm->mm_lock_seq.sequence;
-	return (vma->vm_lock_seq == *mm_lock_seq);
+	if (vma->vm_lock_seq == seq)
+		return true;
+	*mm_lock_seq = seq;
+	return false;
 }

 int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
--
2.52.0



* [PATCH RESEND v3 09/10] mm/vma: update vma_assert_locked() to use lockdep
  2026-01-22 13:01 [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
                   ` (7 preceding siblings ...)
  2026-01-22 13:02 ` [PATCH RESEND v3 08/10] mm/vma: improve and document __is_vma_write_locked() Lorenzo Stoakes
@ 2026-01-22 13:02 ` Lorenzo Stoakes
  2026-01-22 22:02   ` Suren Baghdasaryan
  2026-01-23 16:55   ` Vlastimil Babka
  2026-01-22 13:02 ` [PATCH RESEND v3 10/10] mm/vma: add and use vma_assert_stabilised() Lorenzo Stoakes
  2026-01-22 15:48 ` [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Andrew Morton
  10 siblings, 2 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 13:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

We can use lockdep to avoid unnecessary work here; otherwise, update the
code to logically evaluate all pertinent cases and share code with
vma_assert_write_locked().

Make it clear here that we treat the VMA being detached at this point as a
bug; this was only implicit before.

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/mmap_lock.h | 42 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 39 insertions(+), 3 deletions(-)

diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index b00d34b5ad10..92ea07f0da4e 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -319,6 +319,10 @@ int vma_start_write_killable(struct vm_area_struct *vma)
 	return __vma_start_write(vma, mm_lock_seq, TASK_KILLABLE);
 }

+/**
+ * vma_assert_write_locked() - assert that @vma holds a VMA write lock.
+ * @vma: The VMA to assert.
+ */
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 {
 	unsigned int mm_lock_seq;
@@ -326,16 +330,48 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 	VM_BUG_ON_VMA(!__is_vma_write_locked(vma, &mm_lock_seq), vma);
 }

+/**
+ * vma_assert_locked() - assert that @vma holds either a VMA read or a VMA write
+ * lock and is not detached.
+ * @vma: The VMA to assert.
+ */
 static inline void vma_assert_locked(struct vm_area_struct *vma)
 {
-	unsigned int mm_lock_seq;
+	unsigned int refs;

 	/*
 	 * See the comment describing the vm_area_struct->vm_refcnt field for
 	 * details of possible refcnt values.
 	 */
-	VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt) <= 1 &&
-		      !__is_vma_write_locked(vma, &mm_lock_seq), vma);
+
+	/*
+	 * If read-locked or currently excluding readers, then the VMA is
+	 * locked.
+	 */
+#ifdef CONFIG_LOCKDEP
+	if (lock_is_held(&vma->vmlock_dep_map))
+		return;
+#endif
+
+	refs = refcount_read(&vma->vm_refcnt);
+
+	/*
+	 * In this case we're either read-locked, write-locked with temporary
+	 * readers, or in the midst of excluding readers, all of which means
+	 * we're locked.
+	 */
+	if (refs > 1)
+		return;
+
+	/* It is a bug for the VMA to be detached here. */
+	VM_BUG_ON_VMA(!refs, vma);
+
+	/*
+	 * OK, the VMA has a reference count of 1 which means it is either
+	 * unlocked and attached or write-locked, so assert that it is
+	 * write-locked.
+	 */
+	vma_assert_write_locked(vma);
 }

 static inline bool vma_is_attached(struct vm_area_struct *vma)
--
2.52.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* [PATCH RESEND v3 10/10] mm/vma: add and use vma_assert_stabilised()
  2026-01-22 13:01 [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
                   ` (8 preceding siblings ...)
  2026-01-22 13:02 ` [PATCH RESEND v3 09/10] mm/vma: update vma_assert_locked() to use lockdep Lorenzo Stoakes
@ 2026-01-22 13:02 ` Lorenzo Stoakes
  2026-01-22 22:12   ` Suren Baghdasaryan
                     ` (2 more replies)
  2026-01-22 15:48 ` [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Andrew Morton
  10 siblings, 3 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 13:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot
be changed underneath us. This will be the case if EITHER the VMA lock or
the mmap lock is held.

In order to do so, we introduce a new assert, vma_assert_stabilised() -
this performs a lockdep assert if lockdep is enabled AND the VMA is
read-locked.

Currently lockdep tracking for VMA write locks is not implemented, so it
suffices to check in this case that we have either an mmap read or write
semaphore held.

Note that because the VMA lock uses the non-standard vmlock_dep_map naming
convention, we cannot use lockdep_assert_is_write_held(), so we have to
open-code this ourselves by lockdep-asserting that
lock_is_held_type(&vma->vmlock_dep_map, 0).

We have to be careful here - for instance when merging a VMA, we use the
mmap write lock to stabilise the examination of adjacent VMAs which might
be simultaneously VMA read-locked whilst being faulted in.

If we were to assert the VMA read lock using lockdep, we would encounter an
incorrect lockdep assert.

Also, we have to be careful about asserting that mmap locks are held - if
we try to address the above issue by first checking whether the mmap lock
is held and, if so, asserting it via lockdep, we may find that we were
raced by another thread acquiring an mmap read lock simultaneously - one
which we do not own (and which can thus be released at any time, meaning we
are not stable) or which was indeed released since we last checked.

So to deal with these complexities we end up with either a precise (if
lockdep is enabled) or an imprecise (if not) approach - in the first
instance we use lockdep to assert that the mmap lock is both held and owned
by us.

If we do own it, then the check is complete; otherwise we must check for
the VMA read lock being held (a VMA write lock implies the mmap write lock,
so the mmap lock check suffices for that case).

If lockdep is not enabled we simply check if the mmap lock is held and risk
a false negative (i.e. not asserting when we should do).

There are a couple of places in the kernel where we already open-code this
stabilisation check - the anon_vma_name() helper in mm/madvise.c and
vma_flag_set_atomic() in include/linux/mm.h - both of which we update to
use vma_assert_stabilised().

This abstracts the open-coded checks, uses lockdep where possible, and
avoids a duplicate check of whether the mmap lock is held.

This is also self-documenting and lays the foundations for further VMA
stability checks in the code.
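
To illustrate, both of the following paths (a sketch, not taken verbatim
from the kernel) satisfy vma_assert_stabilised():

	/* Path 1: stabilised by the mmap lock. */
	mmap_read_lock(mm);
	name = anon_vma_name(vma);
	mmap_read_unlock(mm);

	/* Path 2: stabilised by a VMA read lock. */
	vma = lock_vma_under_rcu(mm, addr);
	if (vma) {
		name = anon_vma_name(vma);
		vma_end_read(vma);
	}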

Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 include/linux/mm.h        |  5 +---
 include/linux/mmap_lock.h | 52 +++++++++++++++++++++++++++++++++++++++
 mm/madvise.c              |  4 +--
 3 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6029a71a6908..d7ca837dd8a5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1008,10 +1008,7 @@ static inline void vma_flag_set_atomic(struct vm_area_struct *vma,
 {
 	unsigned long *bitmap = ACCESS_PRIVATE(&vma->flags, __vma_flags);

-	/* mmap read lock/VMA read lock must be held. */
-	if (!rwsem_is_locked(&vma->vm_mm->mmap_lock))
-		vma_assert_locked(vma);
-
+	vma_assert_stabilised(vma);
 	if (__vma_flag_atomic_valid(vma, bit))
 		set_bit((__force int)bit, bitmap);
 }
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 92ea07f0da4e..e01161560608 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -374,6 +374,52 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
 	vma_assert_write_locked(vma);
 }

+/**
+ * vma_assert_stabilised() - assert that this VMA cannot be changed from
+ * underneath us either by having a VMA or mmap lock held.
+ * @vma: The VMA whose stability we wish to assess.
+ *
+ * If lockdep is enabled we can precisely ensure stability via either an mmap
+ * lock owned by us or a specific VMA lock.
+ *
+ * With lockdep disabled we may sometimes race with other threads acquiring the
+ * mmap read lock simultaneously with our VMA read lock.
+ */
+static inline void vma_assert_stabilised(struct vm_area_struct *vma)
+{
+	/*
+	 * If another thread owns an mmap lock, it may go away at any time, and
+	 * thus is no guarantee of stability.
+	 *
+	 * If lockdep is enabled we can accurately determine if an mmap lock is
+	 * held and owned by us. Otherwise we must approximate.
+	 *
+	 * It doesn't necessarily mean we are not stabilised however, as we may
+	 * hold a VMA read lock (not a write lock as this would require an owned
+	 * mmap lock).
+	 *
+	 * If (assuming lockdep is not enabled) we were to assert a VMA read
+	 * lock first we may also run into issues, as other threads can hold VMA
+	 * read locks simultaneously with us.
+	 *
+	 * Therefore if lockdep is not enabled we risk a false negative (i.e. no
+	 * assert fired). If accurate checking is required, enable lockdep.
+	 */
+	if (IS_ENABLED(CONFIG_LOCKDEP)) {
+		if (lockdep_is_held(&vma->vm_mm->mmap_lock))
+			return;
+	} else {
+		if (rwsem_is_locked(&vma->vm_mm->mmap_lock))
+			return;
+	}
+
+	/*
+	 * We're not stabilised by the mmap lock, so assert that we're
+	 * stabilised by a VMA lock.
+	 */
+	vma_assert_locked(vma);
+}
+
 static inline bool vma_is_attached(struct vm_area_struct *vma)
 {
 	return refcount_read(&vma->vm_refcnt);
@@ -455,6 +501,12 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
 	mmap_assert_locked(vma->vm_mm);
 }

+static inline void vma_assert_stabilised(struct vm_area_struct *vma)
+{
+	/* If no VMA locks, then either mmap lock suffices to stabilise. */
+	mmap_assert_locked(vma->vm_mm);
+}
+
 #endif /* CONFIG_PER_VMA_LOCK */

 static inline void mmap_write_lock(struct mm_struct *mm)
diff --git a/mm/madvise.c b/mm/madvise.c
index 4bf4c8c38fd3..1f3040688f04 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -109,9 +109,7 @@ void anon_vma_name_free(struct kref *kref)

 struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma)
 {
-	if (!rwsem_is_locked(&vma->vm_mm->mmap_lock))
-		vma_assert_locked(vma);
-
+	vma_assert_stabilised(vma);
 	return vma->anon_name;
 }

--
2.52.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
  2026-01-22 13:01 ` [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked() Lorenzo Stoakes
@ 2026-01-22 13:08   ` Lorenzo Stoakes
  2026-01-22 20:15   ` Suren Baghdasaryan
  2026-01-23  9:16   ` Vlastimil Babka
  2 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 13:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

I'm clearly cursed today... but obviously this is the RESEND 6/10 :)

Andrew - can you check to make sure I didn't confuse your scripts? Apologies for
this!

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper
  2026-01-22 13:01 [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
                   ` (9 preceding siblings ...)
  2026-01-22 13:02 ` [PATCH RESEND v3 10/10] mm/vma: add and use vma_assert_stabilised() Lorenzo Stoakes
@ 2026-01-22 15:48 ` Andrew Morton
  2026-01-22 15:57   ` Lorenzo Stoakes
  10 siblings, 1 reply; 73+ messages in thread
From: Andrew Morton @ 2026-01-22 15:48 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, 22 Jan 2026 13:01:52 +0000 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:

> Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot
> be changed underneath us. This will be the case if EITHER the VMA lock or
> the mmap lock is held.
> 
> We already open-code this in two places - anon_vma_name() in mm/madvise.c
> and vma_flag_set_atomic() in include/linux/mm.h.
> 
> This series adds vma_assert_stablised() which abstract this can be used in
> these callsites instead.

Thanks, I added this to mm.git's mm-new branch.

It conflicts somewhat with your series "mm: add bitmap VMA flag helpers
and convert all mmap_prepare to use them".  I believe that a new
version of that series is in the works so I removed it instead of
attempting to fix things up.  Please lmk if I should attempt to perform
the repairs.



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper
  2026-01-22 15:48 ` [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Andrew Morton
@ 2026-01-22 15:57   ` Lorenzo Stoakes
  2026-01-22 16:01     ` Lorenzo Stoakes
  0 siblings, 1 reply; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 15:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 07:48:07AM -0800, Andrew Morton wrote:
> On Thu, 22 Jan 2026 13:01:52 +0000 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
>
> > Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot
> > be changed underneath us. This will be the case if EITHER the VMA lock or
> > the mmap lock is held.
> >
> > We already open-code this in two places - anon_vma_name() in mm/madvise.c
> > and vma_flag_set_atomic() in include/linux/mm.h.
> >
> > This series adds vma_assert_stablised() which abstract this can be used in
> > these callsites instead.
>
> Thanks, I added this to mm,git's mm-new branch.
>
> It conflicts somewhat with your series "mm: add bitmap VMA flag helpers
> and convert all mmap_prepare to use them".  I believe that a new
> version of that series is in the works so I removed it instead of
> attempting to fix things up.  Please lmk if I should attempt to perform
> the repairs.
>

Thanks, I'm about to send that.

The conflicts hopefully shouldn't be too bad, let me know if you need any
help with conflict resolution!

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper
  2026-01-22 15:57   ` Lorenzo Stoakes
@ 2026-01-22 16:01     ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 16:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 03:57:35PM +0000, Lorenzo Stoakes wrote:
> On Thu, Jan 22, 2026 at 07:48:07AM -0800, Andrew Morton wrote:
> > On Thu, 22 Jan 2026 13:01:52 +0000 Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> >
> > > Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot
> > > be changed underneath us. This will be the case if EITHER the VMA lock or
> > > the mmap lock is held.
> > >
> > > We already open-code this in two places - anon_vma_name() in mm/madvise.c
> > > and vma_flag_set_atomic() in include/linux/mm.h.
> > >
> > > This series adds vma_assert_stablised() which abstract this can be used in
> > > these callsites instead.
> >
> > Thanks, I added this to mm,git's mm-new branch.
> >
> > It conflicts somewhat with your series "mm: add bitmap VMA flag helpers
> > and convert all mmap_prepare to use them".  I believe that a new
> > version of that series is in the works so I removed it instead of
> > attempting to fix things up.  Please lmk if I should attempt to perform
> > the repairs.
> >
>
> Thanks, I'm about to send that.
>
> The conflicts hopefully shouldn't be too bad, let me know if you need any
> help with conflict resolution!
>
> Cheers, Lorenzo

Actually let me rebase that series on this one to save you the trouble! :)

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 01/10] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG
  2026-01-22 13:01 ` [PATCH RESEND v3 01/10] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG Lorenzo Stoakes
@ 2026-01-22 16:26   ` Vlastimil Babka
  2026-01-22 16:29     ` Lorenzo Stoakes
  2026-01-22 16:37   ` Suren Baghdasaryan
  1 sibling, 1 reply; 73+ messages in thread
From: Vlastimil Babka @ 2026-01-22 16:26 UTC (permalink / raw)
  To: Lorenzo Stoakes, Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shakeel Butt, Jann Horn,
	linux-mm, linux-kernel, linux-rt-devel, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt

On 1/22/26 14:01, Lorenzo Stoakes wrote:
> The VMA_LOCK_OFFSET value encodes a flag which vma->vm_refcnt is set to in
> order to indicate that a VMA is in the process of having VMA read-locks
> excluded in __vma_enter_locked() (that is, first checking if there are any
> VMA read locks held, and if there are, waiting on them to be released).
> 
> This happens when a VMA write lock is being established, or a VMA is being
> marked detached and discovers that the VMA reference count is elevated due
> to read-locks temporarily elevating the reference count only to discover a
> VMA write lock is in place.
> 
> The naming does not convey any of this, so rename VMA_LOCK_OFFSET to
> VM_REFCNT_EXCLUDE_READERS_FLAG (with a sensible new prefix to differentiate
> from the newly introduced VMA_*_BIT flags).
> 
> Also rename VMA_REF_LIMIT to VM_REFCNT_LIMIT to make this consistent also.
> 
> Update comments to reflect this.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

git grep tells me VMA_LOCK_OFFSET is still used in
tools/testing/vma/vma_internal.h but I guess it doesn't break the tests?



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 01/10] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG
  2026-01-22 16:26   ` Vlastimil Babka
@ 2026-01-22 16:29     ` Lorenzo Stoakes
  2026-01-23 13:52       ` Lorenzo Stoakes
  0 siblings, 1 reply; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 16:29 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 05:26:13PM +0100, Vlastimil Babka wrote:
> On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > The VMA_LOCK_OFFSET value encodes a flag which vma->vm_refcnt is set to in
> > order to indicate that a VMA is in the process of having VMA read-locks
> > excluded in __vma_enter_locked() (that is, first checking if there are any
> > VMA read locks held, and if there are, waiting on them to be released).
> >
> > This happens when a VMA write lock is being established, or a VMA is being
> > marked detached and discovers that the VMA reference count is elevated due
> > to read-locks temporarily elevating the reference count only to discover a
> > VMA write lock is in place.
> >
> > The naming does not convey any of this, so rename VMA_LOCK_OFFSET to
> > VM_REFCNT_EXCLUDE_READERS_FLAG (with a sensible new prefix to differentiate
> > from the newly introduced VMA_*_BIT flags).
> >
> > Also rename VMA_REF_LIMIT to VM_REFCNT_LIMIT to make this consistent also.
> >
> > Update comments to reflect this.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>
> git grep tells me VMA_LOCK_OFFSET is still used in
> tools/testing/vma/vma_internal.h but I guess it doesn't break the tests?
>

No :) I update it later in the series but it doesn't break the tests so no
bisection hazard.

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 01/10] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG
  2026-01-22 13:01 ` [PATCH RESEND v3 01/10] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG Lorenzo Stoakes
  2026-01-22 16:26   ` Vlastimil Babka
@ 2026-01-22 16:37   ` Suren Baghdasaryan
  2026-01-23 13:26     ` Lorenzo Stoakes
  1 sibling, 1 reply; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-22 16:37 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> The VMA_LOCK_OFFSET value encodes a flag which vma->vm_refcnt is set to in
> order to indicate that a VMA is in the process of having VMA read-locks
> excluded in __vma_enter_locked() (that is, first checking if there are any
> VMA read locks held, and if there are, waiting on them to be released).
>
> This happens when a VMA write lock is being established, or a VMA is being
> marked detached and discovers that the VMA reference count is elevated due
> to read-locks temporarily elevating the reference count only to discover a
> VMA write lock is in place.
>
> The naming does not convey any of this, so rename VMA_LOCK_OFFSET to
> VM_REFCNT_EXCLUDE_READERS_FLAG (with a sensible new prefix to differentiate
> from the newly introduced VMA_*_BIT flags).
>
> Also rename VMA_REF_LIMIT to VM_REFCNT_LIMIT to make this consistent also.
>
> Update comments to reflect this.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Thanks for the cleanup Lorenzo, and sorry for the delay in reviewing
your patches. I finally have some time and will try to finish my
review today.

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  include/linux/mm_types.h  | 17 +++++++++++++----
>  include/linux/mmap_lock.h | 14 ++++++++------
>  mm/mmap_lock.c            | 17 ++++++++++-------
>  3 files changed, 31 insertions(+), 17 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 78950eb8926d..94de392ed3c5 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -752,8 +752,17 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
>  }
>  #endif
>
> -#define VMA_LOCK_OFFSET        0x40000000
> -#define VMA_REF_LIMIT  (VMA_LOCK_OFFSET - 1)
> +/*
> + * WHile __vma_enter_locked() is working to ensure are no read-locks held on a

s/WHile/While

> + * VMA (either while acquiring a VMA write lock or marking a VMA detached) we
> + * set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indiciate to
> + * vma_start_read() that the reference count should be left alone.
> + *
> + * Once the operation is complete, this value is subtracted from vma->vm_refcnt.
> + */
> +#define VM_REFCNT_EXCLUDE_READERS_BIT  (30)
> +#define VM_REFCNT_EXCLUDE_READERS_FLAG (1U << VM_REFCNT_EXCLUDE_READERS_BIT)
> +#define VM_REFCNT_LIMIT                        (VM_REFCNT_EXCLUDE_READERS_FLAG - 1)
>
>  struct vma_numab_state {
>         /*
> @@ -935,10 +944,10 @@ struct vm_area_struct {
>         /*
>          * Can only be written (using WRITE_ONCE()) while holding both:
>          *  - mmap_lock (in write mode)
> -        *  - vm_refcnt bit at VMA_LOCK_OFFSET is set
> +        *  - vm_refcnt bit at VM_REFCNT_EXCLUDE_READERS_FLAG is set
>          * Can be read reliably while holding one of:
>          *  - mmap_lock (in read or write mode)
> -        *  - vm_refcnt bit at VMA_LOCK_OFFSET is set or vm_refcnt > 1
> +        *  - vm_refcnt bit at VM_REFCNT_EXCLUDE_READERS_BIT is set or vm_refcnt > 1
>          * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
>          * while holding nothing (except RCU to keep the VMA struct allocated).
>          *
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index b50416fbba20..5acbd4ba1b52 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -125,12 +125,14 @@ static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt)
>  static inline bool is_vma_writer_only(int refcnt)
>  {
>         /*
> -        * With a writer and no readers, refcnt is VMA_LOCK_OFFSET if the vma
> -        * is detached and (VMA_LOCK_OFFSET + 1) if it is attached. Waiting on
> -        * a detached vma happens only in vma_mark_detached() and is a rare
> -        * case, therefore most of the time there will be no unnecessary wakeup.
> +        * With a writer and no readers, refcnt is VM_REFCNT_EXCLUDE_READERS_FLAG
> +        * if the vma is detached and (VM_REFCNT_EXCLUDE_READERS_FLAG + 1) if it is
> +        * attached. Waiting on a detached vma happens only in
> +        * vma_mark_detached() and is a rare case, therefore most of the time
> +        * there will be no unnecessary wakeup.
>          */
> -       return (refcnt & VMA_LOCK_OFFSET) && refcnt <= VMA_LOCK_OFFSET + 1;
> +       return (refcnt & VM_REFCNT_EXCLUDE_READERS_FLAG) &&
> +               refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
>  }
>
>  static inline void vma_refcount_put(struct vm_area_struct *vma)
> @@ -159,7 +161,7 @@ static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int
>
>         mmap_assert_locked(vma->vm_mm);
>         if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt,
> -                                                             VMA_REF_LIMIT)))
> +                                                             VM_REFCNT_LIMIT)))
>                 return false;
>
>         rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index 7421b7ea8001..1d23b48552e9 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -54,7 +54,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>                 bool detaching, int state)
>  {
>         int err;
> -       unsigned int tgt_refcnt = VMA_LOCK_OFFSET;
> +       unsigned int tgt_refcnt = VM_REFCNT_EXCLUDE_READERS_FLAG;
>
>         mmap_assert_write_locked(vma->vm_mm);
>
> @@ -66,7 +66,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>          * If vma is detached then only vma_mark_attached() can raise the
>          * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
>          */
> -       if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
> +       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
>                 return 0;
>
>         rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
> @@ -74,7 +74,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
>                    state);
>         if (err) {
> -               if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
> +               if (refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
>                         /*
>                          * The wait failed, but the last reader went away
>                          * as well.  Tell the caller the VMA is detached.
> @@ -92,7 +92,8 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>
>  static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
>  {
> -       *detached = refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt);
> +       *detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> +                                         &vma->vm_refcnt);
>         rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
>  }
>
> @@ -180,13 +181,15 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
>         }
>
>         /*
> -        * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire()
> -        * will fail because VMA_REF_LIMIT is less than VMA_LOCK_OFFSET.
> +        * If VM_REFCNT_EXCLUDE_READERS_FLAG is set,
> +        * __refcount_inc_not_zero_limited_acquire() will fail because
> +        * VM_REFCNT_LIMIT is less than VM_REFCNT_EXCLUDE_READERS_FLAG.
> +        *
>          * Acquire fence is required here to avoid reordering against later
>          * vm_lock_seq check and checks inside lock_vma_under_rcu().
>          */
>         if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt,
> -                                                             VMA_REF_LIMIT))) {
> +                                                             VM_REFCNT_LIMIT))) {
>                 /* return EAGAIN if vma got detached from under us */
>                 vma = oldcnt ? NULL : ERR_PTR(-EAGAIN);
>                 goto err;
> --
> 2.52.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment
  2026-01-22 13:01 ` [PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment Lorenzo Stoakes
@ 2026-01-22 16:48   ` Vlastimil Babka
  2026-01-22 17:28     ` Suren Baghdasaryan
  2026-01-23 13:45     ` Lorenzo Stoakes
  0 siblings, 2 replies; 73+ messages in thread
From: Vlastimil Babka @ 2026-01-22 16:48 UTC (permalink / raw)
  To: Lorenzo Stoakes, Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shakeel Butt, Jann Horn,
	linux-mm, linux-kernel, linux-rt-devel, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt

On 1/22/26 14:01, Lorenzo Stoakes wrote:
> The possible vma->vm_refcnt values are confusing and vague, explain in
> detail what these can be in a comment describing the vma->vm_refcnt field
> and reference this comment in various places that read/write this field.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Thanks, very useful. Forgive my nitpicks :) It's because it's tricky, so
it's best to try to be as precise as possible, I believe.

> ---
>  include/linux/mm_types.h  | 39 +++++++++++++++++++++++++++++++++++++--
>  include/linux/mmap_lock.h |  7 +++++++
>  mm/mmap_lock.c            |  6 ++++++
>  3 files changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 94de392ed3c5..e5ee66f84d9a 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -758,7 +758,8 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
>   * set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indiciate to
>   * vma_start_read() that the reference count should be left alone.
>   *
> - * Once the operation is complete, this value is subtracted from vma->vm_refcnt.
> + * See the comment describing vm_refcnt in vm_area_struct for details as to
> + * which values the VMA reference count can be.
>   */
>  #define VM_REFCNT_EXCLUDE_READERS_BIT	(30)
>  #define VM_REFCNT_EXCLUDE_READERS_FLAG	(1U << VM_REFCNT_EXCLUDE_READERS_BIT)
> @@ -989,7 +990,41 @@ struct vm_area_struct {
>  	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
>  #endif
>  #ifdef CONFIG_PER_VMA_LOCK
> -	/* Unstable RCU readers are allowed to read this. */
> +	/*
> +	 * Used to keep track of the number of references taken by VMA read or
> +	 * write locks. May have the VM_REFCNT_EXCLUDE_READERS_FLAG set

I wonder about the "or write locks" part. The process of acquiring it uses
VM_REFCNT_EXCLUDE_READERS_FLAG but then the writer doesn't hold a 1
refcount? (the sentence could be read that way IMHO) It's the vma being attached
that does, AFAIK?

> +	 * indicating that a thread has entered __vma_enter_locked() and is
> +	 * waiting on any outstanding read locks to exit.
> +	 *
> +	 * This value can be equal to:
> +	 *
> +	 * 0 - Detached.

Is it worth saying that readers can't increment the refcount?

> +	 * 1 - Unlocked or write-locked.

"Attached and either unlocked or write-locked." ?

(see how "write-locked" isn't reflected, I argued above)

> +	 *
> +	 * >1, < VM_REFCNT_EXCLUDE_READERS_FLAG - Read-locked or (unlikely)
> +	 * write-locked with other threads having temporarily incremented the
> +	 * reference count prior to determining it is write-locked and
> +	 * decrementing it again.

Ack.

> +	 * VM_REFCNT_EXCLUDE_READERS_FLAG - Detached, pending
> +	 * __vma_exit_locked() completion which will decrement the reference
> +	 * count to zero. IMPORTANT - at this stage no further readers can
> +	 * increment the reference count. It can only be reduced.
> +	 *
> +	 * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - Either an attached VMA pending
> +	 * __vma_exit_locked() completion which will decrement the reference
> +	 * count to one, OR a detached VMA waiting on a single spurious reader
> +	 * to decrement reference count. IMPORTANT - as above, no further
> +	 * readers can increment the reference count.
> +	 *
> +	 * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - VMA is waiting on readers,

"VMA is waiting" sounds weird? a thread might be, but VMA itself?
(similarly in the previous paragraph)

> +	 * whether it is attempting to acquire a write lock or attempting to
> +	 * detach. IMPORTANT - as above, no ruther readers can increment the
> +	 * reference count.
> +	 *
> +	 * NOTE: Unstable RCU readers are allowed to read this.
> +	 */
>  	refcount_t vm_refcnt ____cacheline_aligned_in_smp;
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  	struct lockdep_map vmlock_dep_map;
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index 5acbd4ba1b52..a764439d0276 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -130,6 +130,9 @@ static inline bool is_vma_writer_only(int refcnt)
>  	 * attached. Waiting on a detached vma happens only in
>  	 * vma_mark_detached() and is a rare case, therefore most of the time
>  	 * there will be no unnecessary wakeup.
> +	 *
> +	 * See the comment describing the vm_area_struct->vm_refcnt field for
> +	 * details of possible refcnt values.
>  	 */
>  	return (refcnt & VM_REFCNT_EXCLUDE_READERS_FLAG) &&
>  		refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
> @@ -249,6 +252,10 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
>  {
>  	unsigned int mm_lock_seq;
> 
> +	/*
> +	 * See the comment describing the vm_area_struct->vm_refcnt field for
> +	 * details of possible refcnt values.
> +	 */
>  	VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt) <= 1 &&
>  		      !__is_vma_write_locked(vma, &mm_lock_seq), vma);
>  }
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index 1d23b48552e9..75dc098aea14 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -65,6 +65,9 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>  	/*
>  	 * If vma is detached then only vma_mark_attached() can raise the
>  	 * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
> +	 *
> +	 * See the comment describing the vm_area_struct->vm_refcnt field for
> +	 * details of possible refcnt values.
>  	 */
>  	if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
>  		return 0;
> @@ -137,6 +140,9 @@ void vma_mark_detached(struct vm_area_struct *vma)
>  	 * before they check vm_lock_seq, realize the vma is locked and drop
>  	 * back the vm_refcnt. That is a narrow window for observing a raised
>  	 * vm_refcnt.
> +	 *
> +	 * See the comment describing the vm_area_struct->vm_refcnt field for
> +	 * details of possible refcnt values.
>  	 */
>  	if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
>  		/* Wait until vma is detached with no readers. */
> --
> 2.52.0



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment
  2026-01-22 16:48   ` Vlastimil Babka
@ 2026-01-22 17:28     ` Suren Baghdasaryan
  2026-01-23 15:06       ` Lorenzo Stoakes
  2026-01-23 13:45     ` Lorenzo Stoakes
  1 sibling, 1 reply; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-22 17:28 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Lorenzo Stoakes, Andrew Morton, David Hildenbrand,
	Liam R . Howlett, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 8:48 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > The possible vma->vm_refcnt values are confusing and vague, explain in
> > detail what these can be in a comment describing the vma->vm_refcnt field
> > and reference this comment in various places that read/write this field.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Thanks, very useful. Forgive my nitpicks :) It's because it's tricky so best
> try to be as precise as possible, I believe.

Another thanks from me.

>
> > ---
> >  include/linux/mm_types.h  | 39 +++++++++++++++++++++++++++++++++++++--
> >  include/linux/mmap_lock.h |  7 +++++++
> >  mm/mmap_lock.c            |  6 ++++++
> >  3 files changed, 50 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 94de392ed3c5..e5ee66f84d9a 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -758,7 +758,8 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
> >   * set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indiciate to
> >   * vma_start_read() that the reference count should be left alone.
> >   *
> > - * Once the operation is complete, this value is subtracted from vma->vm_refcnt.
> > + * See the comment describing vm_refcnt in vm_area_struct for details as to
> > + * which values the VMA reference count can be.
> >   */
> >  #define VM_REFCNT_EXCLUDE_READERS_BIT        (30)
> >  #define VM_REFCNT_EXCLUDE_READERS_FLAG       (1U << VM_REFCNT_EXCLUDE_READERS_BIT)
> > @@ -989,7 +990,41 @@ struct vm_area_struct {
> >       struct vma_numab_state *numab_state;    /* NUMA Balancing state */
> >  #endif
> >  #ifdef CONFIG_PER_VMA_LOCK
> > -     /* Unstable RCU readers are allowed to read this. */
> > +     /*
> > +      * Used to keep track of the number of references taken by VMA read or
> > +      * write locks. May have the VM_REFCNT_EXCLUDE_READERS_FLAG set
>
> I wonder about the "or write locks" part. The process of acquiring it uses
> VM_REFCNT_EXCLUDE_READERS_FLAG but then the writer doesn't hold a 1
> refcount? (the sentence could be read it way IMHO) It's vma being attached
> that does, AFAIK?

Yes, since there can be only one write-locker it only has to set the
VM_REFCNT_EXCLUDE_READERS_FLAG bit to announce its presence, without
incrementing the refcount.
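
To illustrate, a sketch of the vm_refcnt transitions for an attached VMA
being write-locked, per __vma_enter_locked()/__vma_exit_locked():

	1                                      attached, unlocked
	1 + VM_REFCNT_EXCLUDE_READERS_FLAG     writer announced, any
	                                       readers being waited out
	1                                      __vma_exit_locked() done,
	                                       VMA now write-locked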

>
> > +      * indicating that a thread has entered __vma_enter_locked() and is
> > +      * waiting on any outstanding read locks to exit.
> > +      *
> > +      * This value can be equal to:
> > +      *
> > +      * 0 - Detached.
>
> Is it worth saying that readers can't increment the refcount?

Yes, you mention that for VM_REFCNT_EXCLUDE_READERS_FLAG value. The
same IMPORTANT notice applies here.

>
> > +      * 1 - Unlocked or write-locked.
>
> "Attached and either unlocked or write-locked." ?

Agree. That's more specific.
Should we also mention here that unlocked vs write-locked distinction
is determined using the vm_lock_seq member?

>
> (see how "write-locked" isn't reflected, I argued above)
>
> > +      *
> > +      * >1, < VM_REFCNT_EXCLUDE_READERS_FLAG - Read-locked or (unlikely)
> > +      * write-locked with other threads having temporarily incremented the
> > +      * reference count prior to determining it is write-locked and
> > +      * decrementing it again.
>
> Ack.
>
> > +      * VM_REFCNT_EXCLUDE_READERS_FLAG - Detached, pending
> > +      * __vma_exit_locked() completion which will decrement the reference
> > +      * count to zero. IMPORTANT - at this stage no further readers can
> > +      * increment the reference count. It can only be reduced.
> > +      *
> > +      * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - Either an attached VMA pending
> > +      * __vma_exit_locked() completion which will decrement the reference
> > +      * count to one, OR a detached VMA waiting on a single spurious reader
> > +      * to decrement reference count. IMPORTANT - as above, no further
> > +      * readers can increment the reference count.
> > +      *
> > +      * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - VMA is waiting on readers,
>
> "VMA is waiting" sounds weird? a thread might be, but VMA itself?
> (similarly in the previous paragraph)

Maybe "VMA in the process of being write-locked or detached, which got
blocked due to the spurious readers that temporarily raised the
refcount"?

>
> > +      * whether it is attempting to acquire a write lock or attempting to
> > +      * detach. IMPORTANT - as above, no ruther readers can increment the
> > +      * reference count.
> > +      *
> > +      * NOTE: Unstable RCU readers are allowed to read this.
> > +      */
> >       refcount_t vm_refcnt ____cacheline_aligned_in_smp;
> >  #ifdef CONFIG_DEBUG_LOCK_ALLOC
> >       struct lockdep_map vmlock_dep_map;
> > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> > index 5acbd4ba1b52..a764439d0276 100644
> > --- a/include/linux/mmap_lock.h
> > +++ b/include/linux/mmap_lock.h
> > @@ -130,6 +130,9 @@ static inline bool is_vma_writer_only(int refcnt)
> >        * attached. Waiting on a detached vma happens only in
> >        * vma_mark_detached() and is a rare case, therefore most of the time
> >        * there will be no unnecessary wakeup.
> > +      *
> > +      * See the comment describing the vm_area_struct->vm_refcnt field for
> > +      * details of possible refcnt values.
> >        */
> >       return (refcnt & VM_REFCNT_EXCLUDE_READERS_FLAG) &&
> >               refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
> > @@ -249,6 +252,10 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
> >  {
> >       unsigned int mm_lock_seq;
> >
> > +     /*
> > +      * See the comment describing the vm_area_struct->vm_refcnt field for
> > +      * details of possible refcnt values.
> > +      */
> >       VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt) <= 1 &&
> >                     !__is_vma_write_locked(vma, &mm_lock_seq), vma);
> >  }
> > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > index 1d23b48552e9..75dc098aea14 100644
> > --- a/mm/mmap_lock.c
> > +++ b/mm/mmap_lock.c
> > @@ -65,6 +65,9 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
> >       /*
> >        * If vma is detached then only vma_mark_attached() can raise the
> >        * vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
> > +      *
> > +      * See the comment describing the vm_area_struct->vm_refcnt field for
> > +      * details of possible refcnt values.
> >        */
> >       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
> >               return 0;
> > @@ -137,6 +140,9 @@ void vma_mark_detached(struct vm_area_struct *vma)
> >        * before they check vm_lock_seq, realize the vma is locked and drop
> >        * back the vm_refcnt. That is a narrow window for observing a raised
> >        * vm_refcnt.
> > +      *
> > +      * See the comment describing the vm_area_struct->vm_refcnt field for
> > +      * details of possible refcnt values.
> >        */
> >       if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
> >               /* Wait until vma is detached with no readers. */
> > --
> > 2.52.0
>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 03/10] mm/vma: rename is_vma_write_only(), separate out shared refcount put
  2026-01-22 13:01 ` [PATCH RESEND v3 03/10] mm/vma: rename is_vma_write_only(), separate out shared refcount put Lorenzo Stoakes
@ 2026-01-22 17:36   ` Vlastimil Babka
  2026-01-22 19:31     ` Suren Baghdasaryan
  2026-01-23 14:02     ` Lorenzo Stoakes
  0 siblings, 2 replies; 73+ messages in thread
From: Vlastimil Babka @ 2026-01-22 17:36 UTC (permalink / raw)
  To: Lorenzo Stoakes, Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shakeel Butt, Jann Horn,
	linux-mm, linux-kernel, linux-rt-devel, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt

On 1/22/26 14:01, Lorenzo Stoakes wrote:
> The is_vma_writer_only() function is misnamed - this isn't determining if
> there is only a write lock, as it checks for the presence of the
> VM_REFCNT_EXCLUDE_READERS_FLAG.
> 
> Really, it is checking to see whether readers are excluded, with a
> possibility of a false positive in the case of a detachment (there we
> expect the vma->vm_refcnt to eventually be set to
> VM_REFCNT_EXCLUDE_READERS_FLAG, whereas for an attached VMA we expect it to
> eventually be set to VM_REFCNT_EXCLUDE_READERS_FLAG + 1).
> 
> Rename the function accordingly.
> 
> Relatedly, we use a finnicky __refcount_dec_and_test() primitive directly
> in vma_refcount_put(), using the old value to determine what the reference
> count ought to be after the operation is complete (ignoring racing
> reference count adjustments).
> 
> Wrap this into a __vma_refcount_put() function, which we can then utilise
> in vma_mark_detached() and thus keep the refcount primitive usage
> abstracted.
> 
> Also adjust comments, removing duplicative comments covered elsewhere and
> adding more to aid understanding.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Again very useful, thanks!

> ---
>  include/linux/mmap_lock.h | 62 +++++++++++++++++++++++++++++++--------
>  mm/mmap_lock.c            | 18 +++++-------
>  2 files changed, 57 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index a764439d0276..0b3614aadbb4 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -122,15 +122,27 @@ static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt)
>  	vma->vm_lock_seq = UINT_MAX;
>  }
> 
> -static inline bool is_vma_writer_only(int refcnt)
> +/**
> + * are_readers_excluded() - Determine whether @refcnt describes a VMA which has
> + * excluded all VMA read locks.
> + * @refcnt: The VMA reference count obtained from vm_area_struct->vm_refcnt.
> + *
> + * We may be raced by other readers temporarily incrementing the reference
> + * count, though the race window is very small, this might cause spurious
> + * wakeups.

I think this part about spurious wakeups belongs more to the usage of the
function in vma_refcount_put()? Because there are no wakeups done here. So
it should be enough to explain how it can be a false positive, as in the
paragraph below.

> + *
> + * In the case of a detached VMA, we may incorrectly indicate that readers are
> + * excluded when one remains, because in that scenario we target a refcount of
> + * VM_REFCNT_EXCLUDE_READERS_FLAG, rather than the attached target of
> + * VM_REFCNT_EXCLUDE_READERS_FLAG + 1.
> + *
> + * However, the race window for that is very small so it is unlikely.
> + *
> + * Returns: true if readers are excluded, false otherwise.
> + */
> +static inline bool are_readers_excluded(int refcnt)

I wonder if an include/linux/ header should have such a generically named
function (I understand it's necessary for it to be here). Maybe prefix the
name and make the comment not a kerneldoc because it's going to be only the
vma locking implementation using it and not the vma locking end-users? (i.e.
it's "intermediate").

>  {
>  	/*
> -	 * With a writer and no readers, refcnt is VM_REFCNT_EXCLUDE_READERS_FLAG
> -	 * if the vma is detached and (VM_REFCNT_EXCLUDE_READERS_FLAG + 1) if it is
> -	 * attached. Waiting on a detached vma happens only in
> -	 * vma_mark_detached() and is a rare case, therefore most of the time
> -	 * there will be no unnecessary wakeup.
> -	 *
>  	 * See the comment describing the vm_area_struct->vm_refcnt field for
>  	 * details of possible refcnt values.
>  	 */
> @@ -138,18 +150,42 @@ static inline bool is_vma_writer_only(int refcnt)
>  		refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
>  }
> 
> +static inline bool __vma_refcount_put(struct vm_area_struct *vma, int *refcnt)

Basically change are_readers_excluded() like this, with __vma prefix?

But this one could IMHO use some comment (also not kerneldoc) saying
what the return value and *refcnt indicate?

> +{
> +	int oldcnt;
> +	bool detached;
> +
> +	detached = __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
> +	if (refcnt)
> +		*refcnt = oldcnt - 1;
> +	return detached;
> +}
> +
> +/**
> + * vma_refcount_put() - Drop reference count in VMA vm_refcnt field due to a
> + * read-lock being dropped.
> + * @vma: The VMA whose reference count we wish to decrement.
> + *
> + * If we were the last reader, wake up threads waiting to obtain an exclusive
> + * lock.
> + */
>  static inline void vma_refcount_put(struct vm_area_struct *vma)
>  {
> -	/* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
> +	/* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt. */
>  	struct mm_struct *mm = vma->vm_mm;
> -	int oldcnt;
> +	int refcnt;
> +	bool detached;
> 
>  	rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> -	if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) {
> 
> -		if (is_vma_writer_only(oldcnt - 1))
> -			rcuwait_wake_up(&mm->vma_writer_wait);
> -	}
> +	detached = __vma_refcount_put(vma, &refcnt);
> +	/*
> +	 * __vma_enter_locked() may be sleeping waiting for readers to drop
> +	 * their reference count, so wake it up if we were the last reader
> +	 * blocking it from being acquired.
> +	 */
> +	if (!detached && are_readers_excluded(refcnt))
> +		rcuwait_wake_up(&mm->vma_writer_wait);
>  }
> 
>  /*
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index 75dc098aea14..ebacb57e5f16 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -130,25 +130,23 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
> 
>  void vma_mark_detached(struct vm_area_struct *vma)
>  {
> +	bool detached;
> +
>  	vma_assert_write_locked(vma);
>  	vma_assert_attached(vma);
> 
>  	/*
> -	 * We are the only writer, so no need to use vma_refcount_put().
> -	 * The condition below is unlikely because the vma has been already
> -	 * write-locked and readers can increment vm_refcnt only temporarily
> -	 * before they check vm_lock_seq, realize the vma is locked and drop
> -	 * back the vm_refcnt. That is a narrow window for observing a raised
> -	 * vm_refcnt.
> -	 *
>  	 * See the comment describing the vm_area_struct->vm_refcnt field for
>  	 * details of possible refcnt values.
>  	 */
> -	if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
> +	detached = __vma_refcount_put(vma, NULL);
> +	if (unlikely(!detached)) {
>  		/* Wait until vma is detached with no readers. */
>  		if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> -			bool detached;
> -
> +			/*
> +			 * Once this is complete, no readers can increment the
> +			 * reference count, and the VMA is marked detached.
> +			 */
>  			__vma_exit_locked(vma, &detached);
>  			WARN_ON_ONCE(!detached);
>  		}
> --
> 2.52.0



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 03/10] mm/vma: rename is_vma_write_only(), separate out shared refcount put
  2026-01-22 17:36   ` Vlastimil Babka
@ 2026-01-22 19:31     ` Suren Baghdasaryan
  2026-01-23  8:24       ` Vlastimil Babka
  2026-01-23 14:41       ` Lorenzo Stoakes
  2026-01-23 14:02     ` Lorenzo Stoakes
  1 sibling, 2 replies; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-22 19:31 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Lorenzo Stoakes, Andrew Morton, David Hildenbrand,
	Liam R . Howlett, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 9:36 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > The is_vma_writer_only() function is misnamed - this isn't determining if
> > there is only a write lock, as it checks for the presence of the
> > VM_REFCNT_EXCLUDE_READERS_FLAG.
> >
> > Really, it is checking to see whether readers are excluded, with a
> > possibility of a false positive in the case of a detachment (there we
> > expect the vma->vm_refcnt to eventually be set to
> > VM_REFCNT_EXCLUDE_READERS_FLAG, whereas for an attached VMA we expect it to
> > eventually be set to VM_REFCNT_EXCLUDE_READERS_FLAG + 1).
> >
> > Rename the function accordingly.
> >
> > Relatedly, we use a finnicky __refcount_dec_and_test() primitive directly
> > in vma_refcount_put(), using the old value to determine what the reference
> > count ought to be after the operation is complete (ignoring racing
> > reference count adjustments).

Sorry, by mistake I replied to an earlier version here:
https://lore.kernel.org/all/CAJuCfpF-tVr==bCf-PXJFKPn99yRjfONeDnDtPvTkGUfyuvtcw@mail.gmail.com/
Copying my comments here.

IIUC, __refcount_dec_and_test() can decrement the refcount by only 1
and the old value returned (oldcnt) will be the exact value that it
was before this decrement. Therefore oldcnt - 1 must reflect the
refcount value after the decrement. It's possible the refcount gets
manipulated after this operation but that does not make this operation
wrong. I don't quite understand why you think that's racy or finnicky.
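
i.e. (a sketch of the semantics as I read them):

	int oldcnt;
	bool zero = __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
	/*
	 * oldcnt is the pre-decrement value, so the count after the
	 * decrement is exactly oldcnt - 1, and zero == (oldcnt == 1).
	 */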

> >
> > Wrap this into a __vma_refcount_put() function, which we can then utilise
> > in vma_mark_detached() and thus keep the refcount primitive usage
> > abstracted.
> >
> > Also adjust comments, removing duplicative comments covered elsewhere and
> > adding more to aid understanding.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Again very useful, thanks!
>
> > ---
> >  include/linux/mmap_lock.h | 62 +++++++++++++++++++++++++++++++--------
> >  mm/mmap_lock.c            | 18 +++++-------
> >  2 files changed, 57 insertions(+), 23 deletions(-)
> >
> > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> > index a764439d0276..0b3614aadbb4 100644
> > --- a/include/linux/mmap_lock.h
> > +++ b/include/linux/mmap_lock.h
> > @@ -122,15 +122,27 @@ static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt)
> >       vma->vm_lock_seq = UINT_MAX;
> >  }
> >
> > -static inline bool is_vma_writer_only(int refcnt)
> > +/**
> > + * are_readers_excluded() - Determine whether @refcnt describes a VMA which has
> > + * excluded all VMA read locks.
> > + * @refcnt: The VMA reference count obtained from vm_area_struct->vm_refcnt.
> > + *
> > + * We may be raced by other readers temporarily incrementing the reference
> > + * count, though the race window is very small, this might cause spurious
> > + * wakeups.
>
> I think this part about spurious wakeups belongs more to the usage of the
> function in vma_refcount_put()? Because there are no wakeups done here. So
> it should be enough to explain how it can be false positive like in the
> paragraph below.
>
> > + *
> > + * In the case of a detached VMA, we may incorrectly indicate that readers are
> > + * excluded when one remains, because in that scenario we target a refcount of
> > + * VM_REFCNT_EXCLUDE_READERS_FLAG, rather than the attached target of
> > + * VM_REFCNT_EXCLUDE_READERS_FLAG + 1.
> > + *
> > + * However, the race window for that is very small so it is unlikely.
> > + *
> > + * Returns: true if readers are excluded, false otherwise.
> > + */
> > +static inline bool are_readers_excluded(int refcnt)
>
> I wonder if a include/linux/ header should have such a generically named
> function (I understand it's necessary for it to be here). Maybe prefix the
> name and make the comment not a kerneldoc because it's going to be only the
> vma locking implementation using it and not the vma locking end-users? (i.e.
> it's "intermediate").
>
> >  {
> >       /*
> > -      * With a writer and no readers, refcnt is VM_REFCNT_EXCLUDE_READERS_FLAG
> > -      * if the vma is detached and (VM_REFCNT_EXCLUDE_READERS_FLAG + 1) if it is
> > -      * attached. Waiting on a detached vma happens only in
> > -      * vma_mark_detached() and is a rare case, therefore most of the time
> > -      * there will be no unnecessary wakeup.
> > -      *
> >        * See the comment describing the vm_area_struct->vm_refcnt field for
> >        * details of possible refcnt values.
> >        */
> > @@ -138,18 +150,42 @@ static inline bool is_vma_writer_only(int refcnt)
> >               refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
> >  }
> >
> > +static inline bool __vma_refcount_put(struct vm_area_struct *vma, int *refcnt)
>
> Basically change are_readers_excluded() like this, with __vma prefix?
>
> But this one could IMHO use use some comment (also not kerneldoc) saying
> what the return value and *refcnt indicate?
>
> > +{
> > +     int oldcnt;
> > +     bool detached;
> > +
> > +     detached = __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
> > +     if (refcnt)
> > +             *refcnt = oldcnt - 1;
> > +     return detached;

IIUC there is always a connection between detached and the resulting
*refcnt value. If detached==true then the resulting *refcnt has to
be 0. If so, __vma_refcount_put() can simply return (oldcnt - 1) as
the new count:

static inline int __vma_refcount_put(struct vm_area_struct *vma)
{
       int oldcnt;

       __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
       return oldcnt - 1;
}

And later:

newcnt = __vma_refcount_put(vma);
detached = newcnt == 0;

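For illustration, vma_refcount_put() could then become something like
this (an untested sketch based on the above):

static inline void vma_refcount_put(struct vm_area_struct *vma)
{
	/* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt. */
	struct mm_struct *mm = vma->vm_mm;
	int newcnt;

	rwsem_release(&vma->vmlock_dep_map, _RET_IP_);

	newcnt = __vma_refcount_put(vma);
	/*
	 * newcnt == 0 means the VMA is now detached; otherwise wake up a
	 * writer waiting for readers to be excluded, if we were the last one.
	 */
	if (newcnt != 0 && are_readers_excluded(newcnt))
		rcuwait_wake_up(&mm->vma_writer_wait);
}
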
> > +}
> > +
> > +/**
> > + * vma_refcount_put() - Drop reference count in VMA vm_refcnt field due to a
> > + * read-lock being dropped.
> > + * @vma: The VMA whose reference count we wish to decrement.
> > + *
> > + * If we were the last reader, wake up threads waiting to obtain an exclusive
> > + * lock.
> > + */
> >  static inline void vma_refcount_put(struct vm_area_struct *vma)
> >  {
> > -     /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
> > +     /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt. */
> >       struct mm_struct *mm = vma->vm_mm;
> > -     int oldcnt;
> > +     int refcnt;
> > +     bool detached;
> >
> >       rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> > -     if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) {
> >
> > -             if (is_vma_writer_only(oldcnt - 1))
> > -                     rcuwait_wake_up(&mm->vma_writer_wait);
> > -     }
> > +     detached = __vma_refcount_put(vma, &refcnt);
> > +     /*
> > +      * __vma_enter_locked() may be sleeping waiting for readers to drop
> > +      * their reference count, so wake it up if we were the last reader
> > +      * blocking it from being acquired.
> > +      */
> > +     if (!detached && are_readers_excluded(refcnt))
> > +             rcuwait_wake_up(&mm->vma_writer_wait);
> >  }
> >
> >  /*
> > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > index 75dc098aea14..ebacb57e5f16 100644
> > --- a/mm/mmap_lock.c
> > +++ b/mm/mmap_lock.c
> > @@ -130,25 +130,23 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
> >
> >  void vma_mark_detached(struct vm_area_struct *vma)
> >  {
> > +     bool detached;
> > +
> >       vma_assert_write_locked(vma);
> >       vma_assert_attached(vma);
> >
> >       /*
> > -      * We are the only writer, so no need to use vma_refcount_put().
> > -      * The condition below is unlikely because the vma has been already
> > -      * write-locked and readers can increment vm_refcnt only temporarily

I think the above part of the comment is still important and should be
kept intact.

> > -      * before they check vm_lock_seq, realize the vma is locked and drop
> > -      * back the vm_refcnt. That is a narrow window for observing a raised
> > -      * vm_refcnt.
> > -      *
> >        * See the comment describing the vm_area_struct->vm_refcnt field for
> >        * details of possible refcnt values.
> >        */
> > -     if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
> > +     detached = __vma_refcount_put(vma, NULL);
> > +     if (unlikely(!detached)) {
> >               /* Wait until vma is detached with no readers. */
> >               if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> > -                     bool detached;
> > -
> > +                     /*
> > +                      * Once this is complete, no readers can increment the
> > +                      * reference count, and the VMA is marked detached.
> > +                      */
> >                       __vma_exit_locked(vma, &detached);
> >                       WARN_ON_ONCE(!detached);
> >               }
> > --
> > 2.52.0
>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 04/10] mm/vma: add+use vma lockdep acquire/release defines
  2026-01-22 13:01 ` [PATCH RESEND v3 04/10] mm/vma: add+use vma lockdep acquire/release defines Lorenzo Stoakes
@ 2026-01-22 19:32   ` Suren Baghdasaryan
  2026-01-22 19:41     ` Suren Baghdasaryan
  2026-01-23 15:00     ` Lorenzo Stoakes
  2026-01-23  8:48   ` Vlastimil Babka
  1 sibling, 2 replies; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-22 19:32 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> The code is littered with inscrutable and duplicative lockdep incantations;
> replace these with defines which explain what is going on, and add
> commentary to explain what we're doing.
>
> If lockdep is disabled these become no-ops. We must use defines so _RET_IP_
> remains meaningful.
>
> These are self-documenting and aid readability of the code.
>
> Additionally, instead of using the confusing rwsem_*() form for something
> that is emphatically not an rwsem, we explicitly use
> lock_acquire_shared/exclusive() and lock_release() lockdep invocations
> since we are doing something rather custom here and these make more sense.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Very nice! Thank you.

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  include/linux/mmap_lock.h | 35 ++++++++++++++++++++++++++++++++---
>  mm/mmap_lock.c            | 10 +++++-----
>  2 files changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index 0b3614aadbb4..da63b1be6ec0 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -78,6 +78,36 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
>
>  #ifdef CONFIG_PER_VMA_LOCK
>
> +/*
> + * VMA locks do not behave like most ordinary locks found in the kernel, so we
> + * cannot quite have full lockdep tracking in the way we would ideally prefer.
> + *
> + * Read locks act as shared locks which exclude an exclusive lock being
> + * taken. We therefore mark these accordingly on read lock acquire/release.
> + *
> + * Write locks are acquired exclusively per-VMA, but released in a shared
> + * fashion; that is, upon vma_end_write_all() we update the mmap's seqcount
> + * such that the write lock is de-acquired.
> + *
> + * We therefore cannot track write locks per-VMA, nor do we try. Mitigating this
> + * is the fact that, of course, we do lockdep-track the mmap lock rwsem.
> + *
> + * We do, however, want to indicate that during either acquisition of a VMA
> + * write lock or detachment of a VMA we require the lock held to be exclusive,
> + * so we utilise lockdep to do so.
> + */
> +#define __vma_lockdep_acquire_read(vma) \
> +       lock_acquire_shared(&vma->vmlock_dep_map, 0, 1, NULL, _RET_IP_)
> +#define __vma_lockdep_release_read(vma) \
> +       lock_release(&vma->vmlock_dep_map, _RET_IP_)
> +#define __vma_lockdep_acquire_exclusive(vma) \
> +       lock_acquire_exclusive(&vma->vmlock_dep_map, 0, 0, NULL, _RET_IP_)
> +#define __vma_lockdep_release_exclusive(vma) \
> +       lock_release(&vma->vmlock_dep_map, _RET_IP_)
> +/* Only meaningful if CONFIG_LOCK_STAT is defined. */
> +#define __vma_lockdep_stat_mark_acquired(vma) \
> +       lock_acquired(&vma->vmlock_dep_map, _RET_IP_)
> +
>  static inline void mm_lock_seqcount_init(struct mm_struct *mm)
>  {
>         seqcount_init(&mm->mm_lock_seq);
> @@ -176,8 +206,7 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
>         int refcnt;
>         bool detached;
>
> -       rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> -
> +       __vma_lockdep_release_read(vma);
>         detached = __vma_refcount_put(vma, &refcnt);
>         /*
>          * __vma_enter_locked() may be sleeping waiting for readers to drop
> @@ -203,7 +232,7 @@ static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int
>                                                               VM_REFCNT_LIMIT)))
>                 return false;
>
> -       rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
> +       __vma_lockdep_acquire_read(vma);
>         return true;
>  }
>
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index ebacb57e5f16..9563bfb051f4 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -72,7 +72,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>         if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
>                 return 0;
>
> -       rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
> +       __vma_lockdep_acquire_exclusive(vma);
>         err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
>                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
>                    state);
> @@ -85,10 +85,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>                         WARN_ON_ONCE(!detaching);
>                         err = 0;
>                 }
> -               rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> +               __vma_lockdep_release_exclusive(vma);
>                 return err;
>         }
> -       lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
> +       __vma_lockdep_stat_mark_acquired(vma);
>
>         return 1;
>  }
> @@ -97,7 +97,7 @@ static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
>  {
>         *detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
>                                           &vma->vm_refcnt);
> -       rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> +       __vma_lockdep_release_exclusive(vma);
>  }
>
>  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> @@ -199,7 +199,7 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
>                 goto err;
>         }
>
> -       rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
> +       __vma_lockdep_acquire_read(vma);
>
>         if (unlikely(vma->vm_mm != mm))
>                 goto err_unstable;
> --
> 2.52.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 05/10] mm/vma: de-duplicate __vma_enter_locked() error path
  2026-01-22 13:01 ` [PATCH RESEND v3 05/10] mm/vma: de-duplicate __vma_enter_locked() error path Lorenzo Stoakes
@ 2026-01-22 19:39   ` Suren Baghdasaryan
  2026-01-23 15:11     ` Lorenzo Stoakes
  2026-01-23  8:54   ` Vlastimil Babka
  1 sibling, 1 reply; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-22 19:39 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> We're doing precisely the same thing that __vma_exit_locked() does, so
> de-duplicate this code and keep the refcount primitive in one place.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  mm/mmap_lock.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
>
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index 9563bfb051f4..7a0361cff6db 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -45,6 +45,14 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
>
>  #ifdef CONFIG_MMU
>  #ifdef CONFIG_PER_VMA_LOCK
> +
> +static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
> +{
> +       *detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> +                                         &vma->vm_refcnt);
> +       __vma_lockdep_release_exclusive(vma);
> +}
> +
>  /*
>   * __vma_enter_locked() returns 0 immediately if the vma is not
>   * attached, otherwise it waits for any current readers to finish and
> @@ -77,7 +85,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
>                    state);
>         if (err) {
> -               if (refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
> +               bool detached;
> +
> +               __vma_exit_locked(vma, &detached);
> +               if (detached) {
>                         /*
>                          * The wait failed, but the last reader went away
>                          * as well.  Tell the caller the VMA is detached.
> @@ -85,7 +96,6 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>                         WARN_ON_ONCE(!detaching);
>                         err = 0;
>                 }
> -               __vma_lockdep_release_exclusive(vma);
>                 return err;
>         }
>         __vma_lockdep_stat_mark_acquired(vma);
> @@ -93,13 +103,6 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>         return 1;
>  }
>
> -static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
> -{
> -       *detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> -                                         &vma->vm_refcnt);
> -       __vma_lockdep_release_exclusive(vma);
> -}
> -
>  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
>                 int state)
>  {
> --
> 2.52.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 04/10] mm/vma: add+use vma lockdep acquire/release defines
  2026-01-22 19:32   ` Suren Baghdasaryan
@ 2026-01-22 19:41     ` Suren Baghdasaryan
  2026-01-23  8:41       ` Vlastimil Babka
  2026-01-23 15:00     ` Lorenzo Stoakes
  1 sibling, 1 reply; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-22 19:41 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 11:32 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > The code is littered with inscrutable and duplicative lockdep incantations;
> > replace these with defines which explain what is going on, and add
> > commentary to explain what we're doing.
> >
> > If lockdep is disabled these become no-ops. We must use defines so _RET_IP_
> > remains meaningful.
> >
> > These are self-documenting and aid readability of the code.
> >
> > Additionally, instead of using the confusing rwsem_*() form for something
> > that is emphatically not an rwsem, we explicitly use
> > lock_acquire_shared/exclusive() and lock_release() lockdep invocations
> > since we are doing something rather custom here and these make more sense.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Very nice! Thank you.
>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
>
> > ---
> >  include/linux/mmap_lock.h | 35 ++++++++++++++++++++++++++++++++---
> >  mm/mmap_lock.c            | 10 +++++-----
> >  2 files changed, 37 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> > index 0b3614aadbb4..da63b1be6ec0 100644
> > --- a/include/linux/mmap_lock.h
> > +++ b/include/linux/mmap_lock.h
> > @@ -78,6 +78,36 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
> >
> >  #ifdef CONFIG_PER_VMA_LOCK
> >
> > +/*
> > + * VMA locks do not behave like most ordinary locks found in the kernel, so we
> > + * cannot quite have full lockdep tracking in the way we would ideally prefer.
> > + *
> > + * Read locks act as shared locks which exclude an exclusive lock being
> > + * taken. We therefore mark these accordingly on read lock acquire/release.
> > + *
> > + * Write locks are acquired exclusively per-VMA, but released in a shared
> > + * fashion; that is, upon vma_end_write_all() we update the mmap's seqcount
> > + * such that the write lock is de-acquired.
> > + *
> > + * We therefore cannot track write locks per-VMA, nor do we try. Mitigating this
> > + * is the fact that, of course, we do lockdep-track the mmap lock rwsem.
> > + *
> > + * We do, however, want to indicate that during either acquisition of a VMA
> > + * write lock or detachment of a VMA we require the lock held to be exclusive,
> > + * so we utilise lockdep to do so.
> > + */
> > +#define __vma_lockdep_acquire_read(vma) \

One question I forgot to ask: are you adding the "__" prefix to indicate
that no other users should be using these, or for some other reason?

> > +       lock_acquire_shared(&vma->vmlock_dep_map, 0, 1, NULL, _RET_IP_)
> > +#define __vma_lockdep_release_read(vma) \
> > +       lock_release(&vma->vmlock_dep_map, _RET_IP_)
> > +#define __vma_lockdep_acquire_exclusive(vma) \
> > +       lock_acquire_exclusive(&vma->vmlock_dep_map, 0, 0, NULL, _RET_IP_)
> > +#define __vma_lockdep_release_exclusive(vma) \
> > +       lock_release(&vma->vmlock_dep_map, _RET_IP_)
> > +/* Only meaningful if CONFIG_LOCK_STAT is defined. */
> > +#define __vma_lockdep_stat_mark_acquired(vma) \
> > +       lock_acquired(&vma->vmlock_dep_map, _RET_IP_)
> > +
> >  static inline void mm_lock_seqcount_init(struct mm_struct *mm)
> >  {
> >         seqcount_init(&mm->mm_lock_seq);
> > @@ -176,8 +206,7 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
> >         int refcnt;
> >         bool detached;
> >
> > -       rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> > -
> > +       __vma_lockdep_release_read(vma);
> >         detached = __vma_refcount_put(vma, &refcnt);
> >         /*
> >          * __vma_enter_locked() may be sleeping waiting for readers to drop
> > @@ -203,7 +232,7 @@ static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int
> >                                                               VM_REFCNT_LIMIT)))
> >                 return false;
> >
> > -       rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
> > +       __vma_lockdep_acquire_read(vma);
> >         return true;
> >  }
> >
> > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > index ebacb57e5f16..9563bfb051f4 100644
> > --- a/mm/mmap_lock.c
> > +++ b/mm/mmap_lock.c
> > @@ -72,7 +72,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
> >         if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
> >                 return 0;
> >
> > -       rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
> > +       __vma_lockdep_acquire_exclusive(vma);
> >         err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
> >                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
> >                    state);
> > @@ -85,10 +85,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
> >                         WARN_ON_ONCE(!detaching);
> >                         err = 0;
> >                 }
> > -               rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> > +               __vma_lockdep_release_exclusive(vma);
> >                 return err;
> >         }
> > -       lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
> > +       __vma_lockdep_stat_mark_acquired(vma);
> >
> >         return 1;
> >  }
> > @@ -97,7 +97,7 @@ static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
> >  {
> >         *detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> >                                           &vma->vm_refcnt);
> > -       rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> > +       __vma_lockdep_release_exclusive(vma);
> >  }
> >
> >  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> > @@ -199,7 +199,7 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
> >                 goto err;
> >         }
> >
> > -       rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
> > +       __vma_lockdep_acquire_read(vma);
> >
> >         if (unlikely(vma->vm_mm != mm))
> >                 goto err_unstable;
> > --
> > 2.52.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
  2026-01-22 13:01 ` [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked() Lorenzo Stoakes
  2026-01-22 13:08   ` Lorenzo Stoakes
@ 2026-01-22 20:15   ` Suren Baghdasaryan
  2026-01-22 20:55     ` Andrew Morton
  2026-01-23 16:33     ` Lorenzo Stoakes
  2026-01-23  9:16   ` Vlastimil Babka
  2 siblings, 2 replies; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-22 20:15 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> These functions are very confusing indeed. 'Entering' a lock could be
> interpreted as acquiring it, but that is not what these functions are
> doing.
>
> Equally, they don't indicate at all what kind of lock we are 'entering' or
> 'exiting'. Finally, they are misleading, as we invoke these functions when
> we already hold a write lock to detach a VMA.
>
> These functions simply 'enter' and 'exit' a state in which we hold the
> EXCLUSIVE lock in order that we can either mark the VMA as being
> write-locked, or mark the VMA detached.
>
> Rename the functions accordingly, and also update
> __vma_exit_exclusive_locked() to return the detached state with a
> __must_check directive, as it is clumsy to pass an output pointer for the
> detached state here, and inconsistent vs. __vma_enter_exclusive_locked().
>
> Finally, remove the unnecessary 'inline' directives.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  include/linux/mmap_lock.h |  4 +--
>  mm/mmap_lock.c            | 60 +++++++++++++++++++++++++--------------
>  2 files changed, 41 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index da63b1be6ec0..873bc5f3c97c 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -209,8 +209,8 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
>         __vma_lockdep_release_read(vma);
>         detached = __vma_refcount_put(vma, &refcnt);
>         /*
> -        * __vma_enter_locked() may be sleeping waiting for readers to drop
> -        * their reference count, so wake it up if we were the last reader
> +        * __vma_enter_exclusive_locked() may be sleeping waiting for readers to
> +        * drop their reference count, so wake it up if we were the last reader
>          * blocking it from being acquired.
>          */
>         if (!detached && are_readers_excluded(refcnt))
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index 7a0361cff6db..f73221174a8b 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -46,19 +46,43 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
>  #ifdef CONFIG_MMU
>  #ifdef CONFIG_PER_VMA_LOCK
>
> -static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
> +/*
> + * Now that all readers have been evicted, mark the VMA as being out of the
> + * 'exclude readers' state.
> + *
> + * Returns true if the VMA is now detached, otherwise false.
> + */
> +static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
>  {
> -       *detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> -                                         &vma->vm_refcnt);
> +       bool detached;
> +
> +       detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> +                                        &vma->vm_refcnt);
>         __vma_lockdep_release_exclusive(vma);
> +       return detached;
>  }
>
>  /*
> - * __vma_enter_locked() returns 0 immediately if the vma is not
> - * attached, otherwise it waits for any current readers to finish and
> - * returns 1.  Returns -EINTR if a signal is received while waiting.
> + * Mark the VMA as being in a state of excluding readers, check to see if any
> + * VMA read locks are indeed held, and if so wait for them to be released.
> + *
> + * Note that this function pairs with vma_refcount_put() which will wake up this
> + * thread when it detects that the last reader has released its lock.
> + *
> + * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
> + * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
> + * is permitted to kill it.
> + *
> + * The function will return 0 immediately if the VMA is detached, and 1 once the
> + * VMA has evicted all readers, leaving the VMA exclusively locked.

The wording here is a bit misleading. We do not evict the readers;
we just wait for them to complete and exit.

> + *
> + * If the function returns 1, the caller is required to invoke
> + * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
> + *
> + * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
> + * may also return -EINTR to indicate a fatal signal was received while waiting.
>   */
> -static inline int __vma_enter_locked(struct vm_area_struct *vma,
> +static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
>                 bool detaching, int state)
>  {
>         int err;
> @@ -85,13 +109,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
>                    state);
>         if (err) {
> -               bool detached;
> -
> -               __vma_exit_locked(vma, &detached);
> -               if (detached) {
> +               if (__vma_exit_exclusive_locked(vma)) {
>                         /*
>                          * The wait failed, but the last reader went away
> -                        * as well.  Tell the caller the VMA is detached.
> +                        * as well. Tell the caller the VMA is detached.
>                          */
>                         WARN_ON_ONCE(!detaching);
>                         err = 0;
> @@ -108,7 +129,7 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
>  {
>         int locked;
>
> -       locked = __vma_enter_locked(vma, false, state);
> +       locked = __vma_enter_exclusive_locked(vma, false, state);
>         if (locked < 0)
>                 return locked;
>
> @@ -120,12 +141,9 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
>          */
>         WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
>
> -       if (locked) {
> -               bool detached;
> -
> -               __vma_exit_locked(vma, &detached);
> -               WARN_ON_ONCE(detached); /* vma should remain attached */
> -       }
> +       /* vma should remain attached. */
> +       if (locked)
> +               WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));

I'm wary of calling functions from WARN_ON_ONCE() statements. If
someone decides to replace WARN_ON_ONCE() with VM_WARN_ON_ONCE(), the
call will disappear when CONFIG_DEBUG_VM=n. Maybe I'm being paranoid,
but that's because I have been bitten by this before...
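To illustrate the hazard (a hypothetical sketch - with CONFIG_DEBUG_VM=n,
VM_WARN_ON_ONCE() expands to BUILD_BUG_ON_INVALID(), which never evaluates
its argument at runtime):

	/* BAD: the exit side effect silently vanishes with CONFIG_DEBUG_VM=n. */
	if (locked)
		VM_WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));

	/* SAFE: the side effect is always evaluated. */
	if (locked) {
		bool detached = __vma_exit_exclusive_locked(vma);

		WARN_ON_ONCE(detached); /* vma should remain attached */
	}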

>
>         return 0;
>  }
> @@ -145,12 +163,12 @@ void vma_mark_detached(struct vm_area_struct *vma)
>         detached = __vma_refcount_put(vma, NULL);
>         if (unlikely(!detached)) {
>                 /* Wait until vma is detached with no readers. */
> -               if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> +               if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
>                         /*
>                          * Once this is complete, no readers can increment the
>                          * reference count, and the VMA is marked detached.
>                          */
> -                       __vma_exit_locked(vma, &detached);
> +                       detached = __vma_exit_exclusive_locked(vma);
>                         WARN_ON_ONCE(!detached);
>                 }
>         }
> --
> 2.52.0
>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
  2026-01-22 20:15   ` Suren Baghdasaryan
@ 2026-01-22 20:55     ` Andrew Morton
  2026-01-23 16:15       ` Lorenzo Stoakes
  2026-01-23 16:33     ` Lorenzo Stoakes
  1 sibling, 1 reply; 73+ messages in thread
From: Andrew Morton @ 2026-01-22 20:55 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Lorenzo Stoakes, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, 22 Jan 2026 12:15:20 -0800 Suren Baghdasaryan <surenb@google.com> wrote:

> > +       /* vma should remain attached. */
> > +       if (locked)
> > +               WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
> 
> I'm wary of calling functions from WARN_ON_ONCE() statements. If
> someone decides to replace WARN_ON_ONCE() with VM_WARN_ON_ONCE(), the
> call will disappear when CONFIG_DEBUG_VM=n. Maybe I'm being paranoid
> but it's because I have been bitten by that before...

Yes please.  The elision is desirable if the function has no side-effects, but
__vma_exit_exclusive_locked() changes stuff.

Someone(tm) should check for this.  A pathetically partial grep turns
up plenty of things:

mm/slab_common.c:	if (head && !WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&head_gp_snap)))
mm/slab_common.c:	if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) {
mm/page-writeback.c:		WARN_ON_ONCE(atomic_long_add_return(delta,
mm/page_isolation.c:		WARN_ON_ONCE(!pageblock_unisolate_and_move_free_pages(zone, page));
mm/page_alloc.c:	VM_WARN_ONCE(get_pageblock_isolate(page),
mm/numa_memblks.c:	WARN_ON(memblock_clear_hotplug(0, max_addr));
mm/numa_memblks.c:	WARN_ON(memblock_set_node(0, max_addr, &memblock.memory, NUMA_NO_NODE));
mm/numa_memblks.c:	WARN_ON(memblock_set_node(0, max_addr, &memblock.reserved,
mm/zsmalloc.c:		WARN_ON(!zpdesc_trylock(zpdesc));
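
If any of these were ever converted to the VM_*() variants, the defensive
form keeps the side effect out of the assertion, e.g. (a sketch of the
zsmalloc case above):

	bool locked = zpdesc_trylock(zpdesc);

	WARN_ON(!locked);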



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
  2026-01-22 13:01 ` [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns Lorenzo Stoakes
@ 2026-01-22 21:41   ` Suren Baghdasaryan
  2026-01-23 17:59     ` Lorenzo Stoakes
  2026-01-23 10:02   ` Vlastimil Babka
  1 sibling, 1 reply; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-22 21:41 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 5:03 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> It is confusing to have __vma_enter_exclusive_locked() return 0, 1 or an
> error (but only when waiting for readers in TASK_KILLABLE state), and
> having the return value be stored in a stack variable called 'locked' adds
> further confusion.
>
> More generally, we are doing a lot of rather finicky things during the
> acquisition of a state in which readers are excluded and when moving out of
> this state, including tracking whether we are detached or not or whether an
> error occurred.
>
> We are implementing logic in __vma_enter_exclusive_locked() that
> effectively acts as 'if one caller calls us, do X; if another, do Y',
> which is very confusing from a control flow perspective.
>
> Introducing the shared helper state object helps us avoid this, as we can
> now handle the 'an error arose but we're detached' condition correctly in
> both callers - a warning if not detaching, and treating the situation as if
> no error arose in the case of a VMA detaching.
>
> This also acts to help document what's going on and allows us to add some
> more logical debug asserts.
>
> Also update vma_mark_detached() to add a guard clause for the likely
> 'already detached' state (given we hold the mmap write lock), and add a
> comment about ephemeral VMA read lock reference count increments to clarify
> why we are entering/exiting an exclusive locked state here.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  mm/mmap_lock.c | 144 +++++++++++++++++++++++++++++++------------------
>  1 file changed, 91 insertions(+), 53 deletions(-)
>
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index f73221174a8b..75166a43ffa4 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -46,20 +46,40 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
>  #ifdef CONFIG_MMU
>  #ifdef CONFIG_PER_VMA_LOCK
>
> +/* State shared across __vma_[enter, exit]_exclusive_locked(). */
> +struct vma_exclude_readers_state {
> +       /* Input parameters. */
> +       struct vm_area_struct *vma;
> +       int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
> +       bool detaching;
> +
Are these:
            /* Output parameters. */
?
> +       bool detached;
> +       bool exclusive; /* Are we exclusively locked? */
> +};
> +
>  /*
>   * Now that all readers have been evicted, mark the VMA as being out of the
>   * 'exclude readers' state.
>   *
>   * Returns true if the VMA is now detached, otherwise false.
>   */
> -static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> +static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
>  {
> -       bool detached;
> +       struct vm_area_struct *vma = ves->vma;
> +
> +       VM_WARN_ON_ONCE(ves->detached);
> +       VM_WARN_ON_ONCE(!ves->exclusive);
>
> -       detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> -                                        &vma->vm_refcnt);
> +       ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> +                                             &vma->vm_refcnt);
>         __vma_lockdep_release_exclusive(vma);
> -       return detached;
> +}
> +
> +static unsigned int get_target_refcnt(struct vma_exclude_readers_state *ves)
> +{
> +       const unsigned int tgt = ves->detaching ? 0 : 1;
> +
> +       return tgt | VM_REFCNT_EXCLUDE_READERS_FLAG;
>  }
>
>  /*
> @@ -69,30 +89,31 @@ static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
>   * Note that this function pairs with vma_refcount_put() which will wake up this
>   * thread when it detects that the last reader has released its lock.
>   *
> - * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
> - * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
> - * is permitted to kill it.
> + * The ves->state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases
> + * where we wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal
> + * signal is permitted to kill it.
>   *
> - * The function will return 0 immediately if the VMA is detached, and 1 once the
> - * VMA has evicted all readers, leaving the VMA exclusively locked.
> + * The function sets the ves->locked parameter to true if an exclusive lock was

s/ves->locked/ves->exclusive

> + * acquired, or false if the VMA was detached or an error arose on wait.
>   *
> - * If the function returns 1, the caller is required to invoke
> - * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
> + * If the function indicates an exclusive lock was acquired via ves->exclusive
> + * (or equivalently, returning 0 with !ves->detached),

I would remove the mention of that equivalence because with this
change, return 0 simply indicates that the operation was successful
and should not be used to infer any additional state. To get specific
state the caller should use the proper individual ves fields. Using the
return value for anything else defeats the whole purpose of this cleanup.

> the caller is required to
> + * invoke __vma_exit_exclusive_locked() once the exclusive state is no longer
> + * required.
>   *
> - * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
> - * may also return -EINTR to indicate a fatal signal was received while waiting.
> + * If ves->state is set to something other than TASK_UNINTERRUPTIBLE, the
> + * function may also return -EINTR to indicate a fatal signal was received while
> + * waiting.
>   */
> -static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
> -               bool detaching, int state)
> +static int __vma_enter_exclusive_locked(struct vma_exclude_readers_state *ves)
>  {
> -       int err;
> -       unsigned int tgt_refcnt = VM_REFCNT_EXCLUDE_READERS_FLAG;
> +       struct vm_area_struct *vma = ves->vma;
> +       unsigned int tgt_refcnt = get_target_refcnt(ves);
> +       int err = 0;
>
>         mmap_assert_write_locked(vma->vm_mm);
> -
> -       /* Additional refcnt if the vma is attached. */
> -       if (!detaching)
> -               tgt_refcnt++;
> +       VM_WARN_ON_ONCE(ves->detached);
> +       VM_WARN_ON_ONCE(ves->exclusive);

Aren't these output parameters? If so, why do we stipulate their
initial values instead of setting them appropriately?

>
>         /*
>          * If vma is detached then only vma_mark_attached() can raise the
> @@ -101,37 +122,39 @@ static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
>          * See the comment describing the vm_area_struct->vm_refcnt field for
>          * details of possible refcnt values.
>          */
> -       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
> +       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
> +               ves->detached = true;
>                 return 0;
> +       }
>
>         __vma_lockdep_acquire_exclusive(vma);
>         err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
>                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
> -                  state);
> +                  ves->state);
>         if (err) {
> -               if (__vma_exit_exclusive_locked(vma)) {
> -                       /*
> -                        * The wait failed, but the last reader went away
> -                        * as well. Tell the caller the VMA is detached.
> -                        */
> -                       WARN_ON_ONCE(!detaching);
> -                       err = 0;
> -               }
> +               __vma_exit_exclusive_locked(ves);
>                 return err;

Nice! We preserve both error and detached state information.

>         }
> -       __vma_lockdep_stat_mark_acquired(vma);
>
> -       return 1;
> +       __vma_lockdep_stat_mark_acquired(vma);
> +       ves->exclusive = true;
> +       return 0;
>  }
>
>  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
>                 int state)
>  {
> -       int locked;
> +       int err;
> +       struct vma_exclude_readers_state ves = {
> +               .vma = vma,
> +               .state = state,
> +       };
>
> -       locked = __vma_enter_exclusive_locked(vma, false, state);
> -       if (locked < 0)
> -               return locked;
> +       err = __vma_enter_exclusive_locked(&ves);
> +       if (err) {
> +               WARN_ON_ONCE(ves.detached);

I believe the above WARN_ON_ONCE() should stay inside
__vma_enter_exclusive_locked(). Its correctness depends on the
implementation details of __vma_enter_exclusive_locked(). More
specifically, it is only correct because
__vma_enter_exclusive_locked() returns 0 if the VMA is detached, even
if there was a pending SIGKILL.

> +               return err;
> +       }
>
>         /*
>          * We should use WRITE_ONCE() here because we can have concurrent reads
> @@ -141,9 +164,11 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
>          */
>         WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
>
> -       /* vma should remain attached. */
> -       if (locked)
> -               WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
> +       if (!ves.detached) {

Strictly speaking the above check should be checking ves->exclusive
instead of !ves.detached. What you have is technically correct but
it's again related to that comment:

"If the function indicates an exclusive lock was acquired via
ves->exclusive (or equivalently, returning 0 with !ves->detached), the
caller is required to invoke __vma_exit_exclusive_locked() once the
exclusive state is no longer required."

So, here you are using a return of 0 with !ves->detached as an
indication that the VMA was successfully locked. I think it's less
confusing if we use the field dedicated for that purpose.
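
IOW, something like this (sketch):

	if (ves.exclusive) {
		__vma_exit_exclusive_locked(&ves);
		/* VMA should remain attached. */
		WARN_ON_ONCE(ves.detached);
	}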

> +               __vma_exit_exclusive_locked(&ves);
> +               /* VMA should remain attached. */
> +               WARN_ON_ONCE(ves.detached);
> +       }
>
>         return 0;
>  }
> @@ -151,7 +176,12 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
>
>  void vma_mark_detached(struct vm_area_struct *vma)
>  {
> -       bool detached;
> +       struct vma_exclude_readers_state ves = {
> +               .vma = vma,
> +               .state = TASK_UNINTERRUPTIBLE,
> +               .detaching = true,
> +       };
> +       int err;
>
>         vma_assert_write_locked(vma);
>         vma_assert_attached(vma);
> @@ -160,18 +190,26 @@ void vma_mark_detached(struct vm_area_struct *vma)
>          * See the comment describing the vm_area_struct->vm_refcnt field for
>          * details of possible refcnt values.
>          */
> -       detached = __vma_refcount_put(vma, NULL);
> -       if (unlikely(!detached)) {
> -               /* Wait until vma is detached with no readers. */
> -               if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> -                       /*
> -                        * Once this is complete, no readers can increment the
> -                        * reference count, and the VMA is marked detached.
> -                        */
> -                       detached = __vma_exit_exclusive_locked(vma);
> -                       WARN_ON_ONCE(!detached);
> -               }
> +       if (likely(__vma_refcount_put(vma, NULL)))
> +               return;
> +
> +       /*
> +        * Wait until the VMA is detached with no readers. Since we hold the VMA
> +        * write lock, the only read locks that might be present are those from
> +        * threads trying to acquire the read lock and incrementing the
> +        * reference count before realising the write lock is held and
> +        * decrementing it.
> +        */
> +       err = __vma_enter_exclusive_locked(&ves);
> +       if (!err && !ves.detached) {

Same here, we should be checking ves->exclusive to decide if
__vma_exit_exclusive_locked() should be called or not.
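
IOW (sketch):

	if (ves.exclusive) {
		/*
		 * Once this is complete, no readers can increment the
		 * reference count, and the VMA is marked detached.
		 */
		__vma_exit_exclusive_locked(&ves);
	}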

> +               /*
> +                * Once this is complete, no readers can increment the
> +                * reference count, and the VMA is marked detached.
> +                */
> +               __vma_exit_exclusive_locked(&ves);
>         }
> +       /* If an error arose but we were detached anyway, we don't care. */
> +       WARN_ON_ONCE(!ves.detached);
>  }
>
>  /*
> --
> 2.52.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 08/10] mm/vma: improve and document __is_vma_write_locked()
  2026-01-22 13:02 ` [PATCH RESEND v3 08/10] mm/vma: improve and document __is_vma_write_locked() Lorenzo Stoakes
@ 2026-01-22 21:55   ` Suren Baghdasaryan
  2026-01-23 16:21     ` Vlastimil Babka
  0 siblings, 1 reply; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-22 21:55 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> The function is a little confusing; clean it up a little, then add a
> descriptive comment.

I appreciate the descriptive comment but what exactly was confusing in
this function?

>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  include/linux/mmap_lock.h | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index 873bc5f3c97c..b00d34b5ad10 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -252,17 +252,30 @@ static inline void vma_end_read(struct vm_area_struct *vma)
>         vma_refcount_put(vma);
>  }
>
> -/* WARNING! Can only be used if mmap_lock is expected to be write-locked */
> -static inline bool __is_vma_write_locked(struct vm_area_struct *vma, unsigned int *mm_lock_seq)
> +/*
> + * Determine whether a VMA is write-locked. Must be invoked ONLY if the mmap
> + * write lock is held.
> + *
> + * Returns true if write-locked, otherwise false.
> + *
> + * Note that mm_lock_seq is updated only if the VMA is NOT write-locked.

True, this does not result in a functional change because we do not
use mm_lock_seq if __is_vma_write_locked() succeeds. However, this
seems to add an additional gotcha that you need to remember. Any reason
why?
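
For context, the write-lock callers follow this pattern (sketched from
vma_start_write_killable(); presumably vma_start_write() does the same
with TASK_UNINTERRUPTIBLE), so mm_lock_seq is only ever consumed when
__is_vma_write_locked() returns false:

	unsigned int mm_lock_seq;

	if (__is_vma_write_locked(vma, &mm_lock_seq))
		return 0;

	return __vma_start_write(vma, mm_lock_seq, TASK_KILLABLE);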

> + */
> +static inline bool __is_vma_write_locked(struct vm_area_struct *vma,
> +                                        unsigned int *mm_lock_seq)
>  {
> -       mmap_assert_write_locked(vma->vm_mm);
> +       struct mm_struct *mm = vma->vm_mm;
> +       const unsigned int seq = mm->mm_lock_seq.sequence;
> +
> +       mmap_assert_write_locked(mm);
>
>         /*
>          * current task is holding mmap_write_lock, both vma->vm_lock_seq and
>          * mm->mm_lock_seq can't be concurrently modified.
>          */
> -       *mm_lock_seq = vma->vm_mm->mm_lock_seq.sequence;
> -       return (vma->vm_lock_seq == *mm_lock_seq);
> +       if (vma->vm_lock_seq == seq)
> +               return true;
> +       *mm_lock_seq = seq;
> +       return false;
>  }
>
>  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> --
> 2.52.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 09/10] mm/vma: update vma_assert_locked() to use lockdep
  2026-01-22 13:02 ` [PATCH RESEND v3 09/10] mm/vma: update vma_assert_locked() to use lockdep Lorenzo Stoakes
@ 2026-01-22 22:02   ` Suren Baghdasaryan
  2026-01-23 18:45     ` Lorenzo Stoakes
  2026-01-23 16:55   ` Vlastimil Babka
  1 sibling, 1 reply; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-22 22:02 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> We can use lockdep to avoid unnecessary work here, otherwise update the
> code to logically evaluate all pertinent cases and share code with
> vma_assert_write_locked().
>
> Make it clear here that we treat the VMA being detached at this point as a
> bug, this was only implicit before.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Looks correct.
Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  include/linux/mmap_lock.h | 42 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 39 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index b00d34b5ad10..92ea07f0da4e 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -319,6 +319,10 @@ int vma_start_write_killable(struct vm_area_struct *vma)
>         return __vma_start_write(vma, mm_lock_seq, TASK_KILLABLE);
>  }
>
> +/**
> + * vma_assert_write_locked() - assert that @vma holds a VMA write lock.
> + * @vma: The VMA to assert.
> + */
>  static inline void vma_assert_write_locked(struct vm_area_struct *vma)
>  {
>         unsigned int mm_lock_seq;
> @@ -326,16 +330,48 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
>         VM_BUG_ON_VMA(!__is_vma_write_locked(vma, &mm_lock_seq), vma);
>  }
>
> +/**
> + * vma_assert_locked() - assert that @vma holds either a VMA read or a VMA write
> + * lock and is not detached.
> + * @vma: The VMA to assert.
> + */
>  static inline void vma_assert_locked(struct vm_area_struct *vma)
>  {
> -       unsigned int mm_lock_seq;
> +       unsigned int refs;
>
>         /*
>          * See the comment describing the vm_area_struct->vm_refcnt field for
>          * details of possible refcnt values.
>          */
> -       VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt) <= 1 &&
> -                     !__is_vma_write_locked(vma, &mm_lock_seq), vma);
> +
> +       /*
> +        * If read-locked or currently excluding readers, then the VMA is
> +        * locked.
> +        */
> +#ifdef CONFIG_LOCKDEP
> +       if (lock_is_held(&vma->vmlock_dep_map))
> +               return;
> +#endif
> +
> +       refs = refcount_read(&vma->vm_refcnt);
> +
> +       /*
> +        * In this case we're either read-locked, write-locked with temporary
> +        * readers, or in the midst of excluding readers, all of which means
> +        * we're locked.
> +        */
> +       if (refs > 1)
> +               return;
> +
> +       /* It is a bug for the VMA to be detached here. */
> +       VM_BUG_ON_VMA(!refs, vma);
> +
> +       /*
> +        * OK, the VMA has a reference count of 1 which means it is either
> +        * unlocked and attached or write-locked, so assert that it is
> +        * write-locked.
> +        */
> +       vma_assert_write_locked(vma);
>  }
>
>  static inline bool vma_is_attached(struct vm_area_struct *vma)
> --
> 2.52.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 10/10] mm/vma: add and use vma_assert_stabilised()
  2026-01-22 13:02 ` [PATCH RESEND v3 10/10] mm/vma: add and use vma_assert_stabilised() Lorenzo Stoakes
@ 2026-01-22 22:12   ` Suren Baghdasaryan
  2026-01-23 18:54     ` Lorenzo Stoakes
  2026-01-23 17:10   ` Vlastimil Babka
  2026-01-23 23:35   ` Hillf Danton
  2 siblings, 1 reply; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-22 22:12 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot
> be changed underneath us. This will be the case if EITHER the VMA lock or
> the mmap lock is held.
>
> In order to do so, we introduce a new assert vma_assert_stablised() - this

s/vma_assert_stablised/vma_assert_stabilised

> will make a lockdep assert if lockdep is enabled AND the VMA is
> read-locked.
>
> Currently lockdep tracking for VMA write locks is not implemented, so it
> suffices to check in this case that we have either an mmap read or write
> semaphore held.
>
> Note that because the VMA lock uses the non-standard vmlock_dep_map naming
> convention, we cannot use lockdep_assert_is_write_held() so have to open
> code this ourselves via lockdep-asserting that
> lock_is_held_type(&vma->vmlock_dep_map, 0).
>
> We have to be careful here - for instance when merging a VMA, we use the
> mmap write lock to stabilise the examination of adjacent VMAs which might
> be simultaneously VMA read-locked whilst being faulted in.
>
> If we were to assert VMA read lock using lockdep we would encounter an
> incorrect lockdep assert.
>
> Also, we have to be careful about asserting that mmap locks are held - if we
> try to address the above issue by first checking whether the mmap lock is
> held and, if so, asserting it via lockdep, we may find that we were raced by
> another thread simultaneously acquiring an mmap read lock that either we
> don't own (and which can thus be released at any time - so we are not
> stable) or that was indeed released since we last checked.
>
> So to deal with these complexities we end up with either a precise (if
> lockdep is enabled) or imprecise (if not) approach - in the first instance
> we assert the lock is held using lockdep and thus whether we own it.
>
> If we do own it, then the check is complete, otherwise we must check for
> the VMA read lock being held (VMA write lock implies mmap write lock so the
> mmap lock suffices for this).
>
> If lockdep is not enabled we simply check if the mmap lock is held and risk
> a false negative (i.e. not asserting when we should do).
>
> There are a couple of places in the kernel where we already do this
> stabilisation check - the anon_vma_name() helper in mm/madvise.c and
> vma_flag_set_atomic() in include/linux/mm.h, which we update to use
> vma_assert_stabilised().
>
> This change abstracts these into vma_assert_stabilised(), uses lockdep if
> possible, and avoids a duplicate check of whether the mmap lock is held.
>
> This is also self-documenting and lays the foundations for further VMA
> stability checks in the code.

So, is the lockdep addition the only functional change here?

>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Reviewed-by: Suren Baghdasaryan <surenb@google.com>

> ---
>  include/linux/mm.h        |  5 +---
>  include/linux/mmap_lock.h | 52 +++++++++++++++++++++++++++++++++++++++
>  mm/madvise.c              |  4 +--
>  3 files changed, 54 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6029a71a6908..d7ca837dd8a5 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1008,10 +1008,7 @@ static inline void vma_flag_set_atomic(struct vm_area_struct *vma,
>  {
>         unsigned long *bitmap = ACCESS_PRIVATE(&vma->flags, __vma_flags);
>
> -       /* mmap read lock/VMA read lock must be held. */
> -       if (!rwsem_is_locked(&vma->vm_mm->mmap_lock))
> -               vma_assert_locked(vma);
> -
> +       vma_assert_stabilised(vma);
>         if (__vma_flag_atomic_valid(vma, bit))
>                 set_bit((__force int)bit, bitmap);
>  }
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index 92ea07f0da4e..e01161560608 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -374,6 +374,52 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
>         vma_assert_write_locked(vma);
>  }
>
> +/**
> + * vma_assert_stabilised() - assert that this VMA cannot be changed from
> + * underneath us either by having a VMA or mmap lock held.
> + * @vma: The VMA whose stability we wish to assess.
> + *
> + * If lockdep is enabled we can precisely ensure stability via either an mmap
> + * lock owned by us or a specific VMA lock.
> + *
> + * With lockdep disabled we may sometimes race with other threads acquiring the
> > + * mmap read lock simultaneously with our VMA read lock.
> + */
> +static inline void vma_assert_stabilised(struct vm_area_struct *vma)
> +{
> +       /*
> +        * If another thread owns an mmap lock, it may go away at any time, and
> +        * thus is no guarantee of stability.
> +        *
> +        * If lockdep is enabled we can accurately determine if an mmap lock is
> +        * held and owned by us. Otherwise we must approximate.
> +        *
> +        * It doesn't necessarily mean we are not stabilised however, as we may
> +        * hold a VMA read lock (not a write lock as this would require an owned
> +        * mmap lock).
> +        *
> +        * If (assuming lockdep is not enabled) we were to assert a VMA read
> +        * lock first we may also run into issues, as other threads can hold VMA
> +        * read locks simultaneously with us.
> +        *
> +        * Therefore if lockdep is not enabled we risk a false negative (i.e. no
> +        * assert fired). If accurate checking is required, enable lockdep.
> +        */
> +       if (IS_ENABLED(CONFIG_LOCKDEP)) {
> +               if (lockdep_is_held(&vma->vm_mm->mmap_lock))
> +                       return;
> +       } else {
> +               if (rwsem_is_locked(&vma->vm_mm->mmap_lock))
> +                       return;
> +       }
> +
> +       /*
> +        * We're not stabilised by the mmap lock, so assert that we're
> +        * stabilised by a VMA lock.
> +        */
> +       vma_assert_locked(vma);
> +}
> +
>  static inline bool vma_is_attached(struct vm_area_struct *vma)
>  {
>         return refcount_read(&vma->vm_refcnt);
> @@ -455,6 +501,12 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
>         mmap_assert_locked(vma->vm_mm);
>  }
>
> +static inline void vma_assert_stabilised(struct vm_area_struct *vma)
> +{
> +       /* If no VMA locks, then either mmap lock suffices to stabilise. */
> +       mmap_assert_locked(vma->vm_mm);
> +}
> +
>  #endif /* CONFIG_PER_VMA_LOCK */
>
>  static inline void mmap_write_lock(struct mm_struct *mm)
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4bf4c8c38fd3..1f3040688f04 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -109,9 +109,7 @@ void anon_vma_name_free(struct kref *kref)
>
>  struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma)
>  {
> -       if (!rwsem_is_locked(&vma->vm_mm->mmap_lock))
> -               vma_assert_locked(vma);
> -
> +       vma_assert_stabilised(vma);
>         return vma->anon_name;
>  }
>
> --
> 2.52.0


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 03/10] mm/vma: rename is_vma_write_only(), separate out shared refcount put
  2026-01-22 19:31     ` Suren Baghdasaryan
@ 2026-01-23  8:24       ` Vlastimil Babka
  2026-01-23 14:52         ` Lorenzo Stoakes
  2026-01-23 14:41       ` Lorenzo Stoakes
  1 sibling, 1 reply; 73+ messages in thread
From: Vlastimil Babka @ 2026-01-23  8:24 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Lorenzo Stoakes, Andrew Morton, David Hildenbrand,
	Liam R . Howlett, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On 1/22/26 20:31, Suren Baghdasaryan wrote:
>> > +     int oldcnt;
>> > +     bool detached;
>> > +
>> > +     detached = __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
>> > +     if (refcnt)
>> > +             *refcnt = oldcnt - 1;
>> > +     return detached;
> 
> IIUC there is always a connection between detached and *refcnt
> resulting value. If detached==true then the resulting *refcnt has to
> be 0. If so, __vma_refcount_put() can simply return (oldcnt - 1) as
> new count:
> 
> static inline int __vma_refcount_put(struct vm_area_struct *vma)
> {
>        int oldcnt;
> 
>        __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
>        return oldcnt - 1;
> }
> 
> And later:
> 
> newcnt = __vma_refcount_put(&vma->vm_refcnt);
> detached = newcnt == 0;

If we go that way (both ways are fine with me) I'd suggest we rename the
function to __vma_refcount_put_return to make this more obvious. (c.f.
atomic_dec_return, lockref_put_return).



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 04/10] mm/vma: add+use vma lockdep acquire/release defines
  2026-01-22 19:41     ` Suren Baghdasaryan
@ 2026-01-23  8:41       ` Vlastimil Babka
  2026-01-23 15:08         ` Lorenzo Stoakes
  0 siblings, 1 reply; 73+ messages in thread
From: Vlastimil Babka @ 2026-01-23  8:41 UTC (permalink / raw)
  To: Suren Baghdasaryan, Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Mike Rapoport, Michal Hocko, Shakeel Butt, Jann Horn, linux-mm,
	linux-kernel, linux-rt-devel, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Boqun Feng, Waiman Long, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt

On 1/22/26 20:41, Suren Baghdasaryan wrote:
>> > +/*
>> > + * VMA locks do not behave like most ordinary locks found in the kernel, so we
>> > + * cannot quite have full lockdep tracking in the way we would ideally prefer.
>> > + *
>> > + * Read locks act as shared locks which exclude an exclusive lock being
>> > + * taken. We therefore mark these accordingly on read lock acquire/release.
>> > + *
>> > + * Write locks are acquired exclusively per-VMA, but released in a shared
>> > + * fashion, that is upon vma_end_write_all(), we update the mmap's seqcount such
>> > + * that write lock is de-acquired.
>> > + *
>> > + * We therefore cannot track write locks per-VMA, nor do we try. Mitigating this
>> > + * is the fact that, of course, we do lockdep-track the mmap lock rwsem.
>> > + *
>> > + * We do, however, want to indicate that during either acquisition of a VMA
>> > + * write lock or detachment of a VMA that we require the lock held be exclusive,
>> > + * so we utilise lockdep to do so.
>> > + */
>> > +#define __vma_lockdep_acquire_read(vma) \
> 
> One question I forgot to ask. Are you adding "__" prefix to indicate
> no other users should be using them or for some other reason?

I'd say it's the case of 'it has to be in a "public" header but not expected
to be used directly by the end-users of the header'.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 04/10] mm/vma: add+use vma lockdep acquire/release defines
  2026-01-22 13:01 ` [PATCH RESEND v3 04/10] mm/vma: add+use vma lockdep acquire/release defines Lorenzo Stoakes
  2026-01-22 19:32   ` Suren Baghdasaryan
@ 2026-01-23  8:48   ` Vlastimil Babka
  2026-01-23 15:10     ` Lorenzo Stoakes
  1 sibling, 1 reply; 73+ messages in thread
From: Vlastimil Babka @ 2026-01-23  8:48 UTC (permalink / raw)
  To: Lorenzo Stoakes, Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shakeel Butt, Jann Horn,
	linux-mm, linux-kernel, linux-rt-devel, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt

On 1/22/26 14:01, Lorenzo Stoakes wrote:
> The code is littered with inscrutable and duplicative lockdep incantations,
> replace these with defines which explain what is going on and add
> commentary to explain what we're doing.
> 
> If lockdep is disabled these become no-ops. We must use defines so _RET_IP_
> remains meaningful.
> 
> These are self-documenting and aid readability of the code.
> 
> Additionally, instead of using the confusing rwsem_*() form for something
> that is emphatically not an rwsem, we instead explicitly use
> lock_[acquired, release]_shared/exclusive() lockdep invocations since we
> are doing something rather custom here and these make more sense to use.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Nits:

> ---
>  include/linux/mmap_lock.h | 35 ++++++++++++++++++++++++++++++++---
>  mm/mmap_lock.c            | 10 +++++-----
>  2 files changed, 37 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index 0b3614aadbb4..da63b1be6ec0 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -78,6 +78,36 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
> 
>  #ifdef CONFIG_PER_VMA_LOCK
> 
> +/*
> + * VMA locks do not behave like most ordinary locks found in the kernel, so we
> + * cannot quite have full lockdep tracking in the way we would ideally prefer.
> + *
> + * Read locks act as shared locks which exclude an exclusive lock being
> + * taken. We therefore mark these accordingly on read lock acquire/release.
> + *
> + * Write locks are acquired exclusively per-VMA, but released in a shared
> + * fashion, that is upon vma_end_write_all(), we update the mmap's seqcount such
> + * that write lock is de-acquired.

de-acquired -> released?

> + * We therefore cannot track write locks per-VMA, nor do we try. Mitigating this
> + * is the fact that, of course, we do lockdep-track the mmap lock rwsem.

"... which has to be held in order to take a VMA write lock" ?

> + * We do, however, want to indicate that during either acquisition of a VMA
> + * write lock or detachment of a VMA that we require the lock held be exclusive,
> + * so we utilise lockdep to do so.
> + */
> +#define __vma_lockdep_acquire_read(vma) \
> +	lock_acquire_shared(&vma->vmlock_dep_map, 0, 1, NULL, _RET_IP_)
> +#define __vma_lockdep_release_read(vma) \
> +	lock_release(&vma->vmlock_dep_map, _RET_IP_)
> +#define __vma_lockdep_acquire_exclusive(vma) \
> +	lock_acquire_exclusive(&vma->vmlock_dep_map, 0, 0, NULL, _RET_IP_)
> +#define __vma_lockdep_release_exclusive(vma) \
> +	lock_release(&vma->vmlock_dep_map, _RET_IP_)
> +/* Only meaningful if CONFIG_LOCK_STAT is defined. */
> +#define __vma_lockdep_stat_mark_acquired(vma) \
> +	lock_acquired(&vma->vmlock_dep_map, _RET_IP_)
> +
>  static inline void mm_lock_seqcount_init(struct mm_struct *mm)
>  {
>  	seqcount_init(&mm->mm_lock_seq);
> @@ -176,8 +206,7 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
>  	int refcnt;
>  	bool detached;
> 
> -	rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> -
> +	__vma_lockdep_release_read(vma);
>  	detached = __vma_refcount_put(vma, &refcnt);
>  	/*
>  	 * __vma_enter_locked() may be sleeping waiting for readers to drop
> @@ -203,7 +232,7 @@ static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int
>  							      VM_REFCNT_LIMIT)))
>  		return false;
> 
> -	rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
> +	__vma_lockdep_acquire_read(vma);
>  	return true;
>  }
> 
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index ebacb57e5f16..9563bfb051f4 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -72,7 +72,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>  	if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
>  		return 0;
> 
> -	rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
> +	__vma_lockdep_acquire_exclusive(vma);
>  	err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
>  		   refcount_read(&vma->vm_refcnt) == tgt_refcnt,
>  		   state);
> @@ -85,10 +85,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>  			WARN_ON_ONCE(!detaching);
>  			err = 0;
>  		}
> -		rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> +		__vma_lockdep_release_exclusive(vma);
>  		return err;
>  	}
> -	lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
> +	__vma_lockdep_stat_mark_acquired(vma);
> 
>  	return 1;
>  }
> @@ -97,7 +97,7 @@ static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
>  {
>  	*detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
>  					  &vma->vm_refcnt);
> -	rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> +	__vma_lockdep_release_exclusive(vma);
>  }
> 
>  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> @@ -199,7 +199,7 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
>  		goto err;
>  	}
> 
> -	rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
> +	__vma_lockdep_acquire_read(vma);
> 
>  	if (unlikely(vma->vm_mm != mm))
>  		goto err_unstable;
> --
> 2.52.0



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 05/10] mm/vma: de-duplicate __vma_enter_locked() error path
  2026-01-22 13:01 ` [PATCH RESEND v3 05/10] mm/vma: de-duplicate __vma_enter_locked() error path Lorenzo Stoakes
  2026-01-22 19:39   ` Suren Baghdasaryan
@ 2026-01-23  8:54   ` Vlastimil Babka
  2026-01-23 15:10     ` Lorenzo Stoakes
  1 sibling, 1 reply; 73+ messages in thread
From: Vlastimil Babka @ 2026-01-23  8:54 UTC (permalink / raw)
  To: Lorenzo Stoakes, Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shakeel Butt, Jann Horn,
	linux-mm, linux-kernel, linux-rt-devel, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt

On 1/22/26 14:01, Lorenzo Stoakes wrote:
> We're doing precisely the same thing that __vma_exit_locked() does, so
> de-duplicate this code and keep the refcount primitive in one place.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
  2026-01-22 13:01 ` [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked() Lorenzo Stoakes
  2026-01-22 13:08   ` Lorenzo Stoakes
  2026-01-22 20:15   ` Suren Baghdasaryan
@ 2026-01-23  9:16   ` Vlastimil Babka
  2026-01-23 16:17     ` Lorenzo Stoakes
  2 siblings, 1 reply; 73+ messages in thread
From: Vlastimil Babka @ 2026-01-23  9:16 UTC (permalink / raw)
  To: Lorenzo Stoakes, Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shakeel Butt, Jann Horn,
	linux-mm, linux-kernel, linux-rt-devel, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt

On 1/22/26 14:01, Lorenzo Stoakes wrote:
> These functions are very confusing indeed. 'Entering' a lock could be
> interpreted as acquiring it, but this is not what these functions are
> interacting with.
> 
> Equally they don't indicate at all what kind of lock we are 'entering' or
> 'exiting'. Finally they are misleading as we invoke these functions when we
> already hold a write lock to detach a VMA.
> 
> These functions are explicitly simply 'entering' and 'exiting' a state in
> which we hold the EXCLUSIVE lock in order that we can either mark the VMA
> as being write-locked, or mark the VMA detached.

If we hold a write lock (i.e. in vma_mark_detached()), that normally means
it's also exclusive?
And if we talk about the state between __vma_enter_exclusive_locked() and
__vma_exit_exclusive_locked() as "holding an EXCLUSIVE lock", it's not
exactly the same lock as what we call the "VMA write lock", right? So what
lock is it?

Maybe it would help if we stopped calling this internal thing a "lock"?
Except we use it for lockdep's lock_acquire_exclusive(). Sigh, sorry I don't
have any great suggestion.

Maybe call those functions __vma_exclude_readers_start() and
__vma_exclude_readers_end() instead, or something?
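
E.g., signatures only (a rough sketch, keeping the prototypes as they stand
in this patch):

static int __vma_exclude_readers_start(struct vm_area_struct *vma,
		bool detaching, int state);
static bool __must_check __vma_exclude_readers_end(struct vm_area_struct *vma);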

> Rename the functions accordingly, and also update
> __vma_exit_exclusive_locked() to return detached state with a __must_check
> directive, as it is simply clumsy to pass an output pointer here to
> detached state and inconsistent vs. __vma_enter_exclusive_locked().
> 
> Finally, remove the unnecessary 'inline' directives.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  include/linux/mmap_lock.h |  4 +--
>  mm/mmap_lock.c            | 60 +++++++++++++++++++++++++--------------
>  2 files changed, 41 insertions(+), 23 deletions(-)
> 
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index da63b1be6ec0..873bc5f3c97c 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -209,8 +209,8 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
>  	__vma_lockdep_release_read(vma);
>  	detached = __vma_refcount_put(vma, &refcnt);
>  	/*
> -	 * __vma_enter_locked() may be sleeping waiting for readers to drop
> -	 * their reference count, so wake it up if we were the last reader
> +	 * __vma_enter_exclusive_locked() may be sleeping waiting for readers to
> +	 * drop their reference count, so wake it up if we were the last reader
>  	 * blocking it from being acquired.
>  	 */
>  	if (!detached && are_readers_excluded(refcnt))
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index 7a0361cff6db..f73221174a8b 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -46,19 +46,43 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
>  #ifdef CONFIG_MMU
>  #ifdef CONFIG_PER_VMA_LOCK
>  
> -static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
> +/*
> + * Now that all readers have been evicted, mark the VMA as being out of the
> + * 'exclude readers' state.
> + *
> + * Returns true if the VMA is now detached, otherwise false.
> + */
> +static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
>  {
> -	*detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> -					  &vma->vm_refcnt);
> +	bool detached;
> +
> +	detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> +					 &vma->vm_refcnt);
>  	__vma_lockdep_release_exclusive(vma);
> +	return detached;
>  }
>  
>  /*
> - * __vma_enter_locked() returns 0 immediately if the vma is not
> - * attached, otherwise it waits for any current readers to finish and
> - * returns 1.  Returns -EINTR if a signal is received while waiting.
> + * Mark the VMA as being in a state of excluding readers, check to see if any
> + * VMA read locks are indeed held, and if so wait for them to be released.
> + *
> + * Note that this function pairs with vma_refcount_put() which will wake up this
> + * thread when it detects that the last reader has released its lock.
> + *
> + * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
> + * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
> + * is permitted to kill it.
> + *
> + * The function will return 0 immediately if the VMA is detached, and 1 once the
> + * VMA has evicted all readers, leaving the VMA exclusively locked.
> + *
> + * If the function returns 1, the caller is required to invoke
> + * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
> + *
> + * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
> + * may also return -EINTR to indicate a fatal signal was received while waiting.
>   */
> -static inline int __vma_enter_locked(struct vm_area_struct *vma,
> +static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
>  		bool detaching, int state)
>  {
>  	int err;
> @@ -85,13 +109,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
>  		   refcount_read(&vma->vm_refcnt) == tgt_refcnt,
>  		   state);
>  	if (err) {
> -		bool detached;
> -
> -		__vma_exit_locked(vma, &detached);
> -		if (detached) {
> +		if (__vma_exit_exclusive_locked(vma)) {
>  			/*
>  			 * The wait failed, but the last reader went away
> -			 * as well.  Tell the caller the VMA is detached.
> +			 * as well. Tell the caller the VMA is detached.
>  			 */
>  			WARN_ON_ONCE(!detaching);
>  			err = 0;
> @@ -108,7 +129,7 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
>  {
>  	int locked;
>  
> -	locked = __vma_enter_locked(vma, false, state);
> +	locked = __vma_enter_exclusive_locked(vma, false, state);
>  	if (locked < 0)
>  		return locked;
>  
> @@ -120,12 +141,9 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
>  	 */
>  	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
>  
> -	if (locked) {
> -		bool detached;
> -
> -		__vma_exit_locked(vma, &detached);
> -		WARN_ON_ONCE(detached); /* vma should remain attached */
> -	}
> +	/* vma should remain attached. */
> +	if (locked)
> +		WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
>  
>  	return 0;
>  }
> @@ -145,12 +163,12 @@ void vma_mark_detached(struct vm_area_struct *vma)
>  	detached = __vma_refcount_put(vma, NULL);
>  	if (unlikely(!detached)) {
>  		/* Wait until vma is detached with no readers. */
> -		if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> +		if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
>  			/*
>  			 * Once this is complete, no readers can increment the
>  			 * reference count, and the VMA is marked detached.
>  			 */
> -			__vma_exit_locked(vma, &detached);
> +			detached = __vma_exit_exclusive_locked(vma);
>  			WARN_ON_ONCE(!detached);
>  		}
>  	}



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
  2026-01-22 13:01 ` [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns Lorenzo Stoakes
  2026-01-22 21:41   ` Suren Baghdasaryan
@ 2026-01-23 10:02   ` Vlastimil Babka
  2026-01-23 18:18     ` Lorenzo Stoakes
  1 sibling, 1 reply; 73+ messages in thread
From: Vlastimil Babka @ 2026-01-23 10:02 UTC (permalink / raw)
  To: Lorenzo Stoakes, Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shakeel Butt, Jann Horn,
	linux-mm, linux-kernel, linux-rt-devel, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt

On 1/22/26 14:01, Lorenzo Stoakes wrote:
> It is confusing to have __vma_enter_exclusive_locked() return 0, 1 or an
> error (but only when waiting for readers in TASK_KILLABLE state), and
> having the return value be stored in a stack variable called 'locked' is
> further confusion.
> 
> More generally, we are doing a lot of rather finnicky things during the
> acquisition of a state in which readers are excluded and moving out of this
> state, including tracking whether we are detached or not or whether an
> error occurred.
> 
> We are implementing logic in __vma_enter_exclusive_locked() that
> effectively acts as if 'if one caller calls us do X, if another then do Y',
> which is very confusing from a control flow perspective.
> 
> Introducing the shared helper object state helps us avoid this, as we can
> now handle the 'an error arose but we're detached' condition correctly in
> both callers - a warning if not detaching, and treating the situation as if
> no error arose in the case of a VMA detaching.
> 
> This also acts to help document what's going on and allows us to add some
> more logical debug asserts.
> 
> Also update vma_mark_detached() to add a guard clause for the likely
> 'already detached' state (given we hold the mmap write lock), and add a
> comment about ephemeral VMA read lock reference count increments to clarify
> why we are entering/exiting an exclusive locked state here.
> 
> No functional change intended.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  mm/mmap_lock.c | 144 +++++++++++++++++++++++++++++++------------------
>  1 file changed, 91 insertions(+), 53 deletions(-)
> 
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index f73221174a8b..75166a43ffa4 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -46,20 +46,40 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
>  #ifdef CONFIG_MMU
>  #ifdef CONFIG_PER_VMA_LOCK
> 
> +/* State shared across __vma_[enter, exit]_exclusive_locked(). */
> +struct vma_exclude_readers_state {
> +	/* Input parameters. */
> +	struct vm_area_struct *vma;
> +	int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
> +	bool detaching;
> +
> +	bool detached;
> +	bool exclusive; /* Are we exclusively locked? */
> +};
> +
>  /*
>   * Now that all readers have been evicted, mark the VMA as being out of the
>   * 'exclude readers' state.
>   *
>   * Returns true if the VMA is now detached, otherwise false.
>   */
> -static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> +static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
>  {
> -	bool detached;
> +	struct vm_area_struct *vma = ves->vma;
> +
> +	VM_WARN_ON_ONCE(ves->detached);
> +	VM_WARN_ON_ONCE(!ves->exclusive);

I think this will trigger when called on wait failure from
__vma_enter_exclusive_locked(). Given the other things Suren raised about
the field, I wonder if it's worth keeping it?

> -	detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> -					 &vma->vm_refcnt);
> +	ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> +					      &vma->vm_refcnt);
>  	__vma_lockdep_release_exclusive(vma);
> -	return detached;
> +}
> +

> @@ -151,7 +176,12 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
> 
>  void vma_mark_detached(struct vm_area_struct *vma)
>  {
> -	bool detached;
> +	struct vma_exclude_readers_state ves = {
> +		.vma = vma,
> +		.state = TASK_UNINTERRUPTIBLE,
> +		.detaching = true,
> +	};
> +	int err;
> 
>  	vma_assert_write_locked(vma);
>  	vma_assert_attached(vma);
> @@ -160,18 +190,26 @@ void vma_mark_detached(struct vm_area_struct *vma)
>  	 * See the comment describing the vm_area_struct->vm_refcnt field for
>  	 * details of possible refcnt values.
>  	 */
> -	detached = __vma_refcount_put(vma, NULL);
> -	if (unlikely(!detached)) {
> -		/* Wait until vma is detached with no readers. */
> -		if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> -			/*
> -			 * Once this is complete, no readers can increment the
> -			 * reference count, and the VMA is marked detached.
> -			 */
> -			detached = __vma_exit_exclusive_locked(vma);
> -			WARN_ON_ONCE(!detached);
> -		}
> +	if (likely(__vma_refcount_put(vma, NULL)))
> +		return;

Seems to me it would be worthwhile splitting this function into a
static-inline-in-header vma_mark_detached() that does only the asserts and
__vma_refcount_put(), and keeping the function here as __vma_mark_detached()
(or maybe differently named since the detaching kinda already happened with
the refcount put... __vma_mark_detached_finish()?) handling the rare case
__vma_refcount_put() returns false.
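
Roughly like this (a sketch only, untested, assuming the
__vma_mark_detached_finish() spelling):

/* In the header - fast path doing only the asserts and the refcount put. */
static inline void vma_mark_detached(struct vm_area_struct *vma)
{
	vma_assert_write_locked(vma);
	vma_assert_attached(vma);

	/* Common case - we dropped the last reference, the VMA is detached. */
	if (likely(__vma_refcount_put(vma, NULL)))
		return;

	/* Rare case - wait out any remaining temporary readers. */
	__vma_mark_detached_finish(vma);
}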

> +
> +	/*
> +	 * Wait until the VMA is detached with no readers. Since we hold the VMA
> +	 * write lock, the only read locks that might be present are those from
> +	 * threads trying to acquire the read lock and incrementing the
> +	 * reference count before realising the write lock is held and
> +	 * decrementing it.
> +	 */
> +	err = __vma_enter_exclusive_locked(&ves);
> +	if (!err && !ves.detached) {
> +		/*
> +		 * Once this is complete, no readers can increment the
> +		 * reference count, and the VMA is marked detached.
> +		 */
> +		__vma_exit_exclusive_locked(&ves);
>  	}
> +	/* If an error arose but we were detached anyway, we don't care. */
> +	WARN_ON_ONCE(!ves.detached);
>  }
> 
>  /*
> --
> 2.52.0



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 01/10] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG
  2026-01-22 16:37   ` Suren Baghdasaryan
@ 2026-01-23 13:26     ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 13:26 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 08:37:51AM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > The VMA_LOCK_OFFSET value encodes a flag which vma->vm_refcnt is set to in
> > order to indicate that a VMA is in the process of having VMA read-locks
> > excluded in __vma_enter_locked() (that is, first checking if there are any
> > VMA read locks held, and if there are, waiting on them to be released).
> >
> > This happens when a VMA write lock is being established, or when a VMA being
> > marked detached is found to have an elevated reference count because readers
> > temporarily incremented it before discovering that a VMA write lock is in
> > place.
> >
> > The naming does not convey any of this, so rename VMA_LOCK_OFFSET to
> > VM_REFCNT_EXCLUDE_READERS_FLAG (with a sensible new prefix to differentiate
> > from the newly introduced VMA_*_BIT flags).
> >
> > Also rename VMA_REF_LIMIT to VM_REFCNT_LIMIT to make this consistent also.
> >
> > Update comments to reflect this.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Thanks for the cleanup Lorenzo, and sorry for the delay in reviewing
> your patches. I finally have some time and will try to finish my
> review today.
>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>

Thanks.

>
> > ---
> >  include/linux/mm_types.h  | 17 +++++++++++++----
> >  include/linux/mmap_lock.h | 14 ++++++++------
> >  mm/mmap_lock.c            | 17 ++++++++++-------
> >  3 files changed, 31 insertions(+), 17 deletions(-)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 78950eb8926d..94de392ed3c5 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -752,8 +752,17 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
> >  }
> >  #endif
> >
> > -#define VMA_LOCK_OFFSET        0x40000000
> > -#define VMA_REF_LIMIT  (VMA_LOCK_OFFSET - 1)
> > +/*
> > + * WHile __vma_enter_locked() is working to ensure there are no read-locks held on a
>
> s/WHile/While

Oops thanks. Will fix on respin.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment
  2026-01-22 16:48   ` Vlastimil Babka
  2026-01-22 17:28     ` Suren Baghdasaryan
@ 2026-01-23 13:45     ` Lorenzo Stoakes
  1 sibling, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 13:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 05:48:07PM +0100, Vlastimil Babka wrote:
> On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > The possible vma->vm_refcnt values are confusing and vague, explain in
> > detail what these can be in a comment describing the vma->vm_refcnt field
> > and reference this comment in various places that read/write this field.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Thanks, very useful. Forgive my nitpicks :) It's because it's tricky so best
> try to be as precise as possible, I believe.

Ack.

>
> > ---
> >  include/linux/mm_types.h  | 39 +++++++++++++++++++++++++++++++++++++--
> >  include/linux/mmap_lock.h |  7 +++++++
> >  mm/mmap_lock.c            |  6 ++++++
> >  3 files changed, 50 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 94de392ed3c5..e5ee66f84d9a 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -758,7 +758,8 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
> >   * set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indiciate to
> >   * vma_start_read() that the reference count should be left alone.
> >   *
> > - * Once the operation is complete, this value is subtracted from vma->vm_refcnt.
> > + * See the comment describing vm_refcnt in vm_area_struct for details as to
> > + * which values the VMA reference count can be.
> >   */
> >  #define VM_REFCNT_EXCLUDE_READERS_BIT	(30)
> >  #define VM_REFCNT_EXCLUDE_READERS_FLAG	(1U << VM_REFCNT_EXCLUDE_READERS_BIT)
> > @@ -989,7 +990,41 @@ struct vm_area_struct {
> >  	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
> >  #endif
> >  #ifdef CONFIG_PER_VMA_LOCK
> > -	/* Unstable RCU readers are allowed to read this. */
> > +	/*
> > +	 * Used to keep track of the number of references taken by VMA read or
> > +	 * write locks. May have the VM_REFCNT_EXCLUDE_READERS_FLAG set
>
> I wonder about the "or write locks" part. The process of acquiring it uses
> VM_REFCNT_EXCLUDE_READERS_FLAG but then the writer doesn't hold a 1
> refcount? (the sentence could be read it way IMHO) It's vma being attached
> that does, AFAIK?

Right, the intent is to say that, in the process of excluding readers, a write
lock can vary the reference count.

It's a pity we can't describe the refcnt in some sensible, logical way as it's
really being overloaded quite a bit for multiple things.

It really isn't actually keeping track of references (other than read locks
taken).

OK so I have updated this to say:

	 * Used to keep track of firstly, whether the VMA is attached, secondly,
	 * if attached, how many read locks are taken, and thirdly, if the
	 * VM_REFCNT_EXCLUDE_READERS_FLAG is set, whether any read locks held
	 * are currently in the process of being excluded.

>
> > +	 * indicating that a thread has entered __vma_enter_locked() and is
> > +	 * waiting on any outstanding read locks to exit.
> > +	 *
> > +	 * This value can be equal to:
> > +	 *
> > +	 * 0 - Detached.
>
> Is it worth saying that readers can't increment the refcount?

Added, updated to say:

	 * 0 - Detached. IMPORTANT: when the refcnt is zero, readers cannot
	 * increment it.

>
> > +	 * 1 - Unlocked or write-locked.
>
> "Attached and either unlocked or write-locked." ?
>
> (see how "write-locked" isn't reflected, I argued above)

I'm not sure what you mean - a write lock requires the VMA to be attached (or
it bails out on the attempted write lock). So it's kind of implicit, right?

But yes better to be explicit, so have replaced with your suggestion.

>
> > +	 *
> > +	 * >1, < VM_REFCNT_EXCLUDE_READERS_FLAG - Read-locked or (unlikely)
> > +	 * write-locked with other threads having temporarily incremented the
> > +	 * reference count prior to determining it is write-locked and
> > +	 * decrementing it again.
>
> Ack.
>
> > +	 * VM_REFCNT_EXCLUDE_READERS_FLAG - Detached, pending
> > +	 * __vma_exit_locked() completion which will decrement the reference
> > +	 * count to zero. IMPORTANT - at this stage no further readers can
> > +	 * increment the reference count. It can only be reduced.
> > +	 *
> > +	 * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - Either an attached VMA pending
> > +	 * __vma_exit_locked() completion which will decrement the reference
> > +	 * count to one, OR a detached VMA waiting on a single spurious reader
> > +	 * to decrement reference count. IMPORTANT - as above, no further
> > +	 * readers can increment the reference count.
> > +	 *
> > +	 * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - VMA is waiting on readers,
>
> "VMA is waiting" sounds weird? a thread might be, but VMA itself?
> (similarly in the previous paragraph)

I was trying to make it a bit more succinct that way I think, but agreed
it's unclear. Have replaced with:

	 * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - A thread is either write-locking
	 * an attached VMA and has yet to invoke __vma_exit_locked(), OR a
	 * thread is detaching a VMA and is waiting on a single spurious reader
	 * in order to decrement the reference count. IMPORTANT - as above, no
	 * further readers can increment the reference count.
	 *
	 * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - A thread either
	 * write-locking or detaching a VMA is waiting on readers to
	 * exit. IMPORTANT - as above, no further readers can increment the
	 * reference count.

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 01/10] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG
  2026-01-22 16:29     ` Lorenzo Stoakes
@ 2026-01-23 13:52       ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 13:52 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 04:29:35PM +0000, Lorenzo Stoakes wrote:
> On Thu, Jan 22, 2026 at 05:26:13PM +0100, Vlastimil Babka wrote:
> > On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > > The VMA_LOCK_OFFSET value encodes a flag which vma->vm_refcnt is set to in
> > > order to indicate that a VMA is in the process of having VMA read-locks
> > > excluded in __vma_enter_locked() (that is, first checking if there are any
> > > VMA read locks held, and if there are, waiting on them to be released).
> > >
> > > This happens when a VMA write lock is being established, or when a VMA being
> > > marked detached is found to have an elevated reference count because readers
> > > temporarily incremented it before discovering that a VMA write lock is in
> > > place.
> > >
> > > The naming does not convey any of this, so rename VMA_LOCK_OFFSET to
> > > VM_REFCNT_EXCLUDE_READERS_FLAG (with a sensible new prefix to differentiate
> > > from the newly introduced VMA_*_BIT flags).
> > >
> > > Also rename VMA_REF_LIMIT to VM_REFCNT_LIMIT to make this consistent also.
> > >
> > > Update comments to reflect this.
> > >
> > > No functional change intended.
> > >
> > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >
> > Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
> >
> > git grep tells me VMA_LOCK_OFFSET is still used in
> > tools/testing/vma/vma_internal.h but I guess it doesn't break the tests?
> >
>
> No :) I update it later in the series but it doesn't break the tests so no
> bisection hazard.

Apologies, I was mistaken. However, there is no impact on the tests, and the
VMA flags/mmap_prepare series I sent makes radical changes there, so I think
I will update this at a later date to keep everything sane.

todo++; :)

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 03/10] mm/vma: rename is_vma_write_only(), separate out shared refcount put
  2026-01-22 17:36   ` Vlastimil Babka
  2026-01-22 19:31     ` Suren Baghdasaryan
@ 2026-01-23 14:02     ` Lorenzo Stoakes
  1 sibling, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 14:02 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 06:36:03PM +0100, Vlastimil Babka wrote:
> On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > The is_vma_writer_only() function is misnamed - this isn't determining if
> > there is only a write lock, as it checks for the presence of the
> > VM_REFCNT_EXCLUDE_READERS_FLAG.
> >
> > Really, it is checking to see whether readers are excluded, with a
> > possibility of a false positive in the case of a detachment (there we
> > expect the vma->vm_refcnt to eventually be set to
> > VM_REFCNT_EXCLUDE_READERS_FLAG, whereas for an attached VMA we expect it to
> > eventually be set to VM_REFCNT_EXCLUDE_READERS_FLAG + 1).
> >
> > Rename the function accordingly.
> >
> > Relatedly, we use a finnicky __refcount_dec_and_test() primitive directly
> > in vma_refcount_put(), using the old value to determine what the reference
> > count ought to be after the operation is complete (ignoring racing
> > reference count adjustments).
> >
> > Wrap this into a __vma_refcount_put() function, which we can then utilise
> > in vma_mark_detached() and thus keep the refcount primitive usage
> > abstracted.
> >
> > Also adjust comments, removing duplicative comments covered elsewhere and
> > adding more to aid understanding.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Again very useful, thanks!

Thanks.

>
> > ---
> >  include/linux/mmap_lock.h | 62 +++++++++++++++++++++++++++++++--------
> >  mm/mmap_lock.c            | 18 +++++-------
> >  2 files changed, 57 insertions(+), 23 deletions(-)
> >
> > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> > index a764439d0276..0b3614aadbb4 100644
> > --- a/include/linux/mmap_lock.h
> > +++ b/include/linux/mmap_lock.h
> > @@ -122,15 +122,27 @@ static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt)
> >  	vma->vm_lock_seq = UINT_MAX;
> >  }
> >
> > -static inline bool is_vma_writer_only(int refcnt)
> > +/**
> > + * are_readers_excluded() - Determine whether @refcnt describes a VMA which has
> > + * excluded all VMA read locks.
> > + * @refcnt: The VMA reference count obtained from vm_area_struct->vm_refcnt.
> > + *
> > + * We may be raced by other readers temporarily incrementing the reference
> > + * count; though the race window is very small, this might cause spurious
> > + * wakeups.
>
> I think this part about spurious wakeups belongs more to the usage of the
> function in vma_refcount_put()? Because there are no wakeups done here. So
> it should be enough to explain how it can be false positive like in the
> paragraph below.

OK moved the paragraph to vma_refcount_put().

>
> > + *
> > + * In the case of a detached VMA, we may incorrectly indicate that readers are
> > + * excluded when one remains, because in that scenario we target a refcount of
> > + * VM_REFCNT_EXCLUDE_READERS_FLAG, rather than the attached target of
> > + * VM_REFCNT_EXCLUDE_READERS_FLAG + 1.
> > + *
> > + * However, the race window for that is very small so it is unlikely.
> > + *
> > + * Returns: true if readers are excluded, false otherwise.
> > + */
> > +static inline bool are_readers_excluded(int refcnt)
>
> I wonder if a include/linux/ header should have such a generically named
> function (I understand it's necessary for it to be here). Maybe prefix the
> name and make the comment not a kerneldoc because it's going to be only the
> vma locking implementation using it and not the vma locking end-users? (i.e.
> it's "intermediate").

OK, renamed to __vma_are_readers_excluded() and dropped the kdoc.

>
> >  {
> >  	/*
> > -	 * With a writer and no readers, refcnt is VM_REFCNT_EXCLUDE_READERS_FLAG
> > -	 * if the vma is detached and (VM_REFCNT_EXCLUDE_READERS_FLAG + 1) if it is
> > -	 * attached. Waiting on a detached vma happens only in
> > -	 * vma_mark_detached() and is a rare case, therefore most of the time
> > -	 * there will be no unnecessary wakeup.
> > -	 *
> >  	 * See the comment describing the vm_area_struct->vm_refcnt field for
> >  	 * details of possible refcnt values.
> >  	 */
> > @@ -138,18 +150,42 @@ static inline bool is_vma_writer_only(int refcnt)
> >  		refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
> >  }
> >
> > +static inline bool __vma_refcount_put(struct vm_area_struct *vma, int *refcnt)
>
> Basically change are_readers_excluded() like this, with __vma prefix?

Yup did that already.

>
> But this one could IMHO use use some comment (also not kerneldoc) saying
> what the return value and *refcnt indicate?

I felt doing that would be overdocumenting... I've had people moan about that
before :) but sure makes sense.

Added:

/*
 * Actually decrement the VMA reference count.
 *
 * The function sets *refcnt to the reference count value immediately after
 * the decrement, if refcnt is not NULL.
 *
 * Returns true if the decrement resulted in the VMA being detached
 * (i.e. reduced it to zero), or false otherwise.
 */

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 03/10] mm/vma: rename is_vma_write_only(), separate out shared refcount put
  2026-01-22 19:31     ` Suren Baghdasaryan
  2026-01-23  8:24       ` Vlastimil Babka
@ 2026-01-23 14:41       ` Lorenzo Stoakes
  2026-01-26 10:04         ` Lorenzo Stoakes
  1 sibling, 1 reply; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 14:41 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka, Andrew Morton, David Hildenbrand,
	Liam R . Howlett, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 11:31:05AM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 22, 2026 at 9:36 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > > The is_vma_writer_only() function is misnamed - this isn't determining if
> > > there is only a write lock, as it checks for the presence of the
> > > VM_REFCNT_EXCLUDE_READERS_FLAG.
> > >
> > > Really, it is checking to see whether readers are excluded, with a
> > > possibility of a false positive in the case of a detachment (there we
> > > expect the vma->vm_refcnt to eventually be set to
> > > VM_REFCNT_EXCLUDE_READERS_FLAG, whereas for an attached VMA we expect it to
> > > eventually be set to VM_REFCNT_EXCLUDE_READERS_FLAG + 1).
> > >
> > > Rename the function accordingly.
> > >
> > > Relatedly, we use a finnicky __refcount_dec_and_test() primitive directly
> > > in vma_refcount_put(), using the old value to determine what the reference
> > > count ought to be after the operation is complete (ignoring racing
> > > reference count adjustments).
>
> Sorry, by mistake I replied to an earlier version here:
> https://lore.kernel.org/all/CAJuCfpF-tVr==bCf-PXJFKPn99yRjfONeDnDtPvTkGUfyuvtcw@mail.gmail.com/
> Copying my comments here.
>
> IIUC, __refcount_dec_and_test() can decrement the refcount by only 1
> and the old value returned (oldcnt) will be the exact value that it
> was before this decrement. Therefore oldcnt - 1 must reflect the

Yes.

> refcount value after the decrement. It's possible the refcount gets

Well no...

> manipulated after this operation but that does not make this operation
> wrong. I don't quite understand why you think that's racy or finnicky.

...because of this.

I only mentioned in passing that you might get raced, which you agree with
so I think that's fine.

_I_ feel this function is _very_ finnicky given the various caveats
required to understand exactly what's happening here. But clearly that's
distracting, so I'll rephrase it.

Have updated commit msg to say:

"
Relatedly, we use a __refcount_dec_and_test() primitive directly in
vma_refcount_put(), using the old value to determine what the reference
count ought to be after the operation is complete (ignoring racing
reference count adjustments).

Wrap this into a __vma_refcount_put() function, which we can then utilise
in vma_mark_detached() and thus keep the refcount primitive usage
abstracted.

This reduces duplication in the two invocations of this function.
"

(Adding the point about duplication since it wasn't clear before).

> > > +{
> > > +     int oldcnt;
> > > +     bool detached;
> > > +
> > > +     detached = __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
> > > +     if (refcnt)
> > > +             *refcnt = oldcnt - 1;
> > > +     return detached;
>
> IIUC there is always a connection between detached and *refcnt
> resulting value. If detached==true then the resulting *refcnt has to
> be 0. If so, __vma_refcount_put() can simply return (oldcnt - 1) as
> new count:
>
> static inline int __vma_refcount_put(struct vm_area_struct *vma)
> {
>        int oldcnt;
>
>        __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);

You can't do this as it's __must_check... :)

So have to replace with __refcount_dec(), which is a void function.
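
Something like this (sketch, untested):

static inline int __vma_refcount_put(struct vm_area_struct *vma)
{
	int oldcnt;

	__refcount_dec(&vma->vm_refcnt, &oldcnt);
	return oldcnt - 1;
}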

>        return oldcnt - 1;
> }
>
> And later:
>
> newcnt = __vma_refcount_put(&vma->vm_refcnt);
> detached = newcnt == 0;

This is kind of horrible though.

Maybe better to just do:

	newcnt = __vma_refcount_put(vma);
	...
	if (newcnt && __vma_are_readers_excluded(newcnt))
		...

And:

	if (unlikely(__vma_refcount_put(vma))) {
		...
	}

And now that we have vma->vm_refcnt clearly documented, we can safely assume
developers understand what this means.

>
> > > +}
> > > +
> > > +/**
> > > + * vma_refcount_put() - Drop reference count in VMA vm_refcnt field due to a
> > > + * read-lock being dropped.
> > > + * @vma: The VMA whose reference count we wish to decrement.
> > > + *
> > > + * If we were the last reader, wake up threads waiting to obtain an exclusive
> > > + * lock.
> > > + */
> > >  static inline void vma_refcount_put(struct vm_area_struct *vma)
> > >  {
> > > -     /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
> > > +     /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt. */
> > >       struct mm_struct *mm = vma->vm_mm;
> > > -     int oldcnt;
> > > +     int refcnt;
> > > +     bool detached;
> > >
> > >       rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> > > -     if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) {
> > >
> > > -             if (is_vma_writer_only(oldcnt - 1))
> > > -                     rcuwait_wake_up(&mm->vma_writer_wait);
> > > -     }
> > > +     detached = __vma_refcount_put(vma, &refcnt);
> > > +     /*
> > > +      * __vma_enter_locked() may be sleeping waiting for readers to drop
> > > +      * their reference count, so wake it up if we were the last reader
> > > +      * blocking it from being acquired.
> > > +      */
> > > +     if (!detached && are_readers_excluded(refcnt))
> > > +             rcuwait_wake_up(&mm->vma_writer_wait);
> > >  }
> > >
> > >  /*
> > > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > > index 75dc098aea14..ebacb57e5f16 100644
> > > --- a/mm/mmap_lock.c
> > > +++ b/mm/mmap_lock.c
> > > @@ -130,25 +130,23 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
> > >
> > >  void vma_mark_detached(struct vm_area_struct *vma)
> > >  {
> > > +     bool detached;
> > > +
> > >       vma_assert_write_locked(vma);
> > >       vma_assert_attached(vma);
> > >
> > >       /*
> > > -      * We are the only writer, so no need to use vma_refcount_put().
> > > -      * The condition below is unlikely because the vma has been already
> > > -      * write-locked and readers can increment vm_refcnt only temporarily
>
> I think the above part of the comment is still important and should be
> kept intact.

I think it's confusing, because 'we are the only writer' is not clear as to
why it obviates the need to call vma_refcount_put().

In fact vma_refcount_put() explicitly invokes a lockdep read lock drop
which would be incorrect to invoke here. Also I guess the other point here
is we don't do wake ups because nobody else will be waiting.

I think updating this to actually explain the context would be hugely
distracting and not all that useful, but leaving it as-is is confusing.

So I think the best thing to do is to add back the thing about the
condition, which I've done but placed it above unlikely(!detached) as:

	/*
	 * This condition - that the VMA is still attached (refcnt > 0) - is
	 * unlikely, because the vma has been already write-locked and readers
	 * can increment vm_refcnt only temporarily before they check
	 * vm_lock_seq, realize the vma is locked and drop back the
	 * vm_refcnt. That is a narrow window for observing a raised vm_refcnt.
	 *
	 * See the comment describing the vm_area_struct->vm_refcnt field for
	 * details of possible refcnt values.
	 */

>
> > > -      * before they check vm_lock_seq, realize the vma is locked and drop
> > > -      * back the vm_refcnt. That is a narrow window for observing a raised
> > > -      * vm_refcnt.
> > > -      *
> > >        * See the comment describing the vm_area_struct->vm_refcnt field for
> > >        * details of possible refcnt values.
> > >        */
> > > -     if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
> > > +     detached = __vma_refcount_put(vma, NULL);
> > > +     if (unlikely(!detached)) {
> > >               /* Wait until vma is detached with no readers. */
> > >               if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> > > -                     bool detached;
> > > -
> > > +                     /*
> > > +                      * Once this is complete, no readers can increment the
> > > +                      * reference count, and the VMA is marked detached.
> > > +                      */
> > >                       __vma_exit_locked(vma, &detached);
> > >                       WARN_ON_ONCE(!detached);
> > >               }
> > > --
> > > 2.52.0
> >


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 03/10] mm/vma: rename is_vma_write_only(), separate out shared refcount put
  2026-01-23  8:24       ` Vlastimil Babka
@ 2026-01-23 14:52         ` Lorenzo Stoakes
  2026-01-23 15:05           ` Vlastimil Babka
  0 siblings, 1 reply; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 14:52 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Suren Baghdasaryan, Andrew Morton, David Hildenbrand,
	Liam R . Howlett, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 09:24:54AM +0100, Vlastimil Babka wrote:
> On 1/22/26 20:31, Suren Baghdasaryan wrote:
> >> > +     int oldcnt;
> >> > +     bool detached;
> >> > +
> >> > +     detached = __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
> >> > +     if (refcnt)
> >> > +             *refcnt = oldcnt - 1;
> >> > +     return detached;
> >
> > IIUC there is always a connection between detached and *refcnt
> > resulting value. If detached==true then the resulting *refcnt has to
> > be 0. If so, __vma_refcount_put() can simply return (oldcnt - 1) as
> > new count:
> >
> > static inline int __vma_refcount_put(struct vm_area_struct *vma)
> > {
> >        int oldcnt;
> >
> >        __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
> >        return oldcnt - 1;
> > }
> >
> > And later:
> >
> > newcnt = __vma_refcount_put(&vma->vm_refcnt);
> > detached = newcnt == 0;
>
> If we go that way (both ways are fine with me) I'd suggest we rename the
> function to __vma_refcount_put_return to make this more obvious. (c.f.
> atomic_dec_return, lockref_put_return).
>

That's kind of horrible?

The lockref_put_return() seems to encode even more in it:

/**
 * lockref_put_return - Decrement reference count if possible
 * @lockref: pointer to lockref structure
 *
 * Decrement the reference count and return the new value.
 * If the lockref was dead or locked, return -1.
 */

But I guess it's still returning; it's just a weird convention, and not one
the refcount API uses - though perhaps that's because it uses output
parameters.

I'll rename it, I guess, on the atomic_dec_return() basis, but I just find the
idea of suffixing 'return' on a function that returns a value really...
horrible.

Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 04/10] mm/vma: add+use vma lockdep acquire/release defines
  2026-01-22 19:32   ` Suren Baghdasaryan
  2026-01-22 19:41     ` Suren Baghdasaryan
@ 2026-01-23 15:00     ` Lorenzo Stoakes
  1 sibling, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 15:00 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 11:32:17AM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > The code is littered with inscrutable and duplicative lockdep incantations,
> > replace these with defines which explain what is going on and add
> > commentary to explain what we're doing.
> >
> > If lockdep is disabled these become no-ops. We must use defines so _RET_IP_
> > remains meaningful.
> >
> > These are self-documenting and aid readability of the code.
> >
> > Additionally, instead of using the confusing rwsem_*() form for something
> > that is emphatically not an rwsem, we instead explicitly use
> > lock_[acquired, release]_shared/exclusive() lockdep invocations since we
> > are doing something rather custom here and these make more sense to use.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Very nice! Thank you.
>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>

Thanks!


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 03/10] mm/vma: rename is_vma_write_only(), separate out shared refcount put
  2026-01-23 14:52         ` Lorenzo Stoakes
@ 2026-01-23 15:05           ` Vlastimil Babka
  2026-01-23 15:07             ` Lorenzo Stoakes
  0 siblings, 1 reply; 73+ messages in thread
From: Vlastimil Babka @ 2026-01-23 15:05 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Suren Baghdasaryan, Andrew Morton, David Hildenbrand,
	Liam R . Howlett, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On 1/23/26 15:52, Lorenzo Stoakes wrote:
> On Fri, Jan 23, 2026 at 09:24:54AM +0100, Vlastimil Babka wrote:
> 
> That's kind of horrible?
> 
> The lockref_put_return() seems to encode even more in it:
> 
> /**
>  * lockref_put_return - Decrement reference count if possible
>  * @lockref: pointer to lockref structure
>  *
>  * Decrement the reference count and return the new value.
>  * If the lockref was dead or locked, return -1.
>  */
> 
> But I guess it's still returning, it's just a weird convention, and not one
> refcount uses - perhaps because refcount uses output parameters.

Oops I missed that -1 detail in that one, didn't mean to copy that part.

> I'll rename it I guess on the atomic_dec_return basis but I just find the idea
> of suffixing 'return' on a function that returns a value really... horrible.

It's because I thought it's common that things called _put() either return
nothing, or if they return 1/true it means "that was the last ref, we removed
it" - and this is returning something else.

But I admit it's just my feeling, there's e.g. kref_put() like this but I
haven't done full research.

> Thanks, Lorenzo



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment
  2026-01-22 17:28     ` Suren Baghdasaryan
@ 2026-01-23 15:06       ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 15:06 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka, Andrew Morton, David Hildenbrand,
	Liam R . Howlett, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

Sorry missed this before moving on to 3/10, responses below.

On Thu, Jan 22, 2026 at 09:28:05AM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 22, 2026 at 8:48 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >
> > On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > > The possible vma->vm_refcnt values are confusing and vague, explain in
> > > detail what these can be in a comment describing the vma->vm_refcnt field
> > > and reference this comment in various places that read/write this field.
> > >
> > > No functional change intended.
> > >
> > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >
> > Thanks, very useful. Forgive my nitpicks :) It's because it's tricky so best
> > try to be as precise as possible, I believe.
>
> Another thanks from me.

Ack

>
> >
> > > ---
> > >  include/linux/mm_types.h  | 39 +++++++++++++++++++++++++++++++++++++--
> > >  include/linux/mmap_lock.h |  7 +++++++
> > >  mm/mmap_lock.c            |  6 ++++++
> > >  3 files changed, 50 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > index 94de392ed3c5..e5ee66f84d9a 100644
> > > --- a/include/linux/mm_types.h
> > > +++ b/include/linux/mm_types.h
> > > @@ -758,7 +758,8 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
> > >   * set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indicate to
> > >   * vma_start_read() that the reference count should be left alone.
> > >   *
> > > - * Once the operation is complete, this value is subtracted from vma->vm_refcnt.
> > > + * See the comment describing vm_refcnt in vm_area_struct for details as to
> > > + * which values the VMA reference count can be.
> > >   */
> > >  #define VM_REFCNT_EXCLUDE_READERS_BIT        (30)
> > >  #define VM_REFCNT_EXCLUDE_READERS_FLAG       (1U << VM_REFCNT_EXCLUDE_READERS_BIT)
> > > @@ -989,7 +990,41 @@ struct vm_area_struct {
> > >       struct vma_numab_state *numab_state;    /* NUMA Balancing state */
> > >  #endif
> > >  #ifdef CONFIG_PER_VMA_LOCK
> > > -     /* Unstable RCU readers are allowed to read this. */
> > > +     /*
> > > +      * Used to keep track of the number of references taken by VMA read or
> > > +      * write locks. May have the VM_REFCNT_EXCLUDE_READERS_FLAG set
> >
> > I wonder about the "or write locks" part. The process of acquiring it uses
> > VM_REFCNT_EXCLUDE_READERS_FLAG but then the writer doesn't hold a 1
> > refcount? (the sentence could be read it way IMHO) It's vma being attached
> > that does, AFAIK?
>
> Yes, since there can be only one write-locker it only has to set
> VM_REFCNT_EXCLUDE_READERS_FLAG bit to announce its presence, without
> incrementing the refcount.

See reply to Vlastimil, I've reworked this to:

	 * Used to keep track of firstly, whether the VMA is attached, secondly,
	 * if attached, how many read locks are taken, and thirdly, if the
	 * VM_REFCNT_EXCLUDE_READERS_FLAG is set, whether any read locks held
	 * are currently in the process of being excluded.

>
> >
> > > +      * indicating that a thread has entered __vma_enter_locked() and is
> > > +      * waiting on any outstanding read locks to exit.
> > > +      *
> > > +      * This value can be equal to:
> > > +      *
> > > +      * 0 - Detached.
> >
> > Is it worth saying that readers can't increment the refcount?
>
> Yes, you mention that for VM_REFCNT_EXCLUDE_READERS_FLAG value. The
> same IMPORTANT notice applies here.

Yup already done as per reply to Vlastimil.

>
> >
> > > +      * 1 - Unlocked or write-locked.
> >
> > "Attached and either unlocked or write-locked." ?
>
> Agree. That's more specific.
> Should we also mention here that unlocked vs write-locked distinction
> is determined using the vm_lock_seq member?

Good idea. Have updated to:

	 * 1 - Attached and either unlocked or write-locked. Write locks are
	 * identified via __is_vma_write_locked() which checks for equality of
	 * vma->vm_lock_seq and mm->mm_lock_seq.

Note I felt it'd be distracting to say 'vma->vm_mm->mm_lock_seq.sequence' :) -
mm->mm_lock_seq suffices, and hey, go look at the function to see the gory
details.

>
> >
> > (see how "write-locked" isn't reflected, I argued above)
> >
> > > +      *
> > > +      * >1, < VM_REFCNT_EXCLUDE_READERS_FLAG - Read-locked or (unlikely)
> > > +      * write-locked with other threads having temporarily incremented the
> > > +      * reference count prior to determining it is write-locked and
> > > +      * decrementing it again.
> >
> > Ack.
> >
> > > +      * VM_REFCNT_EXCLUDE_READERS_FLAG - Detached, pending
> > > +      * __vma_exit_locked() completion which will decrement the reference
> > > +      * count to zero. IMPORTANT - at this stage no further readers can
> > > +      * increment the reference count. It can only be reduced.
> > > +      *
> > > +      * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - Either an attached VMA pending
> > > +      * __vma_exit_locked() completion which will decrement the reference
> > > +      * count to one, OR a detached VMA waiting on a single spurious reader
> > > +      * to decrement reference count. IMPORTANT - as above, no further
> > > +      * readers can increment the reference count.
> > > +      *
> > > +      * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - VMA is waiting on readers,
> >
> > "VMA is waiting" sounds weird? a thread might be, but VMA itself?
> > (similarly in the previous paragraph)
>
> Maybe "VMA in the process of being write-locked or detached, which got
> blocked due to the spurious readers that temporarily raised the
> refcount"?
>

Well no, because you might have legit read locks held that you're waiting on,
right? Ones that are not spurious? It can be a mix of spurious and legit
(though unlikely, since by then you'd hope the READ_ONCE() check would succeed,
but hey, there are weird out-of-order arches out there).

As per reply to Vlastimil I updated it to:

	 * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - A thread is either write-locking
	 * an attached VMA and has yet to invoke __vma_exit_locked(), OR a
	 * thread is detaching a VMA and is waiting on a single spurious reader
	 * in order to decrement the reference count. IMPORTANT - as above, no
	 * further readers can increment the reference count.
	 *
	 * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - A thread is either
	 * write-locking or detaching a VMA and is waiting on readers to
	 * exit. IMPORTANT - as above, no further readers can increment the
	 * reference count.
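
As a purely illustrative aside, the state space above admits a simple predicate
(hypothetical helper, not part of the series):

/*
 * True once a thread has set VM_REFCNT_EXCLUDE_READERS_FLAG - from this
 * point no new readers can raise the count.
 */
static inline bool refcnt_excluding_readers(unsigned int refcnt)
{
	return refcnt >= VM_REFCNT_EXCLUDE_READERS_FLAG;
}

Note that once the flag is set, attached vs. detached is NOT derivable from the
count alone - VM_REFCNT_EXCLUDE_READERS_FLAG + 1 can be either, as enumerated
above.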

Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 03/10] mm/vma: rename is_vma_write_only(), separate out shared refcount put
  2026-01-23 15:05           ` Vlastimil Babka
@ 2026-01-23 15:07             ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 15:07 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Suren Baghdasaryan, Andrew Morton, David Hildenbrand,
	Liam R . Howlett, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 04:05:39PM +0100, Vlastimil Babka wrote:
> On 1/23/26 15:52, Lorenzo Stoakes wrote:
> > On Fri, Jan 23, 2026 at 09:24:54AM +0100, Vlastimil Babka wrote:
> >
> > That's kind of horrible?
> >
> > The lockref_put_return() seems to encode even more in it:
> >
> > /**
> >  * lockref_put_return - Decrement reference count if possible
> >  * @lockref: pointer to lockref structure
> >  *
> >  * Decrement the reference count and return the new value.
> >  * If the lockref was dead or locked, return -1.
> >  */
> >
> > But I guess it's still returning, it's just a weird convention, and not one
> > refcount uses - perhaps because refcount uses output parameters.
>
> Oops I missed that -1 detail in that one, didn't mean to copy that part.
>
> > I'll rename it I guess on the atomic_dec_return basis but I just find the idea
> > of suffixing 'return' on a function that returns a value really... horrible.
>
> It's because I thought it's common that things called _put() either return
> nothing, or if they return 1/true it means "that was the last ref, we removed
> it" - and this is returning something else.
>
> But I admit it's just my feeling, there's e.g. kref_put() like this but I
> haven't done full research.
>
> > Thanks, Lorenzo
>

Yup, on balance I thought it probably best to add it, so did so.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 04/10] mm/vma: add+use vma lockdep acquire/release defines
  2026-01-23  8:41       ` Vlastimil Babka
@ 2026-01-23 15:08         ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 15:08 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Suren Baghdasaryan, Andrew Morton, David Hildenbrand,
	Liam R . Howlett, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 09:41:24AM +0100, Vlastimil Babka wrote:
> On 1/22/26 20:41, Suren Baghdasaryan wrote:
> >> > +/*
> >> > + * VMA locks do not behave like most ordinary locks found in the kernel, so we
> >> > + * cannot quite have full lockdep tracking in the way we would ideally prefer.
> >> > + *
> >> > + * Read locks act as shared locks which exclude an exclusive lock being
> >> > + * taken. We therefore mark these accordingly on read lock acquire/release.
> >> > + *
> >> > + * Write locks are acquired exclusively per-VMA, but released in a shared
> >> > + * fashion, that is upon vma_end_write_all(), we update the mmap's seqcount such
> >> > + * that write lock is de-acquired.
> >> > + *
> >> > + * We therefore cannot track write locks per-VMA, nor do we try. Mitigating this
> >> > + * is the fact that, of course, we do lockdep-track the mmap lock rwsem.
> >> > + *
> >> > + * We do, however, want to indicate that during either acquisition of a VMA
> >> > + * write lock or detachment of a VMA that we require the lock held be exclusive,
> >> > + * so we utilise lockdep to do so.
> >> > + */
> >> > +#define __vma_lockdep_acquire_read(vma) \
> >
> > One question I forgot to ask. Are you adding "__" prefix to indicate
> > no other users should be using them or for some other reason?
>
> I'd say it's the case of 'it has to be in a "public" header but not expected
> to be used directly by the end-users of the header'.

Yup, this is why I did that.
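
For reference, the defines plausibly expand to something like the below
(illustrative - the bodies are truncated in the quote above, and the exact
lockdep arguments may differ in the actual patch):

#define __vma_lockdep_acquire_read(vma) \
	lock_acquire_shared(&(vma)->vmlock_dep_map, 0, 0, NULL, _RET_IP_)
#define __vma_lockdep_release_read(vma) \
	lock_release(&(vma)->vmlock_dep_map, _RET_IP_)
#define __vma_lockdep_acquire_exclusive(vma) \
	lock_acquire_exclusive(&(vma)->vmlock_dep_map, 0, 0, NULL, _RET_IP_)
#define __vma_lockdep_release_exclusive(vma) \
	lock_release(&(vma)->vmlock_dep_map, _RET_IP_)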


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 04/10] mm/vma: add+use vma lockdep acquire/release defines
  2026-01-23  8:48   ` Vlastimil Babka
@ 2026-01-23 15:10     ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 15:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 09:48:28AM +0100, Vlastimil Babka wrote:
> On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > The code is littered with inscrutable and duplicative lockdep incantations,
> > replace these with defines which explain what is going on and add
> > commentary to explain what we're doing.
> >
> > If lockdep is disabled these become no-ops. We must use defines so _RET_IP_
> > remains meaningful.
> >
> > These are self-documenting and aid readability of the code.
> >
> > Additionally, instead of using the confusing rwsem_*() form for something
> > that is emphatically not an rwsem, we instead explicitly use
> > lock_[acquired, release]_shared/exclusive() lockdep invocations since we
> > are doing something rather custom here and these make more sense to use.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!

>
> Nits:
>
> > ---
> >  include/linux/mmap_lock.h | 35 ++++++++++++++++++++++++++++++++---
> >  mm/mmap_lock.c            | 10 +++++-----
> >  2 files changed, 37 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> > index 0b3614aadbb4..da63b1be6ec0 100644
> > --- a/include/linux/mmap_lock.h
> > +++ b/include/linux/mmap_lock.h
> > @@ -78,6 +78,36 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
> >
> >  #ifdef CONFIG_PER_VMA_LOCK
> >
> > +/*
> > + * VMA locks do not behave like most ordinary locks found in the kernel, so we
> > + * cannot quite have full lockdep tracking in the way we would ideally prefer.
> > + *
> > + * Read locks act as shared locks which exclude an exclusive lock being
> > + * taken. We therefore mark these accordingly on read lock acquire/release.
> > + *
> > + * Write locks are acquired exclusively per-VMA, but released in a shared
> > + * fashion, that is upon vma_end_write_all(), we update the mmap's seqcount such
> > + * that write lock is de-acquired.
>
> de-acquired -> released?

Yeah don't know why I said it that way :) Fixed.

>
> > + * We therefore cannot track write locks per-VMA, nor do we try. Mitigating this
> > + * is the fact that, of course, we do lockdep-track the mmap lock rwsem.
>
> "... which has to be held in order to take a VMA write lock" ?

Slightly edited to:

 * We therefore cannot track write locks per-VMA, nor do we try. Mitigating this
 * is the fact that, of course, we do lockdep-track the mmap lock rwsem which
 * must be held when taking a VMA write lock.

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 05/10] mm/vma: de-duplicate __vma_enter_locked() error path
  2026-01-23  8:54   ` Vlastimil Babka
@ 2026-01-23 15:10     ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 15:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 09:54:48AM +0100, Vlastimil Babka wrote:
> On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > We're doing precisely the same thing that __vma_exit_locked() does, so
> > de-duplicate this code and keep the refcount primitive in one place.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>

Thanks!


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 05/10] mm/vma: de-duplicate __vma_enter_locked() error path
  2026-01-22 19:39   ` Suren Baghdasaryan
@ 2026-01-23 15:11     ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 15:11 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 11:39:34AM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > We're doing precisely the same thing that __vma_exit_locked() does, so
> > de-duplicate this code and keep the refcount primitive in one place.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>
>

Thanks!


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
  2026-01-22 20:55     ` Andrew Morton
@ 2026-01-23 16:15       ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 16:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Suren Baghdasaryan, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 12:55:29PM -0800, Andrew Morton wrote:
> On Thu, 22 Jan 2026 12:15:20 -0800 Suren Baghdasaryan <surenb@google.com> wrote:
>
> > > +       /* vma should remain attached. */
> > > +       if (locked)
> > > +               WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
> >
> > I'm wary of calling functions from WARN_ON_ONCE() statements. If
> > someone decides to replace WARN_ON_ONCE() with VM_WARN_ON_ONCE(), the
> > call will disappear when CONFIG_DEBUG_VM=n. Maybe I'm being paranoid
> > but it's because I have been bitten by that before...
>
> Yes please.  The elision is desirable if the function has no side-effects, but
> __vma_exit_exclusive_locked() changes stuff.

Ack will update in this case :)
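
To spell out the hazard with a quick (illustrative) example:

	/* Fine: WARN_ON_ONCE() always evaluates its condition. */
	WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));

	/*
	 * Broken if blindly converted: with CONFIG_DEBUG_VM=n the condition
	 * is compiled away, so the refcount would never be dropped.
	 */
	VM_WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));

	/* Robust either way: do the side effect, then assert the result. */
	detached = __vma_exit_exclusive_locked(vma);
	WARN_ON_ONCE(detached);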

>
> Someone(tm) should check for this.  A pathetically partial grep turns
> up plenty of things:
>
> mm/slab_common.c:	if (head && !WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&head_gp_snap)))
> mm/slab_common.c:	if (!WARN_ON_ONCE(!poll_state_synchronize_rcu_full(&bnode->gp_snap))) {
> mm/page-writeback.c:		WARN_ON_ONCE(atomic_long_add_return(delta,
> mm/page_isolation.c:		WARN_ON_ONCE(!pageblock_unisolate_and_move_free_pages(zone, page));
> mm/page_alloc.c:	VM_WARN_ONCE(get_pageblock_isolate(page),
> mm/numa_memblks.c:	WARN_ON(memblock_clear_hotplug(0, max_addr));
> mm/numa_memblks.c:	WARN_ON(memblock_set_node(0, max_addr, &memblock.memory, NUMA_NO_NODE));
> mm/numa_memblks.c:	WARN_ON(memblock_set_node(0, max_addr, &memblock.reserved,
> mm/zsmalloc.c:		WARN_ON(!zpdesc_trylock(zpdesc));
>

*Adds to todo*

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
  2026-01-23  9:16   ` Vlastimil Babka
@ 2026-01-23 16:17     ` Lorenzo Stoakes
  2026-01-23 16:28       ` Lorenzo Stoakes
  0 siblings, 1 reply; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 16:17 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 10:16:22AM +0100, Vlastimil Babka wrote:
> On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > These functions are very confusing indeed. 'Entering' a lock could be
> > interpreted as acquiring it, but this is not what these functions are
> > interacting with.
> >
> > Equally they don't indicate at all what kind of lock we are 'entering' or
> > 'exiting'. Finally they are misleading as we invoke these functions when we
> > already hold a write lock to detach a VMA.
> >
> > These functions are explicitly simply 'entering' and 'exiting' a state in
> > which we hold the EXCLUSIVE lock in order that we can either mark the VMA
> > as being write-locked, or mark the VMA detached.
>
> If we hold a write lock (i.e. in vma_mark_detached()), that normally means
> it's also exclusive?
> And if we talk about the state between __vma_enter_exclusive_locked and
> __vma_exit_exclusive_locked() as "holding an EXCLUSIVE lock", it's not
> exactly the same lock as what we call "VMA write lock" right, so what lock
> is it?

Well it's not exclusive because spurious reader refcount increments are
possible (unless already detached of course...)

We are _excluding_ readers, including spurious ones, really.

>
> Maybe it would help if we stopped calling this internal thing a "lock"?
> Except we use it for lockdep's lock_acquire_exclusive(). Sigh, sorry I don't
> have any great suggestion.
>
> Maybe call those functions __vma_exclude_readers_start() and
> __vma_exclude_readers_end() instead, or something?

Yeah that's probably better actually, will rename accordingly.

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 08/10] mm/vma: improve and document __is_vma_write_locked()
  2026-01-22 21:55   ` Suren Baghdasaryan
@ 2026-01-23 16:21     ` Vlastimil Babka
  2026-01-23 17:42       ` Suren Baghdasaryan
  2026-01-23 18:44       ` Lorenzo Stoakes
  0 siblings, 2 replies; 73+ messages in thread
From: Vlastimil Babka @ 2026-01-23 16:21 UTC (permalink / raw)
  To: Suren Baghdasaryan, Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Mike Rapoport, Michal Hocko, Shakeel Butt, Jann Horn, linux-mm,
	linux-kernel, linux-rt-devel, Peter Zijlstra, Ingo Molnar,
	Will Deacon, Boqun Feng, Waiman Long, Sebastian Andrzej Siewior,
	Clark Williams, Steven Rostedt

On 1/22/26 22:55, Suren Baghdasaryan wrote:
> On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
>>
>> The function is a little confusing, clean it up a little then add a
>> descriptive comment.
> 
> I appreciate the descriptive comment but what exactly was confusing in
> this function?
> 
>>
>> No functional change intended.
>>
>> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>> ---
>>  include/linux/mmap_lock.h | 23 ++++++++++++++++++-----
>>  1 file changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
>> index 873bc5f3c97c..b00d34b5ad10 100644
>> --- a/include/linux/mmap_lock.h
>> +++ b/include/linux/mmap_lock.h
>> @@ -252,17 +252,30 @@ static inline void vma_end_read(struct vm_area_struct *vma)
>>         vma_refcount_put(vma);
>>  }
>>
>> -/* WARNING! Can only be used if mmap_lock is expected to be write-locked */
>> -static inline bool __is_vma_write_locked(struct vm_area_struct *vma, unsigned int *mm_lock_seq)
>> +/*
>> + * Determine whether a VMA is write-locked. Must be invoked ONLY if the mmap
>> + * write lock is held.
>> + *
>> + * Returns true if write-locked, otherwise false.
>> + *
>> + * Note that mm_lock_seq is updated only if the VMA is NOT write-locked.

Could it also say what it's updated to? Or is it too obvious?

> 
> True, this does not result in a functional change because we do not
> use mm_lock_seq if __is_vma_write_locked() succeeds. However this
> seems to add additional gotcha that you need to remember. Any reason
> why?

Actually I wonder if it's really worth returning the mm_lock_seq and passing
it to __vma_start_write(), which could just determine it on its own. It
would simplify things.
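
Something like this, perhaps (just a sketch):

static inline bool __is_vma_write_locked(struct vm_area_struct *vma)
{
	mmap_assert_write_locked(vma->vm_mm);

	/*
	 * We hold mmap_write_lock, so neither vma->vm_lock_seq nor
	 * mm->mm_lock_seq can be concurrently modified.
	 */
	return vma->vm_lock_seq == vma->vm_mm->mm_lock_seq.sequence;
}

and __vma_start_write() would then read mm_lock_seq.sequence itself.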

>> + */
>> +static inline bool __is_vma_write_locked(struct vm_area_struct *vma,
>> +                                        unsigned int *mm_lock_seq)
>>  {
>> -       mmap_assert_write_locked(vma->vm_mm);
>> +       struct mm_struct *mm = vma->vm_mm;
>> +       const unsigned int seq = mm->mm_lock_seq.sequence;
>> +
>> +       mmap_assert_write_locked(mm);
>>
>>         /*
>>          * current task is holding mmap_write_lock, both vma->vm_lock_seq and
>>          * mm->mm_lock_seq can't be concurrently modified.
>>          */
>> -       *mm_lock_seq = vma->vm_mm->mm_lock_seq.sequence;
>> -       return (vma->vm_lock_seq == *mm_lock_seq);
>> +       if (vma->vm_lock_seq == seq)
>> +               return true;
>> +       *mm_lock_seq = seq;
>> +       return false;
>>  }
>>
>>  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
>> --
>> 2.52.0



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
  2026-01-23 16:17     ` Lorenzo Stoakes
@ 2026-01-23 16:28       ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 16:28 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 04:17:54PM +0000, Lorenzo Stoakes wrote:
> On Fri, Jan 23, 2026 at 10:16:22AM +0100, Vlastimil Babka wrote:
> > On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > > These functions are very confusing indeed. 'Entering' a lock could be
> > > interpreted as acquiring it, but this is not what these functions are
> > > interacting with.
> > >
> > > Equally they don't indicate at all what kind of lock we are 'entering' or
> > > 'exiting'. Finally they are misleading as we invoke these functions when we
> > > already hold a write lock to detach a VMA.
> > >
> > > These functions are explicitly simply 'entering' and 'exiting' a state in
> > > which we hold the EXCLUSIVE lock in order that we can either mark the VMA
> > > as being write-locked, or mark the VMA detached.
> >
> > If we hold a write lock (i.e. in vma_mark_detached()), that normally means
> > it's also exclusive?
> > And if we talk about the state between __vma_enter_exclusive_locked and
> > __vma_exit_exclusive_locked() as "holding an EXCLUSIVE lock", it's not
> > exactly the same lock as what we call "VMA write lock" right, so what lock
> > is it?
>
> Well it's not exclusive because spurious reader refcount increments are
> possible (unless already detached of course...)
>
> We are _excluding_ readers, including spurious ones, really.
>
> >
> > Maybe it would help if we stopped calling this internal thing a "lock"?
> > Except we use it for lockdep's lock_acquire_exclusive(). Sigh, sorry I don't
> > have any great suggestion.
> >
> > Maybe call those functions __vma_exclude_readers_start() and
> > __vma_exclude_readers_end() instead, or something?
>
> Yeah that's probably better actually, will rename accordingly.

Actually went with __vma_start_exclude_readers() and
__vma_end_exclude_readers() as felt they read better.

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked()
  2026-01-22 20:15   ` Suren Baghdasaryan
  2026-01-22 20:55     ` Andrew Morton
@ 2026-01-23 16:33     ` Lorenzo Stoakes
  1 sibling, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 16:33 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 12:15:20PM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > These functions are very confusing indeed. 'Entering' a lock could be
> > interpreted as acquiring it, but this is not what these functions are
> > interacting with.
> >
> > Equally they don't indicate at all what kind of lock we are 'entering' or
> > 'exiting'. Finally they are misleading as we invoke these functions when we
> > already hold a write lock to detach a VMA.
> >
> > These functions are explicitly simply 'entering' and 'exiting' a state in
> > which we hold the EXCLUSIVE lock in order that we can either mark the VMA
> > as being write-locked, or mark the VMA detached.
> >
> > Rename the functions accordingly, and also update
> > __vma_exit_exclusive_locked() to return detached state with a __must_check
> > directive, as it is simply clumsy to pass an output pointer here to
> > detached state and inconsistent vs. __vma_enter_exclusive_locked().
> >
> > Finally, remove the unnecessary 'inline' directives.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >  include/linux/mmap_lock.h |  4 +--
> >  mm/mmap_lock.c            | 60 +++++++++++++++++++++++++--------------
> >  2 files changed, 41 insertions(+), 23 deletions(-)
> >
> > diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> > index da63b1be6ec0..873bc5f3c97c 100644
> > --- a/include/linux/mmap_lock.h
> > +++ b/include/linux/mmap_lock.h
> > @@ -209,8 +209,8 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
> >         __vma_lockdep_release_read(vma);
> >         detached = __vma_refcount_put(vma, &refcnt);
> >         /*
> > -        * __vma_enter_locked() may be sleeping waiting for readers to drop
> > -        * their reference count, so wake it up if we were the last reader
> > +        * __vma_enter_exclusive_locked() may be sleeping waiting for readers to
> > +        * drop their reference count, so wake it up if we were the last reader
> >          * blocking it from being acquired.
> >          */
> >         if (!detached && are_readers_excluded(refcnt))
> > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > index 7a0361cff6db..f73221174a8b 100644
> > --- a/mm/mmap_lock.c
> > +++ b/mm/mmap_lock.c
> > @@ -46,19 +46,43 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
> >  #ifdef CONFIG_MMU
> >  #ifdef CONFIG_PER_VMA_LOCK
> >
> > -static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
> > +/*
> > + * Now that all readers have been evicted, mark the VMA as being out of the
> > + * 'exclude readers' state.
> > + *
> > + * Returns true if the VMA is now detached, otherwise false.
> > + */
> > +static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> >  {
> > -       *detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > -                                         &vma->vm_refcnt);
> > +       bool detached;
> > +
> > +       detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > +                                        &vma->vm_refcnt);
> >         __vma_lockdep_release_exclusive(vma);
> > +       return detached;
> >  }
> >
> >  /*
> > - * __vma_enter_locked() returns 0 immediately if the vma is not
> > - * attached, otherwise it waits for any current readers to finish and
> > - * returns 1.  Returns -EINTR if a signal is received while waiting.
> > + * Mark the VMA as being in a state of excluding readers, check to see if any
> > + * VMA read locks are indeed held, and if so wait for them to be released.
> > + *
> > + * Note that this function pairs with vma_refcount_put() which will wake up this
> > + * thread when it detects that the last reader has released its lock.
> > + *
> > + * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
> > + * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
> > + * is permitted to kill it.
> > + *
> > + * The function will return 0 immediately if the VMA is detached, and 1 once the
> > + * VMA has evicted all readers, leaving the VMA exclusively locked.
>
> The wording here is a bit misleading. We do not evict the readers,
> just wait for them to complete and exit.

OK updated to:

 * The function will return 0 immediately if the VMA is detached, or wait for
 * readers and return 1 once they have all exited, leaving the VMA exclusively
 * locked.

>
> > + *
> > + * If the function returns 1, the caller is required to invoke
> > + * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
> > + *
> > + * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
> > + * may also return -EINTR to indicate a fatal signal was received while waiting.
> >   */
> > -static inline int __vma_enter_locked(struct vm_area_struct *vma,
> > +static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
> >                 bool detaching, int state)
> >  {
> >         int err;
> > @@ -85,13 +109,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
> >                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
> >                    state);
> >         if (err) {
> > -               bool detached;
> > -
> > -               __vma_exit_locked(vma, &detached);
> > -               if (detached) {
> > +               if (__vma_exit_exclusive_locked(vma)) {
> >                         /*
> >                          * The wait failed, but the last reader went away
> > -                        * as well.  Tell the caller the VMA is detached.
> > +                        * as well. Tell the caller the VMA is detached.
> >                          */
> >                         WARN_ON_ONCE(!detaching);
> >                         err = 0;
> > @@ -108,7 +129,7 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> >  {
> >         int locked;
> >
> > -       locked = __vma_enter_locked(vma, false, state);
> > +       locked = __vma_enter_exclusive_locked(vma, false, state);
> >         if (locked < 0)
> >                 return locked;
> >
> > @@ -120,12 +141,9 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> >          */
> >         WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
> >
> > -       if (locked) {
> > -               bool detached;
> > -
> > -               __vma_exit_locked(vma, &detached);
> > -               WARN_ON_ONCE(detached); /* vma should remain attached */
> > -       }
> > +       /* vma should remain attached. */
> > +       if (locked)
> > +               WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
>
> I'm wary of calling functions from WARN_ON_ONCE() statements. If
> someone decides to replace WARN_ON_ONCE() with VM_WARN_ON_ONCE(), the
> call will disappear when CONFIG_DEBUG_VM=n. Maybe I'm being paranoid
> but it's because I have been bitten by that before...

OK replaced with:

	if (locked) {
		bool detached = __vma_end_exclude_readers(vma);

		/* The VMA should remain attached. */
		WARN_ON_ONCE(detached);
	}

Note that this, and indeed the comment above actually, both get replaced in a
later commit :) but I will action these changes regardless to stay consistent.


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 09/10] mm/vma: update vma_assert_locked() to use lockdep
  2026-01-22 13:02 ` [PATCH RESEND v3 09/10] mm/vma: update vma_assert_locked() to use lockdep Lorenzo Stoakes
  2026-01-22 22:02   ` Suren Baghdasaryan
@ 2026-01-23 16:55   ` Vlastimil Babka
  2026-01-23 18:49     ` Lorenzo Stoakes
  1 sibling, 1 reply; 73+ messages in thread
From: Vlastimil Babka @ 2026-01-23 16:55 UTC (permalink / raw)
  To: Lorenzo Stoakes, Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shakeel Butt, Jann Horn,
	linux-mm, linux-kernel, linux-rt-devel, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt

On 1/22/26 14:02, Lorenzo Stoakes wrote:
> +/**
> + * vma_assert_locked() - assert that @vma holds either a VMA read or a VMA write
> + * lock and is not detached.
> + * @vma: The VMA to assert.
> + */
>  static inline void vma_assert_locked(struct vm_area_struct *vma)
>  {
> -	unsigned int mm_lock_seq;
> +	unsigned int refs;
> 
>  	/*
>  	 * See the comment describing the vm_area_struct->vm_refcnt field for
>  	 * details of possible refcnt values.
>  	 */
> -	VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt) <= 1 &&
> -		      !__is_vma_write_locked(vma, &mm_lock_seq), vma);
> +
> +	/*
> +	 * If read-locked or currently excluding readers, then the VMA is
> +	 * locked.
> +	 */
> +#ifdef CONFIG_LOCKDEP
> +	if (lock_is_held(&vma->vmlock_dep_map))
> +		return;
> +#endif
> +
> +	refs = refcount_read(&vma->vm_refcnt);
> +
> +	/*
> +	 * In this case we're either read-locked, write-locked with temporary
> +	 * readers, or in the midst of excluding readers, all of which means
> +	 * we're locked.
> +	 */
> +	if (refs > 1)
> +		return;
> +
> +	/* It is a bug for the VMA to be detached here. */
> +	VM_BUG_ON_VMA(!refs, vma);
> +

Yeah previously this function was all VM_BUG_ON() but since that's now
frowned upon, can we not do it anymore?
Seems we do have VM_WARN_ON_ONCE_VMA().

> +	/*
> +	 * OK, the VMA has a reference count of 1 which means it is either
> +	 * unlocked and attached or write-locked, so assert that it is
> +	 * write-locked.
> +	 */
> +	vma_assert_write_locked(vma);
>  }
> 
>  static inline bool vma_is_attached(struct vm_area_struct *vma)
> --
> 2.52.0



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 10/10] mm/vma: add and use vma_assert_stabilised()
  2026-01-22 13:02 ` [PATCH RESEND v3 10/10] mm/vma: add and use vma_assert_stabilised() Lorenzo Stoakes
  2026-01-22 22:12   ` Suren Baghdasaryan
@ 2026-01-23 17:10   ` Vlastimil Babka
  2026-01-23 18:51     ` Lorenzo Stoakes
  2026-01-23 23:35   ` Hillf Danton
  2 siblings, 1 reply; 73+ messages in thread
From: Vlastimil Babka @ 2026-01-23 17:10 UTC (permalink / raw)
  To: Lorenzo Stoakes, Andrew Morton
  Cc: David Hildenbrand, Liam R . Howlett, Mike Rapoport,
	Suren Baghdasaryan, Michal Hocko, Shakeel Butt, Jann Horn,
	linux-mm, linux-kernel, linux-rt-devel, Peter Zijlstra,
	Ingo Molnar, Will Deacon, Boqun Feng, Waiman Long,
	Sebastian Andrzej Siewior, Clark Williams, Steven Rostedt

On 1/22/26 14:02, Lorenzo Stoakes wrote:
> Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot
> be changed underneath us. This will be the case if EITHER the VMA lock or
> the mmap lock is held.
> 
> In order to do so, we introduce a new assert vma_assert_stabilised() - this
> will make a lockdep assert if lockdep is enabled AND the VMA is
> read-locked.
> 
> Currently lockdep tracking for VMA write locks is not implemented, so it
> suffices to check in this case that we have either an mmap read or write
> semaphore held.
> 
> Note that because the VMA lock uses the non-standard vmlock_dep_map naming
> convention, we cannot use lockdep_assert_is_write_held() so have to open
> code this ourselves via lockdep-asserting that
> lock_is_held_type(&vma->vmlock_dep_map, 0).
> 
> We have to be careful here - for instance when merging a VMA, we use the
> mmap write lock to stabilise the examination of adjacent VMAs which might
> be simultaneously VMA read-locked whilst being faulted in.
> 
> If we were to assert VMA read lock using lockdep we would encounter an
> incorrect lockdep assert.
> 
> Also, we have to be careful about asserting mmap locks are held - if we try
> to address the above issue by first checking whether mmap lock is held and
> if so asserting it via lockdep, we may find that we were raced by another
> thread acquiring an mmap read lock simultaneously that either we don't
> own (and thus can be released any time - so we are not stable) or was
> indeed released since we last checked.
> 
> So to deal with these complexities we end up with either a precise (if
> lockdep is enabled) or imprecise (if not) approach - in the first instance
> we assert the lock is held using lockdep and thus whether we own it.
> 
> If we do own it, then the check is complete, otherwise we must check for
> the VMA read lock being held (VMA write lock implies mmap write lock so the
> mmap lock suffices for this).
> 
> If lockdep is not enabled we simply check if the mmap lock is held and risk
> a false negative (i.e. not asserting when we should do).
> 
> There are a couple of places in the kernel where we already do this
> stabilisation check - the anon_vma_name() helper in mm/madvise.c and
> vma_flag_set_atomic() in include/linux/mm.h, which we update to use
> vma_assert_stabilised().
> 
> This change abstracts these into vma_assert_stabilised(), uses lockdep if
> possible, and avoids a duplicate check of whether the mmap lock is held.
> 
> This is also self-documenting and lays the foundations for further VMA
> stability checks in the code.
> 
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

LGTM, thanks!

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
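
For the record, my mental model of the precise/imprecise split described above,
as a condensed sketch (illustrative only - see the patch itself for the real
implementation):

static inline void vma_assert_stabilised(struct vm_area_struct *vma)
{
#ifdef CONFIG_LOCKDEP
	/* Precise: only trust the mmap lock if lockdep says *we* hold it. */
	if (!lockdep_is_held(&vma->vm_mm->mmap_lock))
		vma_assert_locked(vma);
#else
	/* Imprecise: risks a false negative if another thread holds it. */
	if (!rwsem_is_locked(&vma->vm_mm->mmap_lock))
		vma_assert_locked(vma);
#endif
}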



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 08/10] mm/vma: improve and document __is_vma_write_locked()
  2026-01-23 16:21     ` Vlastimil Babka
@ 2026-01-23 17:42       ` Suren Baghdasaryan
  2026-01-23 18:44       ` Lorenzo Stoakes
  1 sibling, 0 replies; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-23 17:42 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Lorenzo Stoakes, Andrew Morton, David Hildenbrand,
	Liam R . Howlett, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 8:21 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 1/22/26 22:55, Suren Baghdasaryan wrote:
> > On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> >>
> >> The function is a little confusing, clean it up a little then add a
> >> descriptive comment.
> >
> > I appreciate the descriptive comment but what exactly was confusing in
> > this function?
> >
> >>
> >> No functional change intended.
> >>
> >> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >> ---
> >>  include/linux/mmap_lock.h | 23 ++++++++++++++++++-----
> >>  1 file changed, 18 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> >> index 873bc5f3c97c..b00d34b5ad10 100644
> >> --- a/include/linux/mmap_lock.h
> >> +++ b/include/linux/mmap_lock.h
> >> @@ -252,17 +252,30 @@ static inline void vma_end_read(struct vm_area_struct *vma)
> >>         vma_refcount_put(vma);
> >>  }
> >>
> >> -/* WARNING! Can only be used if mmap_lock is expected to be write-locked */
> >> -static inline bool __is_vma_write_locked(struct vm_area_struct *vma, unsigned int *mm_lock_seq)
> >> +/*
> >> + * Determine whether a VMA is write-locked. Must be invoked ONLY if the mmap
> >> + * write lock is held.
> >> + *
> >> + * Returns true if write-locked, otherwise false.
> >> + *
> >> + * Note that mm_lock_seq is updated only if the VMA is NOT write-locked.
>
> Could it also say to what it's updated to? Or is it too obvious?
>
> >
> > True, this does not result in a functional change because we do not
> > use mm_lock_seq if __is_vma_write_locked() succeeds. However this
> > seems to add additional gotcha that you need to remember. Any reason
> > why?
>
> Actually I wonder if it's really worth returning the mm_lock_seq and passing
> it to __vma_start_write(), which could just determine it on its own. It
> would simplify things.

That looks fine to me and indeed would simplify things... Yes, please!

>
> >> + */
> >> +static inline bool __is_vma_write_locked(struct vm_area_struct *vma,
> >> +                                        unsigned int *mm_lock_seq)
> >>  {
> >> -       mmap_assert_write_locked(vma->vm_mm);
> >> +       struct mm_struct *mm = vma->vm_mm;
> >> +       const unsigned int seq = mm->mm_lock_seq.sequence;
> >> +
> >> +       mmap_assert_write_locked(mm);
> >>
> >>         /*
> >>          * current task is holding mmap_write_lock, both vma->vm_lock_seq and
> >>          * mm->mm_lock_seq can't be concurrently modified.
> >>          */
> >> -       *mm_lock_seq = vma->vm_mm->mm_lock_seq.sequence;
> >> -       return (vma->vm_lock_seq == *mm_lock_seq);
> >> +       if (vma->vm_lock_seq == seq)
> >> +               return true;
> >> +       *mm_lock_seq = seq;
> >> +       return false;
> >>  }
> >>
> >>  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> >> --
> >> 2.52.0
>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
  2026-01-22 21:41   ` Suren Baghdasaryan
@ 2026-01-23 17:59     ` Lorenzo Stoakes
  2026-01-23 19:34       ` Suren Baghdasaryan
  0 siblings, 1 reply; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 17:59 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 01:41:39PM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 22, 2026 at 5:03 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > It is confusing to have __vma_enter_exclusive_locked() return 0, 1 or an
> > error (but only when waiting for readers in TASK_KILLABLE state), and
> > having the return value be stored in a stack variable called 'locked' is
> > further confusion.
> >
> > More generally, we are doing a lot of rather finicky things during the
> > acquisition of a state in which readers are excluded and moving out of this
> > state, including tracking whether we are detached or not or whether an
> > error occurred.
> >
> > We are implementing logic in __vma_enter_exclusive_locked() that
> > effectively acts as if 'if one caller calls us do X, if another then do Y',
> > which is very confusing from a control flow perspective.
> >
> > Introducing the shared helper object state helps us avoid this, as we can
> > now handle the 'an error arose but we're detached' condition correctly in
> > both callers - a warning if not detaching, and treating the situation as if
> > no error arose in the case of a VMA detaching.
> >
> > This also acts to help document what's going on and allows us to add some
> > more logical debug asserts.
> >
> > Also update vma_mark_detached() to add a guard clause for the likely
> > 'already detached' state (given we hold the mmap write lock), and add a
> > comment about ephemeral VMA read lock reference count increments to clarify
> > why we are entering/exiting an exclusive locked state here.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >  mm/mmap_lock.c | 144 +++++++++++++++++++++++++++++++------------------
> >  1 file changed, 91 insertions(+), 53 deletions(-)
> >
> > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > index f73221174a8b..75166a43ffa4 100644
> > --- a/mm/mmap_lock.c
> > +++ b/mm/mmap_lock.c
> > @@ -46,20 +46,40 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
> >  #ifdef CONFIG_MMU
> >  #ifdef CONFIG_PER_VMA_LOCK
> >
> > +/* State shared across __vma_[enter, exit]_exclusive_locked(). */
> > +struct vma_exclude_readers_state {
> > +       /* Input parameters. */
> > +       struct vm_area_struct *vma;
> > +       int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
> > +       bool detaching;
> > +
> Are these:
>             /* Output parameters. */
> ?

Yurp.

Oh you added the comment :) well clearly I should have, will do so.

> > +       bool detached;
> > +       bool exclusive; /* Are we exclusively locked? */
> > +};
> > +
> >  /*
> >   * Now that all readers have been evicted, mark the VMA as being out of the
> >   * 'exclude readers' state.
> >   *
> >   * Returns true if the VMA is now detached, otherwise false.
> >   */
> > -static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> > +static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
> >  {
> > -       bool detached;
> > +       struct vm_area_struct *vma = ves->vma;
> > +
> > +       VM_WARN_ON_ONCE(ves->detached);
> > +       VM_WARN_ON_ONCE(!ves->exclusive);
> >
> > -       detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > -                                        &vma->vm_refcnt);
> > +       ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > +                                             &vma->vm_refcnt);
> >         __vma_lockdep_release_exclusive(vma);
> > -       return detached;
> > +}
> > +
> > +static unsigned int get_target_refcnt(struct vma_exclude_readers_state *ves)
> > +{
> > +       const unsigned int tgt = ves->detaching ? 0 : 1;
> > +
> > +       return tgt | VM_REFCNT_EXCLUDE_READERS_FLAG;
> >  }
> >
> >  /*
> > @@ -69,30 +89,31 @@ static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> >   * Note that this function pairs with vma_refcount_put() which will wake up this
> >   * thread when it detects that the last reader has released its lock.
> >   *
> > - * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
> > - * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
> > - * is permitted to kill it.
> > + * The ves->state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases
> > + * where we wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal
> > + * signal is permitted to kill it.
> >   *
> > - * The function will return 0 immediately if the VMA is detached, and 1 once the
> > - * VMA has evicted all readers, leaving the VMA exclusively locked.
> > + * The function sets the ves->locked parameter to true if an exclusive lock was
>
> s/ves->locked/ves->exclusive
>
> > + * acquired, or false if the VMA was detached or an error arose on wait.
> >   *
> > - * If the function returns 1, the caller is required to invoke
> > - * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
> > + * If the function indicates an exclusive lock was acquired via ves->exclusive
> > + * (or equivalently, returning 0 with !ves->detached),
>
> I would remove the mention of that equivalence because with this
> change, return 0 simply indicates that the operation was successful
> and should not be used to infer any additional states. To get specific
> state the caller should use proper individual ves fields. Using return
> value for anything else defeats the whole purpose of this cleanup.

OK I'll remove the equivalency comment.

>
> > the caller is required to
> > + * invoke __vma_exit_exclusive_locked() once the exclusive state is no longer
> > + * required.
> >   *
> > - * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
> > - * may also return -EINTR to indicate a fatal signal was received while waiting.
> > + * If ves->state is set to something other than TASK_UNINTERRUPTIBLE, the
> > + * function may also return -EINTR to indicate a fatal signal was received while
> > + * waiting.
> >   */
> > -static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
> > -               bool detaching, int state)
> > +static int __vma_enter_exclusive_locked(struct vma_exclude_readers_state *ves)
> >  {
> > -       int err;
> > -       unsigned int tgt_refcnt = VM_REFCNT_EXCLUDE_READERS_FLAG;
> > +       struct vm_area_struct *vma = ves->vma;
> > +       unsigned int tgt_refcnt = get_target_refcnt(ves);
> > +       int err = 0;
> >
> >         mmap_assert_write_locked(vma->vm_mm);
> > -
> > -       /* Additional refcnt if the vma is attached. */
> > -       if (!detaching)
> > -               tgt_refcnt++;
> > +       VM_WARN_ON_ONCE(ves->detached);
> > +       VM_WARN_ON_ONCE(ves->exclusive);
>
> Aren't these output parameters? If so, why do we stipulate their
> initial values instead of setting them appropriately?

I guess it was just to ensure things were correctly set up, but yes it's a bit
weird, will remove.

>
> >
> >         /*
> >          * If vma is detached then only vma_mark_attached() can raise the
> > @@ -101,37 +122,39 @@ static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
> >          * See the comment describing the vm_area_struct->vm_refcnt field for
> >          * details of possible refcnt values.
> >          */
> > -       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
> > +       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
> > +               ves->detached = true;
> >                 return 0;
> > +       }
> >
> >         __vma_lockdep_acquire_exclusive(vma);
> >         err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
> >                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
> > -                  state);
> > +                  ves->state);
> >         if (err) {
> > -               if (__vma_exit_exclusive_locked(vma)) {
> > -                       /*
> > -                        * The wait failed, but the last reader went away
> > -                        * as well. Tell the caller the VMA is detached.
> > -                        */
> > -                       WARN_ON_ONCE(!detaching);
> > -                       err = 0;
> > -               }
> > +               __vma_exit_exclusive_locked(ves);
> >                 return err;
>
> Nice! We preserve both error and detached state information.

:)

>
> >         }
> > -       __vma_lockdep_stat_mark_acquired(vma);
> >
> > -       return 1;
> > +       __vma_lockdep_stat_mark_acquired(vma);
> > +       ves->exclusive = true;
> > +       return 0;
> >  }
> >
> >  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> >                 int state)
> >  {
> > -       int locked;
> > +       int err;
> > +       struct vma_exclude_readers_state ves = {
> > +               .vma = vma,
> > +               .state = state,
> > +       };
> >
> > -       locked = __vma_enter_exclusive_locked(vma, false, state);
> > -       if (locked < 0)
> > -               return locked;
> > +       err = __vma_enter_exclusive_locked(&ves);
> > +       if (err) {
> > +               WARN_ON_ONCE(ves.detached);
>
> I believe the above WARN_ON_ONCE() should stay inside of
> __vma_enter_exclusive_locked(). Its correctness depends on the
> implementation details of __vma_enter_exclusive_locked(). More

Well this was kind of horrible in the original implementation, as you are
literally telling the function whether you are detaching or not, and only doing
this assert if you are not.

That kind of 'if the caller is X do A, if the caller is Y do B' is really a code
smell, you should have X do the thing.

> specifically, it is only correct because
> __vma_enter_exclusive_locked() returns 0 if the VMA is detached, even
> if there was a pending SIGKILL.

Well it's a documented aspect of the function that we return 0 immediately on
detached state so I'm not sure that is an implementation detail?

I significantly prefer having that here vs. 'if not detaching then assert if
detached' for people to scratch their heads over in the function.

I think this detail is incorrect anyway, because:

	if (err) {
		if (__vma_exit_exclusive_locked(vma)) {
			/*
			 * The wait failed, but the last reader went away
			 * as well. Tell the caller the VMA is detached.
			 */
			 WARN_ON_ONCE(!detaching);
			 err = 0;
		}
		...
	}

Implies - hey we're fine with err not being zero AND detaching right? In which
case reset the error?

Except when detaching we set TASK_UNINTERRUPTIBLE? Which surely means we never
see an error?

Or do we?

Either way it's something we handle differently based on _caller_. So it doesn't
belong in the function at all.

It's certainly logic that's highly confusing and needs to be handled
differently.
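
For reference, as far as I can tell rcuwait_wait_event() can only fail via
signal_pending_state(), which (paraphrasing from memory here, so do check the
tree for the exact form) ignores plain TASK_UNINTERRUPTIBLE altogether:

	static inline int signal_pending_state(unsigned int state,
					       struct task_struct *p)
	{
		/* Plain TASK_UNINTERRUPTIBLE sets neither of these bits... */
		if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
			return 0;
		if (!signal_pending(p))
			return 0;

		/* ...so we never even reach the fatal signal check. */
		return (state & TASK_INTERRUPTIBLE) || __fatal_signal_pending(p);
	}

So with TASK_UNINTERRUPTIBLE the wait should indeed never fail, meaning the
'error while detaching' case looks unreachable - which only reinforces that
this is caller policy, not something the function should decide.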

>
> > +               return err;
> > +       }
> >
> >         /*
> >          * We should use WRITE_ONCE() here because we can have concurrent reads
> > @@ -141,9 +164,11 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> >          */
> >         WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
> >
> > -       /* vma should remain attached. */
> > -       if (locked)
> > -               WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
> > +       if (!ves.detached) {
>
> Strictly speaking the above check should be checking ves->exclusive
> instead of !ves.detached. What you have is technically correct but
> it's again related to that comment:
>
> "If the function indicates an exclusive lock was acquired via
> ves->exclusive (or equivalently, returning 0 with !ves->detached), the
> caller is required to invoke __vma_exit_exclusive_locked() once the
> exclusive state is no longer required."
>
> So, here you are using returning 0 with !ves->detached as an
> indication that the VMA was successfully locked. I think it's less
> confusing if we use the field dedicated for that purpose.

OK changed.

>
> > +               __vma_exit_exclusive_locked(&ves);
> > +               /* VMA should remain attached. */
> > +               WARN_ON_ONCE(ves.detached);
> > +       }
> >
> >         return 0;
> >  }
> > @@ -151,7 +176,12 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
> >
> >  void vma_mark_detached(struct vm_area_struct *vma)
> >  {
> > -       bool detached;
> > +       struct vma_exclude_readers_state ves = {
> > +               .vma = vma,
> > +               .state = TASK_UNINTERRUPTIBLE,
> > +               .detaching = true,
> > +       };
> > +       int err;
> >
> >         vma_assert_write_locked(vma);
> >         vma_assert_attached(vma);
> > @@ -160,18 +190,26 @@ void vma_mark_detached(struct vm_area_struct *vma)
> >          * See the comment describing the vm_area_struct->vm_refcnt field for
> >          * details of possible refcnt values.
> >          */
> > -       detached = __vma_refcount_put(vma, NULL);
> > -       if (unlikely(!detached)) {
> > -               /* Wait until vma is detached with no readers. */
> > -               if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> > -                       /*
> > -                        * Once this is complete, no readers can increment the
> > -                        * reference count, and the VMA is marked detached.
> > -                        */
> > -                       detached = __vma_exit_exclusive_locked(vma);
> > -                       WARN_ON_ONCE(!detached);
> > -               }
> > +       if (likely(__vma_refcount_put(vma, NULL)))
> > +               return;
> > +
> > +       /*
> > +        * Wait until the VMA is detached with no readers. Since we hold the VMA
> > +        * write lock, the only read locks that might be present are those from
> > +        * threads trying to acquire the read lock and incrementing the
> > +        * reference count before realising the write lock is held and
> > +        * decrementing it.
> > +        */
> > +       err = __vma_enter_exclusive_locked(&ves);
> > +       if (!err && !ves.detached) {
>
> Same here, we should be checking ves->exclusive to decide if
> __vma_exit_exclusive_locked() should be called or not.

Ack, changed.

>
> > +               /*
> > +                * Once this is complete, no readers can increment the
> > +                * reference count, and the VMA is marked detached.
> > +                */
> > +               __vma_exit_exclusive_locked(&ves);
> >         }
> > +       /* If an error arose but we were detached anyway, we don't care. */
> > +       WARN_ON_ONCE(!ves.detached);
> >  }
> >
> >  /*
> > --
> > 2.52.0

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
  2026-01-23 10:02   ` Vlastimil Babka
@ 2026-01-23 18:18     ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 18:18 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 11:02:04AM +0100, Vlastimil Babka wrote:
> On 1/22/26 14:01, Lorenzo Stoakes wrote:
> > It is confusing to have __vma_enter_exclusive_locked() return 0, 1 or an
> > error (but only when waiting for readers in TASK_KILLABLE state), and
> > having the return value be stored in a stack variable called 'locked' is
> > further confusion.
> >
> > More generally, we are doing a lot of rather finicky things during the
> > acquisition of a state in which readers are excluded and moving out of this
> > state, including tracking whether we are detached or not or whether an
> > error occurred.
> >
> > We are implementing logic in __vma_enter_exclusive_locked() that
> > effectively acts as if 'if one caller calls us do X, if another then do Y',
> > which is very confusing from a control flow perspective.
> >
> > Introducing the shared helper object state helps us avoid this, as we can
> > now handle the 'an error arose but we're detached' condition correctly in
> > both callers - a warning if not detaching, and treating the situation as if
> > no error arose in the case of a VMA detaching.
> >
> > This also acts to help document what's going on and allows us to add some
> > more logical debug asserts.
> >
> > Also update vma_mark_detached() to add a guard clause for the likely
> > 'already detached' state (given we hold the mmap write lock), and add a
> > comment about ephemeral VMA read lock reference count increments to clarify
> > why we are entering/exiting an exclusive locked state here.
> >
> > No functional change intended.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > ---
> >  mm/mmap_lock.c | 144 +++++++++++++++++++++++++++++++------------------
> >  1 file changed, 91 insertions(+), 53 deletions(-)
> >
> > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > index f73221174a8b..75166a43ffa4 100644
> > --- a/mm/mmap_lock.c
> > +++ b/mm/mmap_lock.c
> > @@ -46,20 +46,40 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
> >  #ifdef CONFIG_MMU
> >  #ifdef CONFIG_PER_VMA_LOCK
> >
> > +/* State shared across __vma_[enter, exit]_exclusive_locked(). */
> > +struct vma_exclude_readers_state {
> > +	/* Input parameters. */
> > +	struct vm_area_struct *vma;
> > +	int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
> > +	bool detaching;
> > +
> > +	bool detached;
> > +	bool exclusive; /* Are we exclusively locked? */
> > +};
> > +
> >  /*
> >   * Now that all readers have been evicted, mark the VMA as being out of the
> >   * 'exclude readers' state.
> >   *
> >   * Returns true if the VMA is now detached, otherwise false.
> >   */
> > -static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> > +static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
> >  {
> > -	bool detached;
> > +	struct vm_area_struct *vma = ves->vma;
> > +
> > +	VM_WARN_ON_ONCE(ves->detached);
> > +	VM_WARN_ON_ONCE(!ves->exclusive);
>
> I think this will trigger when called on wait failure from
> __vma_enter_exclusive_locked(). Given the other things Suren raised about
> the field, I wonder if it's worth keeping it?

He was suggesting I use ves->exclusive over ves->detached? I've now actioned
those changes so... yeh not dropping that.

But you're right that assert is wrong, removed.
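
i.e. a sketch with just that assert dropped (the exact v4 shape may differ):

	static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
	{
		struct vm_area_struct *vma = ves->vma;

		/* We cannot already be detached on entry. */
		VM_WARN_ON_ONCE(ves->detached);

		ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
						      &vma->vm_refcnt);
		__vma_lockdep_release_exclusive(vma);
	}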

>
> > -	detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > -					 &vma->vm_refcnt);
> > +	ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > +					      &vma->vm_refcnt);
> >  	__vma_lockdep_release_exclusive(vma);
> > -	return detached;
> > +}
> > +
>
> > @@ -151,7 +176,12 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
> >
> >  void vma_mark_detached(struct vm_area_struct *vma)
> >  {
> > -	bool detached;
> > +	struct vma_exclude_readers_state ves = {
> > +		.vma = vma,
> > +		.state = TASK_UNINTERRUPTIBLE,
> > +		.detaching = true,
> > +	};
> > +	int err;
> >
> >  	vma_assert_write_locked(vma);
> >  	vma_assert_attached(vma);
> > @@ -160,18 +190,26 @@ void vma_mark_detached(struct vm_area_struct *vma)
> >  	 * See the comment describing the vm_area_struct->vm_refcnt field for
> >  	 * details of possible refcnt values.
> >  	 */
> > -	detached = __vma_refcount_put(vma, NULL);
> > -	if (unlikely(!detached)) {
> > -		/* Wait until vma is detached with no readers. */
> > -		if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> > -			/*
> > -			 * Once this is complete, no readers can increment the
> > -			 * reference count, and the VMA is marked detached.
> > -			 */
> > -			detached = __vma_exit_exclusive_locked(vma);
> > -			WARN_ON_ONCE(!detached);
> > -		}
> > +	if (likely(__vma_refcount_put(vma, NULL)))
> > +		return;
>
> Seems to me it would be worthwhile splitting this function to an
> static-inline-in-header vma_mark_detached() that does only the asserts and
> __vma_refcount_put(), and keeping the function here as __vma_mark_detached()
> (or maybe differently named since the detaching kinda already happened with
> the refcount put... __vma_mark_detached_finish()?) handling the rare case
> __vma_refcount_put() returns false.

Yeah good idea, that saves us always having the ves state etc. too and separates
it out nicely.

Have called it __vma_exclude_readers_for_detach(), and made the change.
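
Roughly like this (a sketch only - the helper name is settled but the exact
split may shift a little in the respin):

	/* In the header - the fast path, we usually hold the only reference. */
	static inline void vma_mark_detached(struct vm_area_struct *vma)
	{
		vma_assert_write_locked(vma);
		vma_assert_attached(vma);

		/* Likely case: simply drop the reference. */
		if (likely(__vma_refcount_put(vma, NULL)))
			return;

		/* Rare case: wait out ephemeral readers. */
		__vma_exclude_readers_for_detach(vma);
	}

with __vma_exclude_readers_for_detach() keeping the ves setup and the
enter/exit exclusive dance in mm/mmap_lock.c.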

>
> > +
> > +	/*
> > +	 * Wait until the VMA is detached with no readers. Since we hold the VMA
> > +	 * write lock, the only read locks that might be present are those from
> > +	 * threads trying to acquire the read lock and incrementing the
> > +	 * reference count before realising the write lock is held and
> > +	 * decrementing it.
> > +	 */
> > +	err = __vma_enter_exclusive_locked(&ves);
> > +	if (!err && !ves.detached) {
> > +		/*
> > +		 * Once this is complete, no readers can increment the
> > +		 * reference count, and the VMA is marked detached.
> > +		 */
> > +		__vma_exit_exclusive_locked(&ves);
> >  	}
> > +	/* If an error arose but we were detached anyway, we don't care. */
> > +	WARN_ON_ONCE(!ves.detached);
> >  }
> >
> >  /*
> > --
> > 2.52.0
>

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 08/10] mm/vma: improve and document __is_vma_write_locked()
  2026-01-23 16:21     ` Vlastimil Babka
  2026-01-23 17:42       ` Suren Baghdasaryan
@ 2026-01-23 18:44       ` Lorenzo Stoakes
  1 sibling, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 18:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Suren Baghdasaryan, Andrew Morton, David Hildenbrand,
	Liam R . Howlett, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 05:21:26PM +0100, Vlastimil Babka wrote:
> On 1/22/26 22:55, Suren Baghdasaryan wrote:
> > On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> >>
> >> The function is a little confusing, clean it up a little then add a
> >> descriptive comment.
> >
> > I appreciate the descriptive comment but what exactly was confusing in
> > this function?
> >
> >>
> >> No functional change intended.
> >>
> >> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> >> ---
> >>  include/linux/mmap_lock.h | 23 ++++++++++++++++++-----
> >>  1 file changed, 18 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> >> index 873bc5f3c97c..b00d34b5ad10 100644
> >> --- a/include/linux/mmap_lock.h
> >> +++ b/include/linux/mmap_lock.h
> >> @@ -252,17 +252,30 @@ static inline void vma_end_read(struct vm_area_struct *vma)
> >>         vma_refcount_put(vma);
> >>  }
> >>
> >> -/* WARNING! Can only be used if mmap_lock is expected to be write-locked */
> >> -static inline bool __is_vma_write_locked(struct vm_area_struct *vma, unsigned int *mm_lock_seq)
> >> +/*
> >> + * Determine whether a VMA is write-locked. Must be invoked ONLY if the mmap
> >> + * write lock is held.
> >> + *
> >> + * Returns true if write-locked, otherwise false.
> >> + *
> >> + * Note that mm_lock_seq is updated only if the VMA is NOT write-locked.
>
> Could it also say what it's updated to? Or is it too obvious?
>
> >
> > True, this does not result in a functional change because we do not
> > use mm_lock_seq if __is_vma_write_locked() succeeds. However this
> > seems to add an additional gotcha that you need to remember. Any reason
> > why?
>
> Actually I wonder if it's really worth returning the mm_lock_seq and passing
> it to __vma_start_write(), which could just determine it on its own. It
> would simplify things.

I mean don't we have to worry about racing vma_end_write_all()'s?

I suppose not as you have to have the mmap write lock here exclusively, and
we (lockdep) assert we own it, so we can probably safely assume the mm
value is OK.

It'd be good to drop this parameter.

I see Suren approves so have done so... :)
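
i.e. something like this (a sketch - the exact form may differ in the respin):

	/*
	 * Determine whether a VMA is write-locked. Must be invoked ONLY if the
	 * mmap write lock is held.
	 */
	static inline bool __is_vma_write_locked(struct vm_area_struct *vma)
	{
		struct mm_struct *mm = vma->vm_mm;

		mmap_assert_write_locked(mm);

		/*
		 * We hold the mmap write lock, so neither vma->vm_lock_seq nor
		 * mm->mm_lock_seq can be concurrently modified.
		 */
		return vma->vm_lock_seq == mm->mm_lock_seq.sequence;
	}

with __vma_start_write() reading mm->mm_lock_seq.sequence itself rather than
having it passed in.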

>
> >> + */
> >> +static inline bool __is_vma_write_locked(struct vm_area_struct *vma,
> >> +                                        unsigned int *mm_lock_seq)
> >>  {
> >> -       mmap_assert_write_locked(vma->vm_mm);
> >> +       struct mm_struct *mm = vma->vm_mm;
> >> +       const unsigned int seq = mm->mm_lock_seq.sequence;
> >> +
> >> +       mmap_assert_write_locked(mm);
> >>
> >>         /*
> >>          * current task is holding mmap_write_lock, both vma->vm_lock_seq and
> >>          * mm->mm_lock_seq can't be concurrently modified.
> >>          */
> >> -       *mm_lock_seq = vma->vm_mm->mm_lock_seq.sequence;
> >> -       return (vma->vm_lock_seq == *mm_lock_seq);
> >> +       if (vma->vm_lock_seq == seq)
> >> +               return true;
> >> +       *mm_lock_seq = seq;
> >> +       return false;
> >>  }
> >>
> >>  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> >> --
> >> 2.52.0
>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 09/10] mm/vma: update vma_assert_locked() to use lockdep
  2026-01-22 22:02   ` Suren Baghdasaryan
@ 2026-01-23 18:45     ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 18:45 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 02:02:53PM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > We can use lockdep to avoid unnecessary work here, otherwise update the
> > code to logically evaluate all pertinent cases and share code with
> > vma_assert_write_locked().
> >
> > Make it clear here that we treat the VMA being detached at this point as a
> > bug, this was only implicit before.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Looks correct.
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>

Thanks!


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 09/10] mm/vma: update vma_assert_locked() to use lockdep
  2026-01-23 16:55   ` Vlastimil Babka
@ 2026-01-23 18:49     ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 18:49 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 05:55:58PM +0100, Vlastimil Babka wrote:
> On 1/22/26 14:02, Lorenzo Stoakes wrote:
> > +/**
> > + * vma_assert_locked() - assert that @vma holds either a VMA read or a VMA write
> > + * lock and is not detached.
> > + * @vma: The VMA to assert.
> > + */
> >  static inline void vma_assert_locked(struct vm_area_struct *vma)
> >  {
> > -	unsigned int mm_lock_seq;
> > +	unsigned int refs;
> >
> >  	/*
> >  	 * See the comment describing the vm_area_struct->vm_refcnt field for
> >  	 * details of possible refcnt values.
> >  	 */
> > -	VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt) <= 1 &&
> > -		      !__is_vma_write_locked(vma, &mm_lock_seq), vma);
> > +
> > +	/*
> > +	 * If read-locked or currently excluding readers, then the VMA is
> > +	 * locked.
> > +	 */
> > +#ifdef CONFIG_LOCKDEP
> > +	if (lock_is_held(&vma->vmlock_dep_map))
> > +		return;
> > +#endif
> > +
> > +	refs = refcount_read(&vma->vm_refcnt);
> > +
> > +	/*
> > +	 * In this case we're either read-locked, write-locked with temporary
> > +	 * readers, or in the midst of excluding readers, all of which means
> > +	 * we're locked.
> > +	 */
> > +	if (refs > 1)
> > +		return;
> > +
> > +	/* It is a bug for the VMA to be detached here. */
> > +	VM_BUG_ON_VMA(!refs, vma);
> > +
>
> Yeah previously this function was all VM_BUG_ON() but since that's now
> frowned upon, can we not do it anymore?
> Seems we do have VM_WARN_ON_ONCE_VMA().

Ack yeah will replace! Already replaced some in previous patch also :)
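
i.e. (sketch), e.g. for the detached check above:

	/* It is a bug for the VMA to be detached here. */
	VM_WARN_ON_ONCE_VMA(!refs, vma);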

>
> > +	/*
> > +	 * OK, the VMA has a reference count of 1 which means it is either
> > +	 * unlocked and attached or write-locked, so assert that it is
> > +	 * write-locked.
> > +	 */
> > +	vma_assert_write_locked(vma);
> >  }
> >
> >  static inline bool vma_is_attached(struct vm_area_struct *vma)
> > --
> > 2.52.0
>


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 10/10] mm/vma: add and use vma_assert_stabilised()
  2026-01-23 17:10   ` Vlastimil Babka
@ 2026-01-23 18:51     ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 18:51 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 06:10:29PM +0100, Vlastimil Babka wrote:
> On 1/22/26 14:02, Lorenzo Stoakes wrote:
> > Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot
> > be changed underneath us. This will be the case if EITHER the VMA lock or
> > the mmap lock is held.
> >
> > In order to do so, we introduce a new assert vma_assert_stablised() - this
> > will make a lockdep assert if lockdep is enabled AND the VMA is
> > read-locked.
> >
> > Currently lockdep tracking for VMA write locks is not implemented, so it
> > suffices to check in this case that we have either an mmap read or write
> > semaphore held.
> >
> > Note that because the VMA lock uses the non-standard vmlock_dep_map naming
> > convention, we cannot use lockdep_assert_is_write_held() so have to open
> > code this ourselves via lockdep-asserting that
> > lock_is_held_type(&vma->vmlock_dep_map, 0).
> >
> > We have to be careful here - for instance when merging a VMA, we use the
> > mmap write lock to stabilise the examination of adjacent VMAs which might
> > be simultaneously VMA read-locked whilst being faulted in.
> >
> > If we were to assert VMA read lock using lockdep we would encounter an
> > incorrect lockdep assert.
> >
> > Also, we have to be careful about asserting mmap locks are held - if we try
> > to address the above issue by first checking whether mmap lock is held and
> > if so asserting it via lockdep, we may find that we were raced by another
> > thread acquiring an mmap read lock simultaneously that either we don't
> > own (and thus can be released any time - so we are not stable) or was
> > indeed released since we last checked.
> >
> > So to deal with these complexities we end up with either a precise (if
> > lockdep is enabled) or imprecise (if not) approach - in the first instance
> > we assert the lock is held using lockdep and thus whether we own it.
> >
> > If we do own it, then the check is complete, otherwise we must check for
> > the VMA read lock being held (VMA write lock implies mmap write lock so the
> > mmap lock suffices for this).
> >
> > If lockdep is not enabled we simply check if the mmap lock is held and risk
> > a false negative (i.e. not asserting when we should do).
> >
> > There are a couple places in the kernel where we already do this
> > stabilisation check - the anon_vma_name() helper in mm/madvise.c and
> > vma_flag_set_atomic() in include/linux/mm.h, which we update to use
> > vma_assert_stabilised().
> >
> > This change abstracts these into vma_assert_stabilised(), uses lockdep if
> > possible, and avoids a duplicate check of whether the mmap lock is held.
> >
> > This is also self-documenting and lays the foundations for further VMA
> > stability checks in the code.
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> LGTM, thanks!
>
> Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
>

Thanks!

And thanks for the review in general :)


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 10/10] mm/vma: add and use vma_assert_stabilised()
  2026-01-22 22:12   ` Suren Baghdasaryan
@ 2026-01-23 18:54     ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 18:54 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Thu, Jan 22, 2026 at 02:12:25PM -0800, Suren Baghdasaryan wrote:
> On Thu, Jan 22, 2026 at 5:02 AM Lorenzo Stoakes
> <lorenzo.stoakes@oracle.com> wrote:
> >
> > Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot
> > be changed underneath us. This will be the case if EITHER the VMA lock or
> > the mmap lock is held.
> >
> > In order to do so, we introduce a new assert vma_assert_stablised() - this
>
> s/vma_assert_stablised/vma_assert_stabilised

Oops will fix.

>
> > will make a lockdep assert if lockdep is enabled AND the VMA is
> > read-locked.
> >
> > Currently lockdep tracking for VMA write locks is not implemented, so it
> > suffices to check in this case that we have either an mmap read or write
> > semaphore held.
> >
> > Note that because the VMA lock uses the non-standard vmlock_dep_map naming
> > convention, we cannot use lockdep_assert_is_write_held() so have to open
> > code this ourselves via lockdep-asserting that
> > lock_is_held_type(&vma->vmlock_dep_map, 0).
> >
> > We have to be careful here - for instance when merging a VMA, we use the
> > mmap write lock to stabilise the examination of adjacent VMAs which might
> > be simultaneously VMA read-locked whilst being faulted in.
> >
> > If we were to assert VMA read lock using lockdep we would encounter an
> > incorrect lockdep assert.
> >
> > Also, we have to be careful about asserting mmap locks are held - if we try
> > to address the above issue by first checking whether mmap lock is held and
> > if so asserting it via lockdep, we may find that we were raced by another
> > thread acquiring an mmap read lock simultaneously that either we don't
> > own (and thus can be released any time - so we are not stable) or was
> > indeed released since we last checked.
> >
> > So to deal with these complexities we end up with either a precise (if
> > lockdep is enabled) or imprecise (if not) approach - in the first instance
> > we assert the lock is held using lockdep and thus whether we own it.
> >
> > If we do own it, then the check is complete, otherwise we must check for
> > the VMA read lock being held (VMA write lock implies mmap write lock so the
> > mmap lock suffices for this).
> >
> > If lockdep is not enabled we simply check if the mmap lock is held and risk
> > a false negative (i.e. not asserting when we should do).
> >
> > There are a couple places in the kernel where we already do this
> > stabilisation check - the anon_vma_name() helper in mm/madvise.c and
> > vma_flag_set_atomic() in include/linux/mm.h, which we update to use
> > vma_assert_stabilised().
> >
> > This change abstracts these into vma_assert_stabilised(), uses lockdep if
> > possible, and avoids a duplicate check of whether the mmap lock is held.
> >
> > This is also self-documenting and lays the foundations for further VMA
> > stability checks in the code.
>
> So, is the lockdep addition the only functional change here?

Yurp. Will add to commit msg.

>
> >
> > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>

Thanks :)

And also thanks to you for the review in general! :)


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
  2026-01-23 17:59     ` Lorenzo Stoakes
@ 2026-01-23 19:34       ` Suren Baghdasaryan
  2026-01-23 20:04         ` Lorenzo Stoakes
  0 siblings, 1 reply; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-23 19:34 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 9:59 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Thu, Jan 22, 2026 at 01:41:39PM -0800, Suren Baghdasaryan wrote:
> > On Thu, Jan 22, 2026 at 5:03 AM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > >
> > > It is confusing to have __vma_enter_exclusive_locked() return 0, 1 or an
> > > error (but only when waiting for readers in TASK_KILLABLE state), and
> > > having the return value be stored in a stack variable called 'locked' is
> > > further confusion.
> > >
> > > More generally, we are doing a lot of rather finicky things during the
> > > acquisition of a state in which readers are excluded and moving out of this
> > > state, including tracking whether we are detached or not or whether an
> > > error occurred.
> > >
> > > We are implementing logic in __vma_enter_exclusive_locked() that
> > > effectively acts as if 'if one caller calls us do X, if another then do Y',
> > > which is very confusing from a control flow perspective.
> > >
> > > Introducing the shared helper object state helps us avoid this, as we can
> > > now handle the 'an error arose but we're detached' condition correctly in
> > > both callers - a warning if not detaching, and treating the situation as if
> > > no error arose in the case of a VMA detaching.
> > >
> > > This also acts to help document what's going on and allows us to add some
> > > more logical debug asserts.
> > >
> > > Also update vma_mark_detached() to add a guard clause for the likely
> > > 'already detached' state (given we hold the mmap write lock), and add a
> > > comment about ephemeral VMA read lock reference count increments to clarify
> > > why we are entering/exiting an exclusive locked state here.
> > >
> > > No functional change intended.
> > >
> > > Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> > > ---
> > >  mm/mmap_lock.c | 144 +++++++++++++++++++++++++++++++------------------
> > >  1 file changed, 91 insertions(+), 53 deletions(-)
> > >
> > > diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> > > index f73221174a8b..75166a43ffa4 100644
> > > --- a/mm/mmap_lock.c
> > > +++ b/mm/mmap_lock.c
> > > @@ -46,20 +46,40 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
> > >  #ifdef CONFIG_MMU
> > >  #ifdef CONFIG_PER_VMA_LOCK
> > >
> > > +/* State shared across __vma_[enter, exit]_exclusive_locked(). */
> > > +struct vma_exclude_readers_state {
> > > +       /* Input parameters. */
> > > +       struct vm_area_struct *vma;
> > > +       int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
> > > +       bool detaching;
> > > +
> > Are these:
> >             /* Output parameters. */
> > ?
>
> Yurp.
>
> Oh you added the comment :) well clearly I should have, will do so.
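>
> So the struct would then read (sketch):
>
> 	/* State shared across __vma_[enter, exit]_exclusive_locked(). */
> 	struct vma_exclude_readers_state {
> 		/* Input parameters. */
> 		struct vm_area_struct *vma;
> 		int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
> 		bool detaching;
>
> 		/* Output parameters. */
> 		bool detached;
> 		bool exclusive; /* Are we exclusively locked? */
> 	};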
>
> > > +       bool detached;
> > > +       bool exclusive; /* Are we exclusively locked? */
> > > +};
> > > +
> > >  /*
> > >   * Now that all readers have been evicted, mark the VMA as being out of the
> > >   * 'exclude readers' state.
> > >   *
> > >   * Returns true if the VMA is now detached, otherwise false.
> > >   */
> > > -static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> > > +static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
> > >  {
> > > -       bool detached;
> > > +       struct vm_area_struct *vma = ves->vma;
> > > +
> > > +       VM_WARN_ON_ONCE(ves->detached);
> > > +       VM_WARN_ON_ONCE(!ves->exclusive);
> > >
> > > -       detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > > -                                        &vma->vm_refcnt);
> > > +       ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> > > +                                             &vma->vm_refcnt);
> > >         __vma_lockdep_release_exclusive(vma);
> > > -       return detached;
> > > +}
> > > +
> > > +static unsigned int get_target_refcnt(struct vma_exclude_readers_state *ves)
> > > +{
> > > +       const unsigned int tgt = ves->detaching ? 0 : 1;
> > > +
> > > +       return tgt | VM_REFCNT_EXCLUDE_READERS_FLAG;
> > >  }
> > >
> > >  /*
> > > @@ -69,30 +89,31 @@ static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
> > >   * Note that this function pairs with vma_refcount_put() which will wake up this
> > >   * thread when it detects that the last reader has released its lock.
> > >   *
> > > - * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
> > > - * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
> > > - * is permitted to kill it.
> > > + * The ves->state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases
> > > + * where we wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal
> > > + * signal is permitted to kill it.
> > >   *
> > > - * The function will return 0 immediately if the VMA is detached, and 1 once the
> > > - * VMA has evicted all readers, leaving the VMA exclusively locked.
> > > + * The function sets the ves->locked parameter to true if an exclusive lock was
> >
> > s/ves->locked/ves->exclusive
> >
> > > + * acquired, or false if the VMA was detached or an error arose on wait.
> > >   *
> > > - * If the function returns 1, the caller is required to invoke
> > > - * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
> > > + * If the function indicates an exclusive lock was acquired via ves->exclusive
> > > + * (or equivalently, returning 0 with !ves->detached),
> >
> > I would remove the mention of that equivalence because with this
> > change, return 0 simply indicates that the operation was successful
> > and should not be used to infer any additional states. To get specific
> > state the caller should use proper individual ves fields. Using return
> > value for anything else defeats the whole purpose of this cleanup.
>
> OK I'll remove the equivalency comment.
>
> >
> > > the caller is required to
> > > + * invoke __vma_exit_exclusive_locked() once the exclusive state is no longer
> > > + * required.
> > >   *
> > > - * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
> > > - * may also return -EINTR to indicate a fatal signal was received while waiting.
> > > + * If ves->state is set to something other than TASK_UNINTERRUPTIBLE, the
> > > + * function may also return -EINTR to indicate a fatal signal was received while
> > > + * waiting.
> > >   */
> > > -static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
> > > -               bool detaching, int state)
> > > +static int __vma_enter_exclusive_locked(struct vma_exclude_readers_state *ves)
> > >  {
> > > -       int err;
> > > -       unsigned int tgt_refcnt = VM_REFCNT_EXCLUDE_READERS_FLAG;
> > > +       struct vm_area_struct *vma = ves->vma;
> > > +       unsigned int tgt_refcnt = get_target_refcnt(ves);
> > > +       int err = 0;
> > >
> > >         mmap_assert_write_locked(vma->vm_mm);
> > > -
> > > -       /* Additional refcnt if the vma is attached. */
> > > -       if (!detaching)
> > > -               tgt_refcnt++;
> > > +       VM_WARN_ON_ONCE(ves->detached);
> > > +       VM_WARN_ON_ONCE(ves->exclusive);
> >
> > Aren't these output parameters? If so, why do we stipulate their
> > initial values instead of setting them appropriately?
>
> I guess it was just to ensure things were correctly set up, but yes it's a
> bit weird, will remove.
>
> >
> > >
> > >         /*
> > >          * If vma is detached then only vma_mark_attached() can raise the
> > > @@ -101,37 +122,39 @@ static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
> > >          * See the comment describing the vm_area_struct->vm_refcnt field for
> > >          * details of possible refcnt values.
> > >          */
> > > -       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
> > > +       if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
> > > +               ves->detached = true;
> > >                 return 0;
> > > +       }
> > >
> > >         __vma_lockdep_acquire_exclusive(vma);
> > >         err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
> > >                    refcount_read(&vma->vm_refcnt) == tgt_refcnt,
> > > -                  state);
> > > +                  ves->state);
> > >         if (err) {
> > > -               if (__vma_exit_exclusive_locked(vma)) {
> > > -                       /*
> > > -                        * The wait failed, but the last reader went away
> > > -                        * as well. Tell the caller the VMA is detached.
> > > -                        */
> > > -                       WARN_ON_ONCE(!detaching);
> > > -                       err = 0;
> > > -               }
> > > +               __vma_exit_exclusive_locked(ves);
> > >                 return err;
> >
> > Nice! We preserve both error and detached state information.
>
> :)
>
> >
> > >         }
> > > -       __vma_lockdep_stat_mark_acquired(vma);
> > >
> > > -       return 1;
> > > +       __vma_lockdep_stat_mark_acquired(vma);
> > > +       ves->exclusive = true;
> > > +       return 0;
> > >  }
> > >
> > >  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> > >                 int state)
> > >  {
> > > -       int locked;
> > > +       int err;
> > > +       struct vma_exclude_readers_state ves = {
> > > +               .vma = vma,
> > > +               .state = state,
> > > +       };
> > >
> > > -       locked = __vma_enter_exclusive_locked(vma, false, state);
> > > -       if (locked < 0)
> > > -               return locked;
> > > +       err = __vma_enter_exclusive_locked(&ves);
> > > +       if (err) {
> > > +               WARN_ON_ONCE(ves.detached);
> >
> > I believe the above WARN_ON_ONCE() should stay inside of
> > __vma_enter_exclusive_locked(). Its correctness depends on the
> > implementation details of __vma_enter_exclusive_locked(). More
>
> Well this was kind of horrible in the original implementation, as you are
> literally telling the function whether you are detaching or not, and only doing
> this assert if you were not.
>
> That kind of 'if the caller is X do A, if the caller is Y do B' is really a code
> smell, you should have X do the thing.
>
> > specifically, it is only correct because
> > __vma_enter_exclusive_locked() returns 0 if the VMA is detached, even
> > if there was a pending SIGKILL.
>
> Well it's a documented aspect of the function that we return 0 immediately on
> detached state so I'm not sure that is an implementation detail?
>
> I significantly prefer having that here vs. 'if not detaching then assert if
> detached' for people to scratch their heads over in the function.
>
> I think this detail is incorrect anyway, because:
>
>         if (err) {
>                 if (__vma_exit_exclusive_locked(vma)) {
>                         /*
>                          * The wait failed, but the last reader went away
>                          * as well. Tell the caller the VMA is detached.
>                          */
>                          WARN_ON_ONCE(!detaching);
>                          err = 0;
>                 }
>                 ...
>         }
>
> Implies - hey we're fine with err not being zero AND detaching right? In which
> case reset the error?
>
> Except when detaching we set TASK_UNINTERRUPTIBLE? Which surely means we never
> see an error?
>
> Or do we?
>
> Either way it's something we handle differently based on _caller_. So it doesn't
> belong in the function at all.
>
> It's certainly logic that's highly confusing and needs to be handled
> differently.

Just to be clear, I'm not defending the way it is done before your
change, however the old check for "if not detaching then assert if
detached" makes more sense to me than "if
__vma_enter_exclusive_locked() failed, assert that the VMA is still
attached". The latter one does not make logical sense to me. It's only
correct because of the implementation detail of
__vma_enter_exclusive_locked().

>
> >
> > > +               return err;
> > > +       }
> > >
> > >         /*
> > >          * We should use WRITE_ONCE() here because we can have concurrent reads
> > > @@ -141,9 +164,11 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> > >          */
> > >         WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
> > >
> > > -       /* vma should remain attached. */
> > > -       if (locked)
> > > -               WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
> > > +       if (!ves.detached) {
> >
> > Strictly speaking the above check should be checking ves->exclusive
> > instead of !ves.detached. What you have is technically correct but
> > it's again related to that comment:
> >
> > "If the function indicates an exclusive lock was acquired via
> > ves->exclusive (or equivalently, returning 0 with !ves->detached), the
> > caller is required to invoke __vma_exit_exclusive_locked() once the
> > exclusive state is no longer required."
> >
> > So, here you are using returning 0 with !ves->detached as an
> > indication that the VMA was successfully locked. I think it's less
> > confusing if we use the field dedicated for that purpose.
>
> OK changed.
>
> >
> > > +               __vma_exit_exclusive_locked(&ves);
> > > +               /* VMA should remain attached. */
> > > +               WARN_ON_ONCE(ves.detached);
> > > +       }
> > >
> > >         return 0;
> > >  }
> > > @@ -151,7 +176,12 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
> > >
> > >  void vma_mark_detached(struct vm_area_struct *vma)
> > >  {
> > > -       bool detached;
> > > +       struct vma_exclude_readers_state ves = {
> > > +               .vma = vma,
> > > +               .state = TASK_UNINTERRUPTIBLE,
> > > +               .detaching = true,
> > > +       };
> > > +       int err;
> > >
> > >         vma_assert_write_locked(vma);
> > >         vma_assert_attached(vma);
> > > @@ -160,18 +190,26 @@ void vma_mark_detached(struct vm_area_struct *vma)
> > >          * See the comment describing the vm_area_struct->vm_refcnt field for
> > >          * details of possible refcnt values.
> > >          */
> > > -       detached = __vma_refcount_put(vma, NULL);
> > > -       if (unlikely(!detached)) {
> > > -               /* Wait until vma is detached with no readers. */
> > > -               if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> > > -                       /*
> > > -                        * Once this is complete, no readers can increment the
> > > -                        * reference count, and the VMA is marked detached.
> > > -                        */
> > > -                       detached = __vma_exit_exclusive_locked(vma);
> > > -                       WARN_ON_ONCE(!detached);
> > > -               }
> > > +       if (likely(__vma_refcount_put(vma, NULL)))
> > > +               return;
> > > +
> > > +       /*
> > > +        * Wait until the VMA is detached with no readers. Since we hold the VMA
> > > +        * write lock, the only read locks that might be present are those from
> > > +        * threads trying to acquire the read lock and incrementing the
> > > +        * reference count before realising the write lock is held and
> > > +        * decrementing it.
> > > +        */
> > > +       err = __vma_enter_exclusive_locked(&ves);
> > > +       if (!err && !ves.detached) {
> >
> > Same here, we should be checking ves->exclusive to decide if
> > __vma_exit_exclusive_locked() should be called or not.
>
> Ack, changed.
>
> >
> > > +               /*
> > > +                * Once this is complete, no readers can increment the
> > > +                * reference count, and the VMA is marked detached.
> > > +                */
> > > +               __vma_exit_exclusive_locked(&ves);
> > >         }
> > > +       /* If an error arose but we were detached anyway, we don't care. */
> > > +       WARN_ON_ONCE(!ves.detached);
> > >  }
> > >
> > >  /*
> > > --
> > > 2.52.0
>
> Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
  2026-01-23 19:34       ` Suren Baghdasaryan
@ 2026-01-23 20:04         ` Lorenzo Stoakes
  2026-01-23 22:07           ` Suren Baghdasaryan
  0 siblings, 1 reply; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-23 20:04 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 11:34:25AM -0800, Suren Baghdasaryan wrote:
> > > >  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> > > >                 int state)
> > > >  {
> > > > -       int locked;
> > > > +       int err;
> > > > +       struct vma_exclude_readers_state ves = {
> > > > +               .vma = vma,
> > > > +               .state = state,
> > > > +       };
> > > >
> > > > -       locked = __vma_enter_exclusive_locked(vma, false, state);
> > > > -       if (locked < 0)
> > > > -               return locked;
> > > > +       err = __vma_enter_exclusive_locked(&ves);
> > > > +       if (err) {
> > > > +               WARN_ON_ONCE(ves.detached);
> > >
> > > I believe the above WARN_ON_ONCE() should stay inside of
> > > __vma_enter_exclusive_locked(). Its correctness depends on the
> > > implementation details of __vma_enter_exclusive_locked(). More
> >
> > Well this was kind of horrible in the original implementation, as you are
> > literally telling the function whether you are detaching or not, and only doing
> > this assert if you were not.
> >
> > That kind of 'if the caller is X do A, if the caller is Y do B' is really a code
> > smell, you should have X do the thing.
> >
> > > specifically, it is only correct because
> > > __vma_enter_exclusive_locked() returns 0 if the VMA is detached, even
> > > if there was a pending SIGKILL.
> >
> > Well it's a documented aspect of the function that we return 0 immediately on
> > detached state so I'm not sure that is an implementation detail?
> >
> > I significantly prefer having that here vs. 'if not detaching then assert if
> > detached' for people to scratch their heads over in the function.
> >
> > I think this detail is incorrect anyway, because:
> >
> >         if (err) {
> >                 if (__vma_exit_exclusive_locked(vma)) {
> >                         /*
> >                          * The wait failed, but the last reader went away
> >                          * as well. Tell the caller the VMA is detached.
> >                          */
> >                          WARN_ON_ONCE(!detaching);
> >                          err = 0;
> >                 }
> >                 ...
> >         }
> >
> > Implies - hey we're fine with err not being zero AND detaching right? In which
> > case reset the error?
> >
> > Except when detaching we set TASK_UNINTERRUPTIBLE? Which surely means we never
> > see an error?
> >
> > Or do we?
> >
> > Either way it's something we handle differently based on _caller_. So it doesn't
> > belong in the function at all.
> >
> > It's certainly logic that's highly confusing and needs to be handled
> > differently.
>
> Just to be clear, I'm not defending the way it is done before your
> change, however the old check for "if not detaching then assert if

I mean you basically are since here I am trying to change it and you're
telling me not to, so you are definitely defending this.

> detached" makes more sense to me than "if
> __vma_enter_exclusive_locked() failed, assert that the VMA is still
> attached". The latter one does not make logical sense to me. It's only

I don't understand what you're quoting here?

> correct because of the implementation detail of
> __vma_enter_exclusive_locked().

Except that implementation detail no longer exists?


Before:

         if (err) {
                 if (__vma_exit_exclusive_locked(vma)) {
                         /*
                          * The wait failed, but the last reader went away
                          * as well. Tell the caller the VMA is detached.
                          */
                          WARN_ON_ONCE(!detaching);
                          err = 0;
                 }
                 ...
         }

After:

	if (err) {
		__vma_end_exclude_readers(ves);
		return err;
	}

So now each caller receives an error _and decides what to do with it_.

In __vma_exclude_readers_for_detach():


	err = __vma_start_exclude_readers(&ves);
	if (!err && ves.exclusive) {
		...
	}
	/* If an error arose but we were detached anyway, we don't care. */
	WARN_ON_ONCE(!ves.detached);

Right that's pretty clear? We expect to be detached no matter what, and the
comment points out that, yeah, err could result in detachment.

In the __vma_start_write() path:

	err = __vma_start_exclude_readers(&ves);
	if (err) {
		WARN_ON_ONCE(ves.detached);
		return err;
	}

I mean, yes we don't expect to be detached when we're acquiring a write.

Honestly I've spent the past 6 hours responding to review for a series I
really didn't want to write in the first place, updating and testing
etc. as I go, and I've essentially accepted every single point of feedback.

So I'm a little frustrated at getting stuck on this issue.

So I'm afraid I'm going to send the v4 out as-is and we can have a v5 (or
ideally, a fix-patch) if we have to, but you definitely need to be more
convincing about this.

I might just be wrong and missing the point out of tiredness but, at this
stage, I'm not going to hold up the respin over this.

Thanks, Lorenzo



^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
  2026-01-23 20:04         ` Lorenzo Stoakes
@ 2026-01-23 22:07           ` Suren Baghdasaryan
  2026-01-24  8:54             ` Lorenzo Stoakes
  0 siblings, 1 reply; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-23 22:07 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 12:04 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Fri, Jan 23, 2026 at 11:34:25AM -0800, Suren Baghdasaryan wrote:
> > > > >  int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> > > > >                 int state)
> > > > >  {
> > > > > -       int locked;
> > > > > +       int err;
> > > > > +       struct vma_exclude_readers_state ves = {
> > > > > +               .vma = vma,
> > > > > +               .state = state,
> > > > > +       };
> > > > >
> > > > > -       locked = __vma_enter_exclusive_locked(vma, false, state);
> > > > > -       if (locked < 0)
> > > > > -               return locked;
> > > > > +       err = __vma_enter_exclusive_locked(&ves);
> > > > > +       if (err) {
> > > > > +               WARN_ON_ONCE(ves.detached);
> > > >
> > > > I believe the above WARN_ON_ONCE() should stay inside of
> > > > __vma_enter_exclusive_locked(). Its correctness depends on the
> > > > implementation details of __vma_enter_exclusive_locked(). More
> > >
> > > Well this was kind of horrible in the original implementation, as you are
> > > literally telling the function whether you are detaching or not, and only doing
> > > this assert if you were not.
> > >
> > > That kind of 'if the caller is X do A, if the caller is Y do B' is really a code
> > > smell, you should have X do the thing.
> > >
> > > > specifically, it is only correct because
> > > > __vma_enter_exclusive_locked() returns 0 if the VMA is detached, even
> > > > if there was a pending SIGKILL.
> > >
> > > Well it's a documented aspect of the function that we return 0 immediately on
> > > detached state so I'm not sure that is an implementation detail?
> > >
> > > I significantly prefer having that here vs. 'if not detaching then assert if
> > > detached' for people to scratch their heads over in the function.
> > >
> > > I think this detail is incorrect anyway, because:
> > >
> > >         if (err) {
> > >                 if (__vma_exit_exclusive_locked(vma)) {
> > >                         /*
> > >                          * The wait failed, but the last reader went away
> > >                          * as well. Tell the caller the VMA is detached.
> > >                          */
> > >                          WARN_ON_ONCE(!detaching);
> > >                          err = 0;
> > >                 }
> > >                 ...
> > >         }
> > >
> > > Implies - hey we're fine with err not being zero AND detaching right? In which
> > > case reset the error?
> > >
> > > Except when detaching we set TASK_UNINTERRUPTIBLE? Which surely means we never
> > > see an error?
> > >
> > > Or do we?
> > >
> > > Either way it's something we handle differently based on _caller_. So it doesn't
> > > belong in the function at all.
> > >
> > > It's certainly logic that's highly confusing and needs to be handled
> > > differently.
> >
> > Just to be clear, I'm not defending the way it is done before your
> > change, however the old check for "if not detaching then assert if
>
> I mean you basically are since here I am trying to change it and you're
> telling me not to, so you are definitely defending this.
>
> > detached" makes more sense to me than "if
> > __vma_enter_exclusive_locked() failed, assert that the VMA is still
> > attached". The latter one does not make logical sense to me. It's only
>
> I don't understand what you're quoting here?
>
> > correct because of the implementation detail of
> > __vma_enter_exclusive_locked().
>
> Except that implementation detail no longer exists?
>
>
> Before:
>
>          if (err) {
>                  if (__vma_exit_exclusive_locked(vma)) {
>                          /*
>                           * The wait failed, but the last reader went away
>                           * as well. Tell the caller the VMA is detached.
>                           */
>                           WARN_ON_ONCE(!detaching);
>                           err = 0;
>                  }
>                  ...
>          }
>
> After:
>
>         if (err) {
>                 __vma_end_exclude_readers(ves);
>                 return err;
>         }
>
> So now each caller receives an error _and decides what to do with it_.
>
> In __vma_exclude_readers_for_detach():
>
>
>         err = __vma_start_exclude_readers(&ves);
>         if (!err && ves.exclusive) {
>                 ...
>         }
>         /* If an error arose but we were detached anyway, we don't care. */
>         WARN_ON_ONCE(!ves.detached);
>
> Right that's pretty clear? We expect to be detached no matter what, and the
> comment points out that, yeah, err could result in detachment.
>
> In the __vma_start_write() path:
>
>         err = __vma_start_exclude_readers(&ves);
>         if (err) {
>                 WARN_ON_ONCE(ves.detached);
>                 return err;
>         }
>
> I mean, yes we don't expect to be detached when we're acquiring a write.
>
> Honestly I've spent the past 6 hours responding to review for a series I
> really didn't want to write in the first place, updating and testing
> etc. as I go, and I've essentially accepted every single point of feedback.
>
> So I'm a little frustrated at getting stuck on this issue.
>
> So I'm afraid I'm going to send the v4 out as-is and we can have a v5 (or
> ideally, a fix-patch) if we have to, but you definitely need to be more
> convincing about this.
>
> I might just be wrong and missing the point out of tiredness but, at this
> stage, I'm not going to hold up the respin over this.

Sorry, I didn't realize I was causing that much trouble and I
understand your frustration.
From your reply, it sounds like you made enough changes to the patch
that my concern might already be obsolete. I'll review the new
submission on Sunday and will provide my feedback.
Thanks,
Suren.

>
> Thanks, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 10/10] mm/vma: add and use vma_assert_stabilised()
  2026-01-22 13:02 ` [PATCH RESEND v3 10/10] mm/vma: add and use vma_assert_stabilised() Lorenzo Stoakes
  2026-01-22 22:12   ` Suren Baghdasaryan
  2026-01-23 17:10   ` Vlastimil Babka
@ 2026-01-23 23:35   ` Hillf Danton
  2 siblings, 0 replies; 73+ messages in thread
From: Hillf Danton @ 2026-01-23 23:35 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, Suren Baghdasaryan, linux-mm, linux-kernel,
	Boqun Feng, Waiman Long, Sebastian Andrzej Siewior

On Thu, 22 Jan 2026 13:02:02 +0000 Lorenzo Stoakes wrote:
> +/**
> + * vma_assert_stabilised() - assert that this VMA cannot be changed from
> + * underneath us either by having a VMA or mmap lock held.
> + * @vma: The VMA whose stability we wish to assess.
> + *
> + * If lockdep is enabled we can precisely ensure stability via either an mmap
> + * lock owned by us or a specific VMA lock.
> + *
> + * With lockdep disabled we may sometimes race with other threads acquiring the
> + * mmap read lock simultaneously with our VMA read lock.
> + */
> +static inline void vma_assert_stabilised(struct vm_area_struct *vma)
> +{
> +	/*
> +	 * If another thread owns an mmap lock, it may go away at any time, and
> +	 * thus is no guarantee of stability.
> +	 *
> +	 * If lockdep is enabled we can accurately determine if an mmap lock is
> +	 * held and owned by us. Otherwise we must approximate.
> +	 *
> +	 * It doesn't necessarily mean we are not stabilised however, as we may
> +	 * hold a VMA read lock (not a write lock as this would require an owned
> +	 * mmap lock).
> +	 *
> +	 * If (assuming lockdep is not enabled) we were to assert a VMA read
> +	 * lock first we may also run into issues, as other threads can hold VMA
> + * read locks simultaneously with us.
> +	 *
> +	 * Therefore if lockdep is not enabled we risk a false negative (i.e. no
> +	 * assert fired). If accurate checking is required, enable lockdep.
> +	 */
> +	if (IS_ENABLED(CONFIG_LOCKDEP)) {
> +		if (lockdep_is_held(&vma->vm_mm->mmap_lock))
> +			return;
> +	} else {
> +		if (rwsem_is_locked(&vma->vm_mm->mmap_lock))
> +			return;
> +	}
In the case of the mmap_lock, rwsem_is_locked() has nothing to do with
lockdep_is_held() - the latter is a no-op without lockdep enabled, and the
former only tells you that *somebody* holds the lock, not that we do. So it
fails to match the "us" in "assert that this VMA cannot be changed from
underneath us either by having a VMA or mmap lock held".

That said, you are adding confusion.
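
A rough sketch of the false negative this opens up (the threads and timing
below are assumed, purely for illustration):

	/*
	 * Thread A                     Thread B (holds no lock at all)
	 * --------                     -------------------------------
	 * mmap_read_lock(mm);
	 *                              vma_assert_stabilised(vma);
	 *                                rwsem_is_locked() sees thread
	 *                                A's lock and returns true, so
	 *                                no assert fires even though B
	 *                                has not stabilised the VMA.
	 */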
> +
> +	/*
> +	 * We're not stabilised by the mmap lock, so assert that we're
> +	 * stabilised by a VMA lock.
> +	 */
> +	vma_assert_locked(vma);
> +}
> +


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
  2026-01-23 22:07           ` Suren Baghdasaryan
@ 2026-01-24  8:54             ` Lorenzo Stoakes
  2026-01-26  6:09               ` Suren Baghdasaryan
  0 siblings, 1 reply; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-24  8:54 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 02:07:43PM -0800, Suren Baghdasaryan wrote:
>
> Sorry, I didn't realize I was causing that much trouble and I
> understand your frustration.
> From your reply, it sounds like you made enough changes to the patch
> that my concern might already be obsolete. I'll review the new
> submission on Sunday and will provide my feedback.
> Thanks,
> Suren.

Apologies for being grumpy, long day :) To be clear, I value your and
Vlastimil's feedback very much, and thanks to you both for having taken the
time to review the rework.

Hopefully that's reflected in just how much I've updated the series in
response both to your absolutely valid pointing out of mistakes and to your
suggestions for improvements. I think the series is way better with your
input! (As always with code review - it's just a net positive.)

Please do review the new revision with scrutiny and comment on anything you
find that you feel I should update, including this issue. Perhaps I simply
misunderstood you, but hopefully you can also see my point of view as to why
I felt it was useful to factor that out.

In general I'm hoping to move away from cleanups and towards meatier series,
but as co-maintainer of the VMA locks I felt it really important to make the
VMA locks logic a lot clearer - to not just complain but to do something :)

In general the issue has been around abstraction at the 'intermediate'
level, as Vlasta describes it; the public API is fine, so it's just a matter
of rearranging things such that developers coming to the code can build a
good mental model of what's going on.

So hopefully this series helps get us at least a decent way along that road!

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns
  2026-01-24  8:54             ` Lorenzo Stoakes
@ 2026-01-26  6:09               ` Suren Baghdasaryan
  0 siblings, 0 replies; 73+ messages in thread
From: Suren Baghdasaryan @ 2026-01-26  6:09 UTC (permalink / raw)
  To: Lorenzo Stoakes
  Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
	Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Sat, Jan 24, 2026 at 12:54 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Fri, Jan 23, 2026 at 02:07:43PM -0800, Suren Baghdasaryan wrote:
> >
> > Sorry, I didn't realize I was causing that much trouble and I
> > understand your frustration.
> > From your reply, it sounds like you made enough changes to the patch
> > that my concern might already be obsolete. I'll review the new
> > submission on Sunday and will provide my feedback.
> > Thanks,
> > Suren.
>
> Apologies for being grumpy, long day :)

No worries. I get it.

> To be clear, I value your and
> Vlastimil's feedback very much, and thanks to you both for having taken the
> time to review the rework.

The cleanup you did may not be the most exciting work you do, but it really
aids understanding of the code and its intent. Thanks for doing
that!

>
> Hopefully that's reflected in just how much I've updated the series in
> response both to your absolutely valid pointing out of mistakes and to your
> suggestions for improvements. I think the series is way better with your
> input! (As always with code review - it's just a net positive.)

That's my goal.

>
> Please do review the new revision with scrutiny and comment on anything you
> find that you feel I should update, including this issue. Perhaps I simply
> misunderstood you, but hopefully you can also see my point of view as to why
> I felt it was useful to factor that out.

I started reviewing new patches but need to check the important ones
with fresh eyes. Will finish them tomorrow morning.

>
> In general I'm hoping to move away from cleanups and towards meatier series,
> but as co-maintainer of the VMA locks I felt it really important to make the
> VMA locks logic a lot clearer - to not just complain but to do something :)
>
> In general the issue has been around abstraction at the 'intermediate'
> level, as Vlasta describes it; the public API is fine, so it's just a matter
> of rearranging things such that developers coming to the code can build a
> good mental model of what's going on.
>
> So hopefully this series helps get us at least a decent way along that road!

It definitely does. Thanks again for doing this!
Cheers,
Suren.

>
> Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

* Re: [PATCH RESEND v3 03/10] mm/vma: rename is_vma_write_only(), separate out shared refcount put
  2026-01-23 14:41       ` Lorenzo Stoakes
@ 2026-01-26 10:04         ` Lorenzo Stoakes
  0 siblings, 0 replies; 73+ messages in thread
From: Lorenzo Stoakes @ 2026-01-26 10:04 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka, Andrew Morton, David Hildenbrand,
	Liam R . Howlett, Mike Rapoport, Michal Hocko, Shakeel Butt,
	Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
	Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
	Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
	Steven Rostedt

On Fri, Jan 23, 2026 at 02:41:42PM +0000, Lorenzo Stoakes wrote:
> > > > +{
> > > > +     int oldcnt;
> > > > +     bool detached;
> > > > +
> > > > +     detached = __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
> > > > +     if (refcnt)
> > > > +             *refcnt = oldcnt - 1;
> > > > +     return detached;
> >
> > IIUC there is always a connection between detached and the resulting
> > *refcnt value. If detached==true then the resulting *refcnt has to
> > be 0. If so, __vma_refcount_put() can simply return (oldcnt - 1) as
> > the new count:
> >
> > static inline int __vma_refcount_put(struct vm_area_struct *vma)
> > {
> >        int oldcnt;
> >
> >        __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
>
> You can't do this as it's __must_check... :)
>
> So we have to replace it with __refcount_dec(), which is a void function.
>
> >        return oldcnt - 1;

Actually this doesn't work, as __refcount_dec() won't let you decrement to
zero and will flag a saturation error if you do.
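
That is, the suggested variant would have amounted to (a rough sketch of the
rejected approach, not code from the series):

	int oldcnt;

	/*
	 * __refcount_dec() WARNs and saturates when the count would hit
	 * zero, i.e. on the final put, so it cannot be used here.
	 */
	__refcount_dec(&vma->vm_refcnt, &oldcnt);
	return oldcnt - 1;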

In the end the code looks like this:

static inline __must_check unsigned int
__vma_refcount_put_return(struct vm_area_struct *vma)
{
	int oldcnt;

	if (__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt))
		return 0;

	return oldcnt - 1;
}

This combines the __must_check, the abstraction of oldcnt - 1, and the
xxx_return() naming requested on review.
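
As a quick usage sketch (the caller below is hypothetical, purely to show
the intent - a return value of 0 means we dropped the last reference and
the VMA is now detached):

	unsigned int refcnt;

	refcnt = __vma_refcount_put_return(vma);
	if (!refcnt) {
		/* Last reference dropped - the VMA is detached. */
		...
	}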

Cheers, Lorenzo


^ permalink raw reply	[flat|nested] 73+ messages in thread

end of thread, other threads:[~2026-01-26 10:04 UTC | newest]

Thread overview: 73+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2026-01-22 13:01 [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
2026-01-22 13:01 ` [PATCH RESEND v3 01/10] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG Lorenzo Stoakes
2026-01-22 16:26   ` Vlastimil Babka
2026-01-22 16:29     ` Lorenzo Stoakes
2026-01-23 13:52       ` Lorenzo Stoakes
2026-01-22 16:37   ` Suren Baghdasaryan
2026-01-23 13:26     ` Lorenzo Stoakes
2026-01-22 13:01 ` [PATCH RESEND v3 02/10] mm/vma: document possible vma->vm_refcnt values and reference comment Lorenzo Stoakes
2026-01-22 16:48   ` Vlastimil Babka
2026-01-22 17:28     ` Suren Baghdasaryan
2026-01-23 15:06       ` Lorenzo Stoakes
2026-01-23 13:45     ` Lorenzo Stoakes
2026-01-22 13:01 ` [PATCH RESEND v3 03/10] mm/vma: rename is_vma_write_only(), separate out shared refcount put Lorenzo Stoakes
2026-01-22 17:36   ` Vlastimil Babka
2026-01-22 19:31     ` Suren Baghdasaryan
2026-01-23  8:24       ` Vlastimil Babka
2026-01-23 14:52         ` Lorenzo Stoakes
2026-01-23 15:05           ` Vlastimil Babka
2026-01-23 15:07             ` Lorenzo Stoakes
2026-01-23 14:41       ` Lorenzo Stoakes
2026-01-26 10:04         ` Lorenzo Stoakes
2026-01-23 14:02     ` Lorenzo Stoakes
2026-01-22 13:01 ` [PATCH RESEND v3 04/10] mm/vma: add+use vma lockdep acquire/release defines Lorenzo Stoakes
2026-01-22 19:32   ` Suren Baghdasaryan
2026-01-22 19:41     ` Suren Baghdasaryan
2026-01-23  8:41       ` Vlastimil Babka
2026-01-23 15:08         ` Lorenzo Stoakes
2026-01-23 15:00     ` Lorenzo Stoakes
2026-01-23  8:48   ` Vlastimil Babka
2026-01-23 15:10     ` Lorenzo Stoakes
2026-01-22 13:01 ` [PATCH RESEND v3 05/10] mm/vma: de-duplicate __vma_enter_locked() error path Lorenzo Stoakes
2026-01-22 19:39   ` Suren Baghdasaryan
2026-01-23 15:11     ` Lorenzo Stoakes
2026-01-23  8:54   ` Vlastimil Babka
2026-01-23 15:10     ` Lorenzo Stoakes
2026-01-22 13:01 ` [PATCH v3 06/10] mm/vma: clean up __vma_enter/exit_locked() Lorenzo Stoakes
2026-01-22 13:08   ` Lorenzo Stoakes
2026-01-22 20:15   ` Suren Baghdasaryan
2026-01-22 20:55     ` Andrew Morton
2026-01-23 16:15       ` Lorenzo Stoakes
2026-01-23 16:33     ` Lorenzo Stoakes
2026-01-23  9:16   ` Vlastimil Babka
2026-01-23 16:17     ` Lorenzo Stoakes
2026-01-23 16:28       ` Lorenzo Stoakes
2026-01-22 13:01 ` [PATCH RESEND v3 07/10] mm/vma: introduce helper struct + thread through exclusive lock fns Lorenzo Stoakes
2026-01-22 21:41   ` Suren Baghdasaryan
2026-01-23 17:59     ` Lorenzo Stoakes
2026-01-23 19:34       ` Suren Baghdasaryan
2026-01-23 20:04         ` Lorenzo Stoakes
2026-01-23 22:07           ` Suren Baghdasaryan
2026-01-24  8:54             ` Lorenzo Stoakes
2026-01-26  6:09               ` Suren Baghdasaryan
2026-01-23 10:02   ` Vlastimil Babka
2026-01-23 18:18     ` Lorenzo Stoakes
2026-01-22 13:02 ` [PATCH RESEND v3 08/10] mm/vma: improve and document __is_vma_write_locked() Lorenzo Stoakes
2026-01-22 21:55   ` Suren Baghdasaryan
2026-01-23 16:21     ` Vlastimil Babka
2026-01-23 17:42       ` Suren Baghdasaryan
2026-01-23 18:44       ` Lorenzo Stoakes
2026-01-22 13:02 ` [PATCH RESEND v3 09/10] mm/vma: update vma_assert_locked() to use lockdep Lorenzo Stoakes
2026-01-22 22:02   ` Suren Baghdasaryan
2026-01-23 18:45     ` Lorenzo Stoakes
2026-01-23 16:55   ` Vlastimil Babka
2026-01-23 18:49     ` Lorenzo Stoakes
2026-01-22 13:02 ` [PATCH RESEND v3 10/10] mm/vma: add and use vma_assert_stabilised() Lorenzo Stoakes
2026-01-22 22:12   ` Suren Baghdasaryan
2026-01-23 18:54     ` Lorenzo Stoakes
2026-01-23 17:10   ` Vlastimil Babka
2026-01-23 18:51     ` Lorenzo Stoakes
2026-01-23 23:35   ` Hillf Danton
2026-01-22 15:48 ` [PATCH RESEND v3 00/10] mm: add and use vma_assert_stabilised() helper Andrew Morton
2026-01-22 15:57   ` Lorenzo Stoakes
2026-01-22 16:01     ` Lorenzo Stoakes

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox