* [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper
@ 2026-01-22 12:50 Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 1/8] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG Lorenzo Stoakes
` (8 more replies)
0 siblings, 9 replies; 12+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 12:50 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt
Sometimes we wish to assert that a VMA is stable, that is - the VMA cannot
be changed underneath us. This will be the case if EITHER the VMA lock or
the mmap lock is held.
We already open-code this in two places - anon_vma_name() in mm/madvise.c
and vma_flag_set_atomic() in include/linux/mm.h.
This series adds vma_assert_stabilised(), which abstracts this and can be
used at these call sites instead.
This implementation uses lockdep where possible - that is VMA read locks -
which correctly track read lock acquisition/release via:
vma_start_read() ->
rwsem_acquire_read()
vma_start_read_locked() ->
vma_start_read_locked_nested() ->
rwsem_acquire_read()
And:
vma_end_read() ->
vma_refcount_put() ->
rwsem_release()
We don't track VMA write locks using lockdep, however these are predicated
upon mmap write locks whose lockdep state we do track, and additionally
vma_assert_stabilised() performs the mmap lock check if the VMA read lock
is not held, so we get lockdep coverage in this case also.
We also add extensive comments to describe what we're doing.
There's some tricky stuff around mmap locking and stabilisation races that
we have to be careful of that I describe in the patch introducing
vma_assert_stabilised().
This change also lays the foundation for future series to add this assert
in further places where we wish to make it clear that we rely upon a
stabilised VMA - this was precisely the motivation for the change.
Additionally, refactor the VMA lock logic to be clearer, less confusing,
as self-documenting as possible, and more easily extended and debugged in
future.
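As a rough illustration, the helper has the following shape (a hedged
sketch only - the precise implementation in the series differs, notably in
how it handles the mmap ownership races described above; the
lock_is_held() check against the VMA's vmlock_dep_map is an illustrative
assumption):

	static inline void vma_assert_stabilised(struct vm_area_struct *vma)
	{
	#ifdef CONFIG_DEBUG_LOCK_ALLOC
		/* A held VMA read lock stabilises the VMA... */
		if (lock_is_held(&vma->vmlock_dep_map))
			return;
	#endif
		/* ...otherwise stability requires the mmap lock. */
		mmap_assert_locked(vma->vm_mm);
	}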
v3:
* Added 8 patches refactoring the VMA lock implementation :)
* Dropped the vma_is_*locked() predicates as too difficult to get entirely
right.
* Updated vma_assert_locked() to assert what we sensibly can, use lockdep
if possible and invoke vma_assert_write_locked() to share code as before.
* Took into account extensive feedback received from Vlastimil (thanks! :)
v2:
* Added lockdep as much as possible to the mix as per Peter and Sebastian.
* Added comments to make clear what we're doing in each case.
* I realise I made a mistake in saying the previous duplicative VMA stable
asserts were wrong - vma_assert_locked() is not a no-op if
!CONFIG_PER_VMA_LOCK; instead it degrades to asserting that the mmap lock
is held, so this is correct, though it means we'd have checked this twice,
only triggering an assert the second time.
* Accounted for is_vma_writer_only() case in vma_is_read_locked().
* Accounted for two hideous issues - we cannot check the VMA lock first,
because we may be holding a VMA write lock and be raced by VMA readers of
_other_ VMAs. If we check the mmap lock first and assert, we may hold a
VMA read lock and race other threads which hold the mmap read lock and
fail an assert. We resolve this with a precise mmap ownership check if
lockdep is used, allowing the check to be approximate if lockdep is absent.
* Added more comments and updated commit logs.
* Dropped Suren's Suggested-by due to the significant changes in this set
(it was given for vma_is_read_locked() as a concept).
https://lore.kernel.org/all/cover.1768855783.git.lorenzo.stoakes@oracle.com/
v1:
https://lore.kernel.org/all/cover.1768569863.git.lorenzo.stoakes@oracle.com/
Lorenzo Stoakes (8):
mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG
mm/vma: document possible vma->vm_refcnt values and reference comment
mm/vma: rename is_vma_write_only(), separate out shared refcount put
mm/vma: add+use vma lockdep acquire/release defines
mm/vma: de-duplicate __vma_enter_locked() error path
mm/vma: clean up __vma_enter/exit_locked()
mm/vma: introduce helper struct + thread through exclusive lock fns
mm/vma: improve and document __is_vma_write_locked()
include/linux/mm_types.h | 54 ++++++++++--
include/linux/mmap_lock.h | 129 ++++++++++++++++++++++-----
mm/mmap_lock.c | 180 ++++++++++++++++++++++++++------------
3 files changed, 280 insertions(+), 83 deletions(-)
--
2.52.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 1/8] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG
2026-01-22 12:50 [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
@ 2026-01-22 12:50 ` Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 2/8] mm/vma: document possible vma->vm_refcnt values and reference comment Lorenzo Stoakes
` (7 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 12:50 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt
The VMA_LOCK_OFFSET value encodes a flag which is set in vma->vm_refcnt in
order to indicate that a VMA is in the process of having VMA read-locks
excluded in __vma_enter_locked() (that is, first checking whether any VMA
read locks are held, and if so, waiting on them to be released).
This happens when a VMA write lock is being established, or when a VMA
being marked detached discovers that its reference count is elevated -
which occurs when readers temporarily elevate the reference count only to
discover a VMA write lock is in place.
The naming does not convey any of this, so rename VMA_LOCK_OFFSET to
VM_REFCNT_EXCLUDE_READERS_FLAG (with a sensible new prefix to differentiate
from the newly introduced VMA_*_BIT flags).
Also rename VMA_REF_LIMIT to VM_REFCNT_LIMIT for consistency.
Update comments to reflect this.
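For illustration, the renamed constants encode the following (a sketch
restating this patch's definitions with the numeric values made explicit):

	#define VM_REFCNT_EXCLUDE_READERS_BIT  (30)
	#define VM_REFCNT_EXCLUDE_READERS_FLAG (1U << VM_REFCNT_EXCLUDE_READERS_BIT) /* 0x40000000 */
	#define VM_REFCNT_LIMIT                (VM_REFCNT_EXCLUDE_READERS_FLAG - 1)  /* 0x3fffffff */

	/*
	 * Any vm_refcnt value with the flag set exceeds VM_REFCNT_LIMIT, so
	 * a reader's capped increment must fail while readers are excluded.
	 */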
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/mm_types.h | 17 +++++++++++++----
include/linux/mmap_lock.h | 14 ++++++++------
mm/mmap_lock.c | 17 ++++++++++-------
3 files changed, 31 insertions(+), 17 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 78950eb8926d..94de392ed3c5 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -752,8 +752,17 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
}
#endif
-#define VMA_LOCK_OFFSET 0x40000000
-#define VMA_REF_LIMIT (VMA_LOCK_OFFSET - 1)
+/*
+ * While __vma_enter_locked() is working to ensure there are no read-locks held
+ * on a VMA (either while acquiring a VMA write lock or marking a VMA detached)
+ * we set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indicate to
+ * vma_start_read() that the reference count should be left alone.
+ *
+ * Once the operation is complete, this value is subtracted from vma->vm_refcnt.
+ */
+#define VM_REFCNT_EXCLUDE_READERS_BIT (30)
+#define VM_REFCNT_EXCLUDE_READERS_FLAG (1U << VM_REFCNT_EXCLUDE_READERS_BIT)
+#define VM_REFCNT_LIMIT (VM_REFCNT_EXCLUDE_READERS_FLAG - 1)
struct vma_numab_state {
/*
@@ -935,10 +944,10 @@ struct vm_area_struct {
/*
* Can only be written (using WRITE_ONCE()) while holding both:
* - mmap_lock (in write mode)
- * - vm_refcnt bit at VMA_LOCK_OFFSET is set
+ * - vm_refcnt bit at VM_REFCNT_EXCLUDE_READERS_BIT is set
* Can be read reliably while holding one of:
* - mmap_lock (in read or write mode)
- * - vm_refcnt bit at VMA_LOCK_OFFSET is set or vm_refcnt > 1
+ * - vm_refcnt bit at VM_REFCNT_EXCLUDE_READERS_BIT is set or vm_refcnt > 1
* Can be read unreliably (using READ_ONCE()) for pessimistic bailout
* while holding nothing (except RCU to keep the VMA struct allocated).
*
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index b50416fbba20..5acbd4ba1b52 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -125,12 +125,14 @@ static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt)
static inline bool is_vma_writer_only(int refcnt)
{
/*
- * With a writer and no readers, refcnt is VMA_LOCK_OFFSET if the vma
- * is detached and (VMA_LOCK_OFFSET + 1) if it is attached. Waiting on
- * a detached vma happens only in vma_mark_detached() and is a rare
- * case, therefore most of the time there will be no unnecessary wakeup.
+ * With a writer and no readers, refcnt is VM_REFCNT_EXCLUDE_READERS_FLAG
+ * if the vma is detached and (VM_REFCNT_EXCLUDE_READERS_FLAG + 1) if it is
+ * attached. Waiting on a detached vma happens only in
+ * vma_mark_detached() and is a rare case, therefore most of the time
+ * there will be no unnecessary wakeup.
*/
- return (refcnt & VMA_LOCK_OFFSET) && refcnt <= VMA_LOCK_OFFSET + 1;
+ return (refcnt & VM_REFCNT_EXCLUDE_READERS_FLAG) &&
+ refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
}
static inline void vma_refcount_put(struct vm_area_struct *vma)
@@ -159,7 +161,7 @@ static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int
mmap_assert_locked(vma->vm_mm);
if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt,
- VMA_REF_LIMIT)))
+ VM_REFCNT_LIMIT)))
return false;
rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 7421b7ea8001..1d23b48552e9 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -54,7 +54,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
bool detaching, int state)
{
int err;
- unsigned int tgt_refcnt = VMA_LOCK_OFFSET;
+ unsigned int tgt_refcnt = VM_REFCNT_EXCLUDE_READERS_FLAG;
mmap_assert_write_locked(vma->vm_mm);
@@ -66,7 +66,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
* If vma is detached then only vma_mark_attached() can raise the
* vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
*/
- if (!refcount_add_not_zero(VMA_LOCK_OFFSET, &vma->vm_refcnt))
+ if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
return 0;
rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
@@ -74,7 +74,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
refcount_read(&vma->vm_refcnt) == tgt_refcnt,
state);
if (err) {
- if (refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt)) {
+ if (refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
/*
* The wait failed, but the last reader went away
* as well. Tell the caller the VMA is detached.
@@ -92,7 +92,8 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
{
- *detached = refcount_sub_and_test(VMA_LOCK_OFFSET, &vma->vm_refcnt);
+ *detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
+ &vma->vm_refcnt);
rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
}
@@ -180,13 +181,15 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
}
/*
- * If VMA_LOCK_OFFSET is set, __refcount_inc_not_zero_limited_acquire()
- * will fail because VMA_REF_LIMIT is less than VMA_LOCK_OFFSET.
+ * If VM_REFCNT_EXCLUDE_READERS_FLAG is set,
+ * __refcount_inc_not_zero_limited_acquire() will fail because
+ * VM_REFCNT_LIMIT is less than VM_REFCNT_EXCLUDE_READERS_FLAG.
+ *
* Acquire fence is required here to avoid reordering against later
* vm_lock_seq check and checks inside lock_vma_under_rcu().
*/
if (unlikely(!__refcount_inc_not_zero_limited_acquire(&vma->vm_refcnt, &oldcnt,
- VMA_REF_LIMIT))) {
+ VM_REFCNT_LIMIT))) {
/* return EAGAIN if vma got detached from under us */
vma = oldcnt ? NULL : ERR_PTR(-EAGAIN);
goto err;
--
2.52.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 2/8] mm/vma: document possible vma->vm_refcnt values and reference comment
2026-01-22 12:50 [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 1/8] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG Lorenzo Stoakes
@ 2026-01-22 12:50 ` Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 3/8] mm/vma: rename is_vma_write_only(), separate out shared refcount put Lorenzo Stoakes
` (6 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 12:50 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt
The possible vma->vm_refcnt values are confusing and vague; explain in
detail what these can be in a comment describing the vma->vm_refcnt field,
and reference this comment in the various places that read/write this field.
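As a hedged illustration derived from the comment added below (not new
semantics - the exact transition points are simplified), an attached VMA's
vm_refcnt might move through:

	/*
	 * vma_mark_attached():   0 -> 1
	 * vma_start_read():      1 -> 2
	 * vma_end_read():        2 -> 1
	 * __vma_enter_locked():  1 -> VM_REFCNT_EXCLUDE_READERS_FLAG + 1
	 * __vma_exit_locked():   VM_REFCNT_EXCLUDE_READERS_FLAG + 1 -> 1
	 * vma_mark_detached():   1 -> 0
	 */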
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/mm_types.h | 39 +++++++++++++++++++++++++++++++++++++--
include/linux/mmap_lock.h | 7 +++++++
mm/mmap_lock.c | 6 ++++++
3 files changed, 50 insertions(+), 2 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 94de392ed3c5..e5ee66f84d9a 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -758,7 +758,8 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
* set the VM_REFCNT_EXCLUDE_READERS_FLAG in vma->vm_refcnt to indiciate to
* vma_start_read() that the reference count should be left alone.
*
- * Once the operation is complete, this value is subtracted from vma->vm_refcnt.
+ * See the comment describing vm_refcnt in vm_area_struct for details as to
+ * which values the VMA reference count can be.
*/
#define VM_REFCNT_EXCLUDE_READERS_BIT (30)
#define VM_REFCNT_EXCLUDE_READERS_FLAG (1U << VM_REFCNT_EXCLUDE_READERS_BIT)
@@ -989,7 +990,41 @@ struct vm_area_struct {
struct vma_numab_state *numab_state; /* NUMA Balancing state */
#endif
#ifdef CONFIG_PER_VMA_LOCK
- /* Unstable RCU readers are allowed to read this. */
+ /*
+ * Used to keep track of the number of references taken by VMA read or
+ * write locks. May have the VM_REFCNT_EXCLUDE_READERS_FLAG set
+ * indicating that a thread has entered __vma_enter_locked() and is
+ * waiting on any outstanding read locks to exit.
+ *
+ * This value can be equal to:
+ *
+ * 0 - Detached.
+ *
+ * 1 - Unlocked or write-locked.
+ *
+ * >1, < VM_REFCNT_EXCLUDE_READERS_FLAG - Read-locked or (unlikely)
+ * write-locked with other threads having temporarily incremented the
+ * reference count prior to determining it is write-locked and
+ * decrementing it again.
+ *
+ * VM_REFCNT_EXCLUDE_READERS_FLAG - Detached, pending
+ * __vma_exit_locked() completion which will decrement the reference
+ * count to zero. IMPORTANT - at this stage no further readers can
+ * increment the reference count. It can only be reduced.
+ *
+ * VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - Either an attached VMA pending
+ * __vma_exit_locked() completion which will decrement the reference
+ * count to one, OR a detached VMA waiting on a single spurious reader
+ * to decrement reference count. IMPORTANT - as above, no further
+ * readers can increment the reference count.
+ *
+ * > VM_REFCNT_EXCLUDE_READERS_FLAG + 1 - VMA is waiting on readers,
+ * whether it is attempting to acquire a write lock or attempting to
+ * detach. IMPORTANT - as above, no further readers can increment the
+ * reference count.
+ *
+ * NOTE: Unstable RCU readers are allowed to read this.
+ */
refcount_t vm_refcnt ____cacheline_aligned_in_smp;
#ifdef CONFIG_DEBUG_LOCK_ALLOC
struct lockdep_map vmlock_dep_map;
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 5acbd4ba1b52..a764439d0276 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -130,6 +130,9 @@ static inline bool is_vma_writer_only(int refcnt)
* attached. Waiting on a detached vma happens only in
* vma_mark_detached() and is a rare case, therefore most of the time
* there will be no unnecessary wakeup.
+ *
+ * See the comment describing the vm_area_struct->vm_refcnt field for
+ * details of possible refcnt values.
*/
return (refcnt & VM_REFCNT_EXCLUDE_READERS_FLAG) &&
refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
@@ -249,6 +252,10 @@ static inline void vma_assert_locked(struct vm_area_struct *vma)
{
unsigned int mm_lock_seq;
+ /*
+ * See the comment describing the vm_area_struct->vm_refcnt field for
+ * details of possible refcnt values.
+ */
VM_BUG_ON_VMA(refcount_read(&vma->vm_refcnt) <= 1 &&
!__is_vma_write_locked(vma, &mm_lock_seq), vma);
}
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 1d23b48552e9..75dc098aea14 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -65,6 +65,9 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
/*
* If vma is detached then only vma_mark_attached() can raise the
* vm_refcnt. mmap_write_lock prevents racing with vma_mark_attached().
+ *
+ * See the comment describing the vm_area_struct->vm_refcnt field for
+ * details of possible refcnt values.
*/
if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
return 0;
@@ -137,6 +140,9 @@ void vma_mark_detached(struct vm_area_struct *vma)
* before they check vm_lock_seq, realize the vma is locked and drop
* back the vm_refcnt. That is a narrow window for observing a raised
* vm_refcnt.
+ *
+ * See the comment describing the vm_area_struct->vm_refcnt field for
+ * details of possible refcnt values.
*/
if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
/* Wait until vma is detached with no readers. */
--
2.52.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 3/8] mm/vma: rename is_vma_write_only(), separate out shared refcount put
2026-01-22 12:50 [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 1/8] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 2/8] mm/vma: document possible vma->vm_refcnt values and reference comment Lorenzo Stoakes
@ 2026-01-22 12:50 ` Lorenzo Stoakes
2026-01-22 18:07 ` Suren Baghdasaryan
2026-01-22 12:50 ` [PATCH v3 4/8] mm/vma: add+use vma lockdep acquire/release defines Lorenzo Stoakes
` (5 subsequent siblings)
8 siblings, 1 reply; 12+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 12:50 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt
The is_vma_writer_only() function is misnamed - this isn't determining if
there is only a write lock, as it checks for the presence of the
VM_REFCNT_EXCLUDE_READERS_FLAG.
Really, it is checking to see whether readers are excluded, with a
possibility of a false positive in the case of a detachment (there we
expect the vma->vm_refcnt to eventually be set to
VM_REFCNT_EXCLUDE_READERS_FLAG, whereas for an attached VMA we expect it to
eventually be set to VM_REFCNT_EXCLUDE_READERS_FLAG + 1).
Rename the function accordingly.
Relatedly, we use a finicky __refcount_dec_and_test() primitive directly
in vma_refcount_put(), using the old value to determine what the reference
count ought to be after the operation is complete (ignoring racing
reference count adjustments).
Wrap this into a __vma_refcount_put() function, which we can then utilise
in vma_mark_detached() and thus keep the refcount primitive usage
abstracted.
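A hedged sketch of the new helper's contract, mirroring the
vma_refcount_put() hunk below (the local variable names are illustrative):

	int refcnt;
	bool detached;

	/* Drop one reference; report the post-decrement value via refcnt. */
	detached = __vma_refcount_put(vma, &refcnt);
	/* detached is true if and only if refcnt is now zero. */
	if (!detached && are_readers_excluded(refcnt))
		rcuwait_wake_up(&mm->vma_writer_wait);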
Also adjust comments, removing duplicative comments covered elsewhere and
adding more to aid understanding.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/mmap_lock.h | 62 +++++++++++++++++++++++++++++++--------
mm/mmap_lock.c | 18 +++++-------
2 files changed, 57 insertions(+), 23 deletions(-)
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index a764439d0276..0b3614aadbb4 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -122,15 +122,27 @@ static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt)
vma->vm_lock_seq = UINT_MAX;
}
-static inline bool is_vma_writer_only(int refcnt)
+/**
+ * are_readers_excluded() - Determine whether @refcnt describes a VMA which has
+ * excluded all VMA read locks.
+ * @refcnt: The VMA reference count obtained from vm_area_struct->vm_refcnt.
+ *
+ * We may be raced by other readers temporarily incrementing the reference
+ * count; though the race window is very small, this might cause spurious
+ * wakeups.
+ *
+ * In the case of a detached VMA, we may incorrectly indicate that readers are
+ * excluded when one remains, because in that scenario we target a refcount of
+ * VM_REFCNT_EXCLUDE_READERS_FLAG, rather than the attached target of
+ * VM_REFCNT_EXCLUDE_READERS_FLAG + 1.
+ *
+ * However, the race window for that is very small so it is unlikely.
+ *
+ * Returns: true if readers are excluded, false otherwise.
+ */
+static inline bool are_readers_excluded(int refcnt)
{
/*
- * With a writer and no readers, refcnt is VM_REFCNT_EXCLUDE_READERS_FLAG
- * if the vma is detached and (VM_REFCNT_EXCLUDE_READERS_FLAG + 1) if it is
- * attached. Waiting on a detached vma happens only in
- * vma_mark_detached() and is a rare case, therefore most of the time
- * there will be no unnecessary wakeup.
- *
* See the comment describing the vm_area_struct->vm_refcnt field for
* details of possible refcnt values.
*/
@@ -138,18 +150,42 @@ static inline bool is_vma_writer_only(int refcnt)
refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
}
+static inline bool __vma_refcount_put(struct vm_area_struct *vma, int *refcnt)
+{
+ int oldcnt;
+ bool detached;
+
+ detached = __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
+ if (refcnt)
+ *refcnt = oldcnt - 1;
+ return detached;
+}
+
+/**
+ * vma_refcount_put() - Drop reference count in VMA vm_refcnt field due to a
+ * read-lock being dropped.
+ * @vma: The VMA whose reference count we wish to decrement.
+ *
+ * If we were the last reader, wake up threads waiting to obtain an exclusive
+ * lock.
+ */
static inline void vma_refcount_put(struct vm_area_struct *vma)
{
- /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
+ /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt. */
struct mm_struct *mm = vma->vm_mm;
- int oldcnt;
+ int refcnt;
+ bool detached;
rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
- if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) {
- if (is_vma_writer_only(oldcnt - 1))
- rcuwait_wake_up(&mm->vma_writer_wait);
- }
+ detached = __vma_refcount_put(vma, &refcnt);
+ /*
+ * __vma_enter_locked() may be sleeping waiting for readers to drop
+ * their reference count, so wake it up if we were the last reader
+ * blocking it from being acquired.
+ */
+ if (!detached && are_readers_excluded(refcnt))
+ rcuwait_wake_up(&mm->vma_writer_wait);
}
/*
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 75dc098aea14..ebacb57e5f16 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -130,25 +130,23 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
void vma_mark_detached(struct vm_area_struct *vma)
{
+ bool detached;
+
vma_assert_write_locked(vma);
vma_assert_attached(vma);
/*
- * We are the only writer, so no need to use vma_refcount_put().
- * The condition below is unlikely because the vma has been already
- * write-locked and readers can increment vm_refcnt only temporarily
- * before they check vm_lock_seq, realize the vma is locked and drop
- * back the vm_refcnt. That is a narrow window for observing a raised
- * vm_refcnt.
- *
* See the comment describing the vm_area_struct->vm_refcnt field for
* details of possible refcnt values.
*/
- if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
+ detached = __vma_refcount_put(vma, NULL);
+ if (unlikely(!detached)) {
/* Wait until vma is detached with no readers. */
if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
- bool detached;
-
+ /*
+ * Once this is complete, no readers can increment the
+ * reference count, and the VMA is marked detached.
+ */
__vma_exit_locked(vma, &detached);
WARN_ON_ONCE(!detached);
}
--
2.52.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 4/8] mm/vma: add+use vma lockdep acquire/release defines
2026-01-22 12:50 [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
` (2 preceding siblings ...)
2026-01-22 12:50 ` [PATCH v3 3/8] mm/vma: rename is_vma_write_only(), separate out shared refcount put Lorenzo Stoakes
@ 2026-01-22 12:50 ` Lorenzo Stoakes
2026-01-22 19:25 ` Suren Baghdasaryan
2026-01-22 12:50 ` [PATCH v3 5/8] mm/vma: de-duplicate __vma_enter_locked() error path Lorenzo Stoakes
` (4 subsequent siblings)
8 siblings, 1 reply; 12+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 12:50 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt
The code is littered with inscrutable and duplicative lockdep incantations;
replace these with defines which explain what is going on, and add
commentary to explain what we're doing.
If lockdep is disabled these become no-ops. We must use defines so _RET_IP_
remains meaningful.
These are self-documenting and aid readability of the code.
Additionally, instead of using the confusing rwsem_*() form for something
that is emphatically not an rwsem, we explicitly use the
lock_acquire_shared/exclusive(), lock_release() and lock_acquired() lockdep
invocations, since we are doing something rather custom here and these make
more sense to use.
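As a hedged illustration of the _RET_IP_ point (the define shown is the one
added by this patch):

	#define __vma_lockdep_acquire_read(vma) \
		lock_acquire_shared(&vma->vmlock_dep_map, 0, 1, NULL, _RET_IP_)

	/*
	 * Because this is a define, _RET_IP_ expands at the invoking
	 * function's call site, so lockdep attributes the acquisition to
	 * e.g. vma_start_read() rather than to a common helper.
	 */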
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/mmap_lock.h | 35 ++++++++++++++++++++++++++++++++---
mm/mmap_lock.c | 10 +++++-----
2 files changed, 37 insertions(+), 8 deletions(-)
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 0b3614aadbb4..da63b1be6ec0 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -78,6 +78,36 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
#ifdef CONFIG_PER_VMA_LOCK
+/*
+ * VMA locks do not behave like most ordinary locks found in the kernel, so we
+ * cannot quite have full lockdep tracking in the way we would ideally prefer.
+ *
+ * Read locks act as shared locks which exclude an exclusive lock being
+ * taken. We therefore mark these accordingly on read lock acquire/release.
+ *
+ * Write locks are acquired exclusively per-VMA, but released in a shared
+ * fashion; that is, upon vma_end_write_all() we update the mmap's seqcount such
+ * that the write lock is released.
+ *
+ * We therefore cannot track write locks per-VMA, nor do we try. Mitigating this
+ * is the fact that, of course, we do lockdep-track the mmap lock rwsem.
+ *
+ * We do, however, want to indicate that during either acquisition of a VMA
+ * write lock or detachment of a VMA that we require the lock held be exclusive,
+ * so we utilise lockdep to do so.
+ */
+#define __vma_lockdep_acquire_read(vma) \
+ lock_acquire_shared(&vma->vmlock_dep_map, 0, 1, NULL, _RET_IP_)
+#define __vma_lockdep_release_read(vma) \
+ lock_release(&vma->vmlock_dep_map, _RET_IP_)
+#define __vma_lockdep_acquire_exclusive(vma) \
+ lock_acquire_exclusive(&vma->vmlock_dep_map, 0, 0, NULL, _RET_IP_)
+#define __vma_lockdep_release_exclusive(vma) \
+ lock_release(&vma->vmlock_dep_map, _RET_IP_)
+/* Only meaningful if CONFIG_LOCK_STAT is defined. */
+#define __vma_lockdep_stat_mark_acquired(vma) \
+ lock_acquired(&vma->vmlock_dep_map, _RET_IP_)
+
static inline void mm_lock_seqcount_init(struct mm_struct *mm)
{
seqcount_init(&mm->mm_lock_seq);
@@ -176,8 +206,7 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
int refcnt;
bool detached;
- rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
-
+ __vma_lockdep_release_read(vma);
detached = __vma_refcount_put(vma, &refcnt);
/*
* __vma_enter_locked() may be sleeping waiting for readers to drop
@@ -203,7 +232,7 @@ static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int
VM_REFCNT_LIMIT)))
return false;
- rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
+ __vma_lockdep_acquire_read(vma);
return true;
}
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index ebacb57e5f16..9563bfb051f4 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -72,7 +72,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
return 0;
- rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
+ __vma_lockdep_acquire_exclusive(vma);
err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
refcount_read(&vma->vm_refcnt) == tgt_refcnt,
state);
@@ -85,10 +85,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
WARN_ON_ONCE(!detaching);
err = 0;
}
- rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
+ __vma_lockdep_release_exclusive(vma);
return err;
}
- lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
+ __vma_lockdep_stat_mark_acquired(vma);
return 1;
}
@@ -97,7 +97,7 @@ static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
{
*detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
&vma->vm_refcnt);
- rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
+ __vma_lockdep_release_exclusive(vma);
}
int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
@@ -199,7 +199,7 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
goto err;
}
- rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
+ __vma_lockdep_acquire_read(vma);
if (unlikely(vma->vm_mm != mm))
goto err_unstable;
--
2.52.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 5/8] mm/vma: de-duplicate __vma_enter_locked() error path
2026-01-22 12:50 [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
` (3 preceding siblings ...)
2026-01-22 12:50 ` [PATCH v3 4/8] mm/vma: add+use vma lockdep acquire/release defines Lorenzo Stoakes
@ 2026-01-22 12:50 ` Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 6/8] mm/vma: clean up __vma_enter/exit_locked() Lorenzo Stoakes
` (3 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 12:50 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt
We're doing precisely the same thing that __vma_exit_locked() does, so
de-duplicate this code and keep the refcount primitive in one place.
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mmap_lock.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 9563bfb051f4..7a0361cff6db 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -45,6 +45,14 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
#ifdef CONFIG_MMU
#ifdef CONFIG_PER_VMA_LOCK
+
+static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
+{
+ *detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
+ &vma->vm_refcnt);
+ __vma_lockdep_release_exclusive(vma);
+}
+
/*
* __vma_enter_locked() returns 0 immediately if the vma is not
* attached, otherwise it waits for any current readers to finish and
@@ -77,7 +85,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
refcount_read(&vma->vm_refcnt) == tgt_refcnt,
state);
if (err) {
- if (refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
+ bool detached;
+
+ __vma_exit_locked(vma, &detached);
+ if (detached) {
/*
* The wait failed, but the last reader went away
* as well. Tell the caller the VMA is detached.
@@ -85,7 +96,6 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
WARN_ON_ONCE(!detaching);
err = 0;
}
- __vma_lockdep_release_exclusive(vma);
return err;
}
__vma_lockdep_stat_mark_acquired(vma);
@@ -93,13 +103,6 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
return 1;
}
-static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
-{
- *detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
- &vma->vm_refcnt);
- __vma_lockdep_release_exclusive(vma);
-}
-
int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
int state)
{
--
2.52.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 6/8] mm/vma: clean up __vma_enter/exit_locked()
2026-01-22 12:50 [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
` (4 preceding siblings ...)
2026-01-22 12:50 ` [PATCH v3 5/8] mm/vma: de-duplicate __vma_enter_locked() error path Lorenzo Stoakes
@ 2026-01-22 12:50 ` Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 7/8] mm/vma: introduce helper struct + thread through exclusive lock fns Lorenzo Stoakes
` (2 subsequent siblings)
8 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 12:50 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt
These functions are very confusing indeed. 'Entering' a lock could be
interpreted as acquiring it, but that is not what these functions do.
Equally they don't indicate at all what kind of lock we are 'entering' or
'exiting'. Finally they are misleading as we invoke these functions when we
already hold a write lock to detach a VMA.
These functions are explicitly simply 'entering' and 'exiting' a state in
which we hold the EXCLUSIVE lock in order that we can either mark the VMA
as being write-locked, or mark the VMA detached.
Rename the functions accordingly, and also update
__vma_exit_exclusive_locked() to return detached state with a __must_check
directive, as it is simply clumsy to pass an output pointer here to
detached state and inconsistent vs. __vma_enter_exclusive_locked().
Finally, remove the unnecessary 'inline' directives.
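A hedged sketch of the reworked calling convention, mirroring the
__vma_start_write() hunk below:

	int locked = __vma_enter_exclusive_locked(vma, /* detaching = */false, state);

	if (locked < 0)
		return locked;	/* -EINTR - fatal signal while waiting. */
	if (locked) {
		/* Readers excluded - perform the exclusive work here. */

		/* The __must_check result reports whether we detached. */
		WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
	}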
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/mmap_lock.h | 4 +--
mm/mmap_lock.c | 60 +++++++++++++++++++++++++--------------
2 files changed, 41 insertions(+), 23 deletions(-)
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index da63b1be6ec0..873bc5f3c97c 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -209,8 +209,8 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
__vma_lockdep_release_read(vma);
detached = __vma_refcount_put(vma, &refcnt);
/*
- * __vma_enter_locked() may be sleeping waiting for readers to drop
- * their reference count, so wake it up if we were the last reader
+ * __vma_enter_exclusive_locked() may be sleeping waiting for readers to
+ * drop their reference count, so wake it up if we were the last reader
* blocking it from being acquired.
*/
if (!detached && are_readers_excluded(refcnt))
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 7a0361cff6db..f73221174a8b 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -46,19 +46,43 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
#ifdef CONFIG_MMU
#ifdef CONFIG_PER_VMA_LOCK
-static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
+/*
+ * Now that all readers have been evicted, mark the VMA as being out of the
+ * 'exclude readers' state.
+ *
+ * Returns true if the VMA is now detached, otherwise false.
+ */
+static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
{
- *detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
- &vma->vm_refcnt);
+ bool detached;
+
+ detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
+ &vma->vm_refcnt);
__vma_lockdep_release_exclusive(vma);
+ return detached;
}
/*
- * __vma_enter_locked() returns 0 immediately if the vma is not
- * attached, otherwise it waits for any current readers to finish and
- * returns 1. Returns -EINTR if a signal is received while waiting.
+ * Mark the VMA as being in a state of excluding readers, check to see if any
+ * VMA read locks are indeed held, and if so wait for them to be released.
+ *
+ * Note that this function pairs with vma_refcount_put() which will wake up this
+ * thread when it detects that the last reader has released its lock.
+ *
+ * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
+ * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
+ * is permitted to kill it.
+ *
+ * The function will return 0 immediately if the VMA is detached, and 1 once the
+ * VMA has evicted all readers, leaving the VMA exclusively locked.
+ *
+ * If the function returns 1, the caller is required to invoke
+ * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
+ *
+ * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
+ * may also return -EINTR to indicate a fatal signal was received while waiting.
*/
-static inline int __vma_enter_locked(struct vm_area_struct *vma,
+static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
bool detaching, int state)
{
int err;
@@ -85,13 +109,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
refcount_read(&vma->vm_refcnt) == tgt_refcnt,
state);
if (err) {
- bool detached;
-
- __vma_exit_locked(vma, &detached);
- if (detached) {
+ if (__vma_exit_exclusive_locked(vma)) {
/*
* The wait failed, but the last reader went away
- * as well. Tell the caller the VMA is detached.
+ * as well. Tell the caller the VMA is detached.
*/
WARN_ON_ONCE(!detaching);
err = 0;
@@ -108,7 +129,7 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
{
int locked;
- locked = __vma_enter_locked(vma, false, state);
+ locked = __vma_enter_exclusive_locked(vma, false, state);
if (locked < 0)
return locked;
@@ -120,12 +141,9 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
*/
WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
- if (locked) {
- bool detached;
-
- __vma_exit_locked(vma, &detached);
- WARN_ON_ONCE(detached); /* vma should remain attached */
- }
+ /* vma should remain attached. */
+ if (locked)
+ WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
return 0;
}
@@ -145,12 +163,12 @@ void vma_mark_detached(struct vm_area_struct *vma)
detached = __vma_refcount_put(vma, NULL);
if (unlikely(!detached)) {
/* Wait until vma is detached with no readers. */
- if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
+ if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
/*
* Once this is complete, no readers can increment the
* reference count, and the VMA is marked detached.
*/
- __vma_exit_locked(vma, &detached);
+ detached = __vma_exit_exclusive_locked(vma);
WARN_ON_ONCE(!detached);
}
}
--
2.52.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 7/8] mm/vma: introduce helper struct + thread through exclusive lock fns
2026-01-22 12:50 [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
` (5 preceding siblings ...)
2026-01-22 12:50 ` [PATCH v3 6/8] mm/vma: clean up __vma_enter/exit_locked() Lorenzo Stoakes
@ 2026-01-22 12:50 ` Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 8/8] mm/vma: improve and document __is_vma_write_locked() Lorenzo Stoakes
2026-01-22 12:55 ` [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
8 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 12:50 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt
It is confusing to have __vma_enter_exclusive_locked() return 0, 1 or an
error (but only when waiting for readers in TASK_KILLABLE state), and
having the return value stored in a stack variable called 'locked' adds
further confusion.
More generally, we are doing a lot of rather finicky things during the
acquisition of a state in which readers are excluded and the move out of
this state, including tracking whether we are detached or not and whether
an error occurred.
We are implementing logic in __vma_enter_exclusive_locked() that
effectively acts as if 'if one caller calls us do X, if another then do Y',
which is very confusing from a control flow perspective.
Introducing the shared helper object state helps us avoid this, as we can
now handle the 'an error arose but we're detached' condition correctly in
both callers - a warning if not detaching, and treating the situation as if
no error arose in the case of a VMA detaching.
This also acts to help document what's going on and allows us to add some
more logical debug asserts.
Also update vma_mark_detached() to add a guard clause for the likely
'already detached' state (given we hold the mmap write lock), and add a
comment about ephemeral VMA read lock reference count increments to clarify
why we are entering/exiting an exclusive locked state here.
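A hedged sketch of how the helper struct threads through, mirroring the
vma_mark_detached() hunk below:

	struct vma_exclude_readers_state ves = {
		.vma = vma,
		.state = TASK_UNINTERRUPTIBLE,
		.detaching = true,
	};
	int err;

	err = __vma_enter_exclusive_locked(&ves);
	if (!err && !ves.detached)
		__vma_exit_exclusive_locked(&ves);
	/* ves.detached now reports the final state to the caller. */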
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
mm/mmap_lock.c | 144 +++++++++++++++++++++++++++++++------------------
1 file changed, 91 insertions(+), 53 deletions(-)
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index f73221174a8b..75166a43ffa4 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -46,20 +46,40 @@ EXPORT_SYMBOL(__mmap_lock_do_trace_released);
#ifdef CONFIG_MMU
#ifdef CONFIG_PER_VMA_LOCK
+/* State shared across __vma_[enter, exit]_exclusive_locked(). */
+struct vma_exclude_readers_state {
+ /* Input parameters. */
+ struct vm_area_struct *vma;
+ int state; /* TASK_KILLABLE or TASK_UNINTERRUPTIBLE. */
+ bool detaching;
+
+ bool detached;
+ bool exclusive; /* Are we exclusively locked? */
+};
+
/*
* Now that all readers have been evicted, mark the VMA as being out of the
* 'exclude readers' state.
*
- * Returns true if the VMA is now detached, otherwise false.
+ * On return, ves->detached indicates whether the VMA is now detached.
*/
-static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
+static void __vma_exit_exclusive_locked(struct vma_exclude_readers_state *ves)
{
- bool detached;
+ struct vm_area_struct *vma = ves->vma;
+
+ VM_WARN_ON_ONCE(ves->detached);
+ VM_WARN_ON_ONCE(!ves->exclusive);
- detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
- &vma->vm_refcnt);
+ ves->detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
+ &vma->vm_refcnt);
__vma_lockdep_release_exclusive(vma);
- return detached;
+}
+
+static unsigned int get_target_refcnt(struct vma_exclude_readers_state *ves)
+{
+ const unsigned int tgt = ves->detaching ? 0 : 1;
+
+ return tgt | VM_REFCNT_EXCLUDE_READERS_FLAG;
}
/*
@@ -69,30 +89,31 @@ static bool __must_check __vma_exit_exclusive_locked(struct vm_area_struct *vma)
* Note that this function pairs with vma_refcount_put() which will wake up this
* thread when it detects that the last reader has released its lock.
*
- * The state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases where we
- * wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal signal
- * is permitted to kill it.
+ * The ves->state parameter ought to be set to TASK_UNINTERRUPTIBLE in cases
+ * where we wish the thread to sleep uninterruptibly or TASK_KILLABLE if a fatal
+ * signal is permitted to kill it.
*
- * The function will return 0 immediately if the VMA is detached, and 1 once the
- * VMA has evicted all readers, leaving the VMA exclusively locked.
+ * The function sets the ves->exclusive field to true if an exclusive lock was
+ * acquired, or false if the VMA was detached or an error arose on wait.
*
- * If the function returns 1, the caller is required to invoke
- * __vma_exit_exclusive_locked() once the exclusive state is no longer required.
+ * If the function indicates an exclusive lock was acquired via ves->exclusive
+ * (or equivalently, returning 0 with !ves->detached), the caller is required to
+ * invoke __vma_exit_exclusive_locked() once the exclusive state is no longer
+ * required.
*
- * If state is set to something other than TASK_UNINTERRUPTIBLE, the function
- * may also return -EINTR to indicate a fatal signal was received while waiting.
+ * If ves->state is set to something other than TASK_UNINTERRUPTIBLE, the
+ * function may also return -EINTR to indicate a fatal signal was received while
+ * waiting.
*/
-static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
- bool detaching, int state)
+static int __vma_enter_exclusive_locked(struct vma_exclude_readers_state *ves)
{
- int err;
- unsigned int tgt_refcnt = VM_REFCNT_EXCLUDE_READERS_FLAG;
+ struct vm_area_struct *vma = ves->vma;
+ unsigned int tgt_refcnt = get_target_refcnt(ves);
+ int err = 0;
mmap_assert_write_locked(vma->vm_mm);
-
- /* Additional refcnt if the vma is attached. */
- if (!detaching)
- tgt_refcnt++;
+ VM_WARN_ON_ONCE(ves->detached);
+ VM_WARN_ON_ONCE(ves->exclusive);
/*
* If vma is detached then only vma_mark_attached() can raise the
@@ -101,37 +122,39 @@ static int __vma_enter_exclusive_locked(struct vm_area_struct *vma,
* See the comment describing the vm_area_struct->vm_refcnt field for
* details of possible refcnt values.
*/
- if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
+ if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt)) {
+ ves->detached = true;
return 0;
+ }
__vma_lockdep_acquire_exclusive(vma);
err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
refcount_read(&vma->vm_refcnt) == tgt_refcnt,
- state);
+ ves->state);
if (err) {
- if (__vma_exit_exclusive_locked(vma)) {
- /*
- * The wait failed, but the last reader went away
- * as well. Tell the caller the VMA is detached.
- */
- WARN_ON_ONCE(!detaching);
- err = 0;
- }
+ __vma_exit_exclusive_locked(ves);
return err;
}
- __vma_lockdep_stat_mark_acquired(vma);
- return 1;
+ __vma_lockdep_stat_mark_acquired(vma);
+ ves->exclusive = true;
+ return 0;
}
int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
int state)
{
- int locked;
+ int err;
+ struct vma_exclude_readers_state ves = {
+ .vma = vma,
+ .state = state,
+ };
- locked = __vma_enter_exclusive_locked(vma, false, state);
- if (locked < 0)
- return locked;
+ err = __vma_enter_exclusive_locked(&ves);
+ if (err) {
+ WARN_ON_ONCE(ves.detached);
+ return err;
+ }
/*
* We should use WRITE_ONCE() here because we can have concurrent reads
@@ -141,9 +164,11 @@ int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
*/
WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
- /* vma should remain attached. */
- if (locked)
- WARN_ON_ONCE(__vma_exit_exclusive_locked(vma));
+ if (!ves.detached) {
+ __vma_exit_exclusive_locked(&ves);
+ /* VMA should remain attached. */
+ WARN_ON_ONCE(ves.detached);
+ }
return 0;
}
@@ -151,7 +176,12 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
void vma_mark_detached(struct vm_area_struct *vma)
{
- bool detached;
+ struct vma_exclude_readers_state ves = {
+ .vma = vma,
+ .state = TASK_UNINTERRUPTIBLE,
+ .detaching = true,
+ };
+ int err;
vma_assert_write_locked(vma);
vma_assert_attached(vma);
@@ -160,18 +190,26 @@ void vma_mark_detached(struct vm_area_struct *vma)
* See the comment describing the vm_area_struct->vm_refcnt field for
* details of possible refcnt values.
*/
- detached = __vma_refcount_put(vma, NULL);
- if (unlikely(!detached)) {
- /* Wait until vma is detached with no readers. */
- if (__vma_enter_exclusive_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
- /*
- * Once this is complete, no readers can increment the
- * reference count, and the VMA is marked detached.
- */
- detached = __vma_exit_exclusive_locked(vma);
- WARN_ON_ONCE(!detached);
- }
+ if (likely(__vma_refcount_put(vma, NULL)))
+ return;
+
+ /*
+ * Wait until the VMA is detached with no readers. Since we hold the VMA
+ * write lock, the only read locks that might be present are those from
+ * threads trying to acquire the read lock and incrementing the
+ * reference count before realising the write lock is held and
+ * decrementing it.
+ */
+ err = __vma_enter_exclusive_locked(&ves);
+ if (!err && !ves.detached) {
+ /*
+ * Once this is complete, no readers can increment the
+ * reference count, and the VMA is marked detached.
+ */
+ __vma_exit_exclusive_locked(&ves);
}
+ /* If an error arose but we were detached anyway, we don't care. */
+ WARN_ON_ONCE(!ves.detached);
}
/*
--
2.52.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v3 8/8] mm/vma: improve and document __is_vma_write_locked()
2026-01-22 12:50 [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
` (6 preceding siblings ...)
2026-01-22 12:50 ` [PATCH v3 7/8] mm/vma: introduce helper struct + thread through exclusive lock fns Lorenzo Stoakes
@ 2026-01-22 12:50 ` Lorenzo Stoakes
2026-01-22 12:55 ` [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
8 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 12:50 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt
The function is a little confusing; clean it up, then add a descriptive
comment.
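A hedged sketch of a typical caller under the updated semantics (the
surrounding logic is illustrative; note mm_lock_seq is written only on the
false path):

	unsigned int mm_lock_seq;

	if (__is_vma_write_locked(vma, &mm_lock_seq))
		return 0;	/* Already write-locked, mm_lock_seq untouched. */
	/* Not write-locked - mm_lock_seq holds the current mm sequence. */
	return __vma_start_write(vma, mm_lock_seq, state);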
No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/mmap_lock.h | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index 873bc5f3c97c..b00d34b5ad10 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -252,17 +252,30 @@ static inline void vma_end_read(struct vm_area_struct *vma)
vma_refcount_put(vma);
}
-/* WARNING! Can only be used if mmap_lock is expected to be write-locked */
-static inline bool __is_vma_write_locked(struct vm_area_struct *vma, unsigned int *mm_lock_seq)
+/*
+ * Determine whether a VMA is write-locked. Must be invoked ONLY if the mmap
+ * write lock is held.
+ *
+ * Returns true if write-locked, otherwise false.
+ *
+ * Note that mm_lock_seq is updated only if the VMA is NOT write-locked.
+ */
+static inline bool __is_vma_write_locked(struct vm_area_struct *vma,
+ unsigned int *mm_lock_seq)
{
- mmap_assert_write_locked(vma->vm_mm);
+ struct mm_struct *mm = vma->vm_mm;
+ const unsigned int seq = mm->mm_lock_seq.sequence;
+
+ mmap_assert_write_locked(mm);
/*
* current task is holding mmap_write_lock, both vma->vm_lock_seq and
* mm->mm_lock_seq can't be concurrently modified.
*/
- *mm_lock_seq = vma->vm_mm->mm_lock_seq.sequence;
- return (vma->vm_lock_seq == *mm_lock_seq);
+ if (vma->vm_lock_seq == seq)
+ return true;
+ *mm_lock_seq = seq;
+ return false;
}
int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
--
2.52.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper
2026-01-22 12:50 [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
` (7 preceding siblings ...)
2026-01-22 12:50 ` [PATCH v3 8/8] mm/vma: improve and document __is_vma_write_locked() Lorenzo Stoakes
@ 2026-01-22 12:55 ` Lorenzo Stoakes
8 siblings, 0 replies; 12+ messages in thread
From: Lorenzo Stoakes @ 2026-01-22 12:55 UTC (permalink / raw)
To: Andrew Morton
Cc: David Hildenbrand, Liam R . Howlett, Vlastimil Babka,
Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Shakeel Butt,
Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt
Oops... this got screwed up, sorry for noise.
Will resend, corrected...
Cheers, Lorenzo
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v3 3/8] mm/vma: rename is_vma_write_only(), separate out shared refcount put
2026-01-22 12:50 ` [PATCH v3 3/8] mm/vma: rename is_vma_write_only(), separate out shared refcount put Lorenzo Stoakes
@ 2026-01-22 18:07 ` Suren Baghdasaryan
0 siblings, 0 replies; 12+ messages in thread
From: Suren Baghdasaryan @ 2026-01-22 18:07 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt
On Thu, Jan 22, 2026 at 4:50 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> The is_vma_writer_only() function is misnamed - this isn't determining if
> there is only a write lock, as it checks for the presence of the
> VM_REFCNT_EXCLUDE_READERS_FLAG.
>
> Really, it is checking to see whether readers are excluded, with a
> possibility of a false positive in the case of a detachment (there we
> expect the vma->vm_refcnt to eventually be set to
> VM_REFCNT_EXCLUDE_READERS_FLAG, whereas for an attached VMA we expect it to
> eventually be set to VM_REFCNT_EXCLUDE_READERS_FLAG + 1).
>
> Rename the function accordingly.
>
> Relatedly, we use a finicky __refcount_dec_and_test() primitive directly
> in vma_refcount_put(), using the old value to determine what the reference
> count ought to be after the operation is complete (ignoring racing
> reference count adjustments).
IIUC, __refcount_dec_and_test() can decrement the refcount by only 1
and the old value returned (oldcnt) will be the exact value that it
was before this decrement. Therefore oldcnt - 1 must reflect the
refcount value after the decrement. It's possible the refcount gets
manipulated after this operation but that does not make this operation
wrong. I don't quite understand why you think that's racy or finicky.
>
> Wrap this into a __vma_refcount_put() function, which we can then utilise
> in vma_mark_detached() and thus keep the refcount primitive usage
> abstracted.
>
> Also adjust comments, removing duplicative comments covered elsewhere and
> adding more to aid understanding.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
> include/linux/mmap_lock.h | 62 +++++++++++++++++++++++++++++++--------
> mm/mmap_lock.c | 18 +++++-------
> 2 files changed, 57 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index a764439d0276..0b3614aadbb4 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -122,15 +122,27 @@ static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt)
> vma->vm_lock_seq = UINT_MAX;
> }
>
> -static inline bool is_vma_writer_only(int refcnt)
> +/**
> + * are_readers_excluded() - Determine whether @refcnt describes a VMA which has
> + * excluded all VMA read locks.
> + * @refcnt: The VMA reference count obtained from vm_area_struct->vm_refcnt.
> + *
> + * We may be raced by other readers temporarily incrementing the reference
> + * count; though the race window is very small, this might cause spurious
> + * wakeups.
> + *
> + * In the case of a detached VMA, we may incorrectly indicate that readers are
> + * excluded when one remains, because in that scenario we target a refcount of
> + * VM_REFCNT_EXCLUDE_READERS_FLAG, rather than the attached target of
> + * VM_REFCNT_EXCLUDE_READERS_FLAG + 1.
> + *
> + * However, the race window for that is very small so it is unlikely.
> + *
> + * Returns: true if readers are excluded, false otherwise.
> + */
> +static inline bool are_readers_excluded(int refcnt)
> {
> /*
> - * With a writer and no readers, refcnt is VM_REFCNT_EXCLUDE_READERS_FLAG
> - * if the vma is detached and (VM_REFCNT_EXCLUDE_READERS_FLAG + 1) if it is
> - * attached. Waiting on a detached vma happens only in
> - * vma_mark_detached() and is a rare case, therefore most of the time
> - * there will be no unnecessary wakeup.
> - *
> * See the comment describing the vm_area_struct->vm_refcnt field for
> * details of possible refcnt values.
> */
> @@ -138,18 +150,42 @@ static inline bool is_vma_writer_only(int refcnt)
> refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
> }
>
> +static inline bool __vma_refcount_put(struct vm_area_struct *vma, int *refcnt)
> +{
> + int oldcnt;
> + bool detached;
> +
> + detached = __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
> + if (refcnt)
> + *refcnt = oldcnt - 1;
> + return detached;
IIUC, there is always a connection between detached and the resulting
*refcnt value: if detached == true, then the resulting *refcnt has to
be 0. If so, __vma_refcount_put() can simply return (oldcnt - 1) as the
new count:

static inline int __vma_refcount_put(struct vm_area_struct *vma)
{
        int oldcnt;

        __refcount_dec_and_test(&vma->vm_refcnt, &oldcnt);
        return oldcnt - 1;
}

And later:

        newcnt = __vma_refcount_put(vma);
        detached = newcnt == 0;
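With that signature, vma_refcount_put() might then read roughly as
follows (a sketch based on the suggestion above, untested):

static inline void vma_refcount_put(struct vm_area_struct *vma)
{
        /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt. */
        struct mm_struct *mm = vma->vm_mm;
        int newcnt;

        rwsem_release(&vma->vmlock_dep_map, _RET_IP_);

        newcnt = __vma_refcount_put(vma);
        /*
         * Wake a writer sleeping in __vma_enter_locked() if we were the
         * last reader blocking it; a detached VMA (newcnt == 0) needs no
         * wakeup here.
         */
        if (newcnt != 0 && are_readers_excluded(newcnt))
                rcuwait_wake_up(&mm->vma_writer_wait);
}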
> +}
> +
> +/**
> + * vma_refcount_put() - Drop reference count in VMA vm_refcnt field due to a
> + * read-lock being dropped.
> + * @vma: The VMA whose reference count we wish to decrement.
> + *
> + * If we were the last reader, wake up threads waiting to obtain an exclusive
> + * lock.
> + */
> static inline void vma_refcount_put(struct vm_area_struct *vma)
> {
> - /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
> + /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt. */
> struct mm_struct *mm = vma->vm_mm;
> - int oldcnt;
> + int refcnt;
> + bool detached;
>
> rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> - if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) {
>
> - if (is_vma_writer_only(oldcnt - 1))
> - rcuwait_wake_up(&mm->vma_writer_wait);
> - }
> + detached = __vma_refcount_put(vma, &refcnt);
> + /*
> + * __vma_enter_locked() may be sleeping waiting for readers to drop
> + * their reference count, so wake it up if we were the last reader
> + * blocking the exclusive lock from being acquired.
> + */
> + if (!detached && are_readers_excluded(refcnt))
> + rcuwait_wake_up(&mm->vma_writer_wait);
> }
>
> /*
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index 75dc098aea14..ebacb57e5f16 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -130,25 +130,23 @@ EXPORT_SYMBOL_GPL(__vma_start_write);
>
> void vma_mark_detached(struct vm_area_struct *vma)
> {
> + bool detached;
> +
> vma_assert_write_locked(vma);
> vma_assert_attached(vma);
>
> /*
> - * We are the only writer, so no need to use vma_refcount_put().
> - * The condition below is unlikely because the vma has been already
> - * write-locked and readers can increment vm_refcnt only temporarily
I think the above part of the comment is still important and should be
kept intact.
> - * before they check vm_lock_seq, realize the vma is locked and drop
> - * back the vm_refcnt. That is a narrow window for observing a raised
> - * vm_refcnt.
> - *
> * See the comment describing the vm_area_struct->vm_refcnt field for
> * details of possible refcnt values.
> */
> - if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
> + detached = __vma_refcount_put(vma, NULL);
> + if (unlikely(!detached)) {
> /* Wait until vma is detached with no readers. */
> if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
> - bool detached;
> -
> + /*
> + * Once this is complete, no readers can increment the
> + * reference count, and the VMA is marked detached.
> + */
> __vma_exit_locked(vma, &detached);
> WARN_ON_ONCE(!detached);
> }
> --
> 2.52.0
>
* Re: [PATCH v3 4/8] mm/vma: add+use vma lockdep acquire/release defines
2026-01-22 12:50 ` [PATCH v3 4/8] mm/vma: add+use vma lockdep acquire/release defines Lorenzo Stoakes
@ 2026-01-22 19:25 ` Suren Baghdasaryan
0 siblings, 0 replies; 12+ messages in thread
From: Suren Baghdasaryan @ 2026-01-22 19:25 UTC (permalink / raw)
To: Lorenzo Stoakes
Cc: Andrew Morton, David Hildenbrand, Liam R . Howlett,
Vlastimil Babka, Mike Rapoport, Michal Hocko, Shakeel Butt,
Jann Horn, linux-mm, linux-kernel, linux-rt-devel,
Peter Zijlstra, Ingo Molnar, Will Deacon, Boqun Feng,
Waiman Long, Sebastian Andrzej Siewior, Clark Williams,
Steven Rostedt
On Thu, Jan 22, 2026 at 4:50 AM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> The code is littered with inscrutable and duplicative lockdep incantations;
> replace these with defines that make clear what is going on, and add
> commentary explaining what we're doing.
>
> If lockdep is disabled these become no-ops. We must use defines so _RET_IP_
> remains meaningful.
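(Aside, to illustrate the _RET_IP_ point - a sketch with hypothetical
names, not code from the patch: _RET_IP_ is __builtin_return_address(0),
the return address of the function currently executing. Inside a helper
function that the compiler does not inline, it would report the call
site within the locking function itself rather than that function's
caller; a define expands in place, so the original caller is preserved:

/* If not inlined, attributes the lock op to inside the locking function. */
static void helper_fn(struct lockdep_map *map)
{
        lock_release(map, _RET_IP_);
}

/* Expands at the use site, so _RET_IP_ names the locking function's caller. */
#define helper_macro(map) lock_release(map, _RET_IP_)

helper_fn() and helper_macro() are hypothetical names for illustration.)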
>
> These are self-documenting and aid readability of the code.
>
> Additionally, instead of using the confusing rwsem_*() form for something
> that is emphatically not an rwsem, we explicitly use the
> lock_acquire_shared/exclusive() and lock_release() lockdep invocations,
> since we are doing something rather custom here and these make more sense.
>
> No functional change intended.
>
> Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
> ---
> include/linux/mmap_lock.h | 35 ++++++++++++++++++++++++++++++++---
> mm/mmap_lock.c | 10 +++++-----
> 2 files changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
> index 0b3614aadbb4..da63b1be6ec0 100644
> --- a/include/linux/mmap_lock.h
> +++ b/include/linux/mmap_lock.h
> @@ -78,6 +78,36 @@ static inline void mmap_assert_write_locked(const struct mm_struct *mm)
>
> #ifdef CONFIG_PER_VMA_LOCK
>
> +/*
> + * VMA locks do not behave like most ordinary locks found in the kernel, so we
> + * cannot quite have full lockdep tracking in the way we would ideally prefer.
> + *
> + * Read locks act as shared locks which exclude an exclusive lock being
> + * taken. We therefore mark these accordingly on read lock acquire/release.
> + *
> + * Write locks are acquired exclusively per-VMA, but released in a shared
> + * fashion - that is, upon vma_end_write_all() we update the mmap's seqcount
> + * such that all VMA write locks are released at once.
> + *
> + * We therefore cannot track write locks per-VMA, nor do we try. Mitigating this
> + * is the fact that, of course, we do lockdep-track the mmap lock rwsem.
> + *
> + * We do, however, want to indicate that, during either acquisition of a VMA
> + * write lock or detachment of a VMA, the lock held must be exclusive, so we
> + * utilise lockdep to do so.
> + */
> +#define __vma_lockdep_acquire_read(vma) \
> + lock_acquire_shared(&vma->vmlock_dep_map, 0, 1, NULL, _RET_IP_)
> +#define __vma_lockdep_release_read(vma) \
> + lock_release(&vma->vmlock_dep_map, _RET_IP_)
> +#define __vma_lockdep_acquire_exclusive(vma) \
> + lock_acquire_exclusive(&vma->vmlock_dep_map, 0, 0, NULL, _RET_IP_)
> +#define __vma_lockdep_release_exclusive(vma) \
> + lock_release(&vma->vmlock_dep_map, _RET_IP_)
> +/* Only meaningful if CONFIG_LOCK_STAT is defined. */
> +#define __vma_lockdep_stat_mark_acquired(vma) \
> + lock_acquired(&vma->vmlock_dep_map, _RET_IP_)
> +
> static inline void mm_lock_seqcount_init(struct mm_struct *mm)
> {
> seqcount_init(&mm->mm_lock_seq);
> @@ -176,8 +206,7 @@ static inline void vma_refcount_put(struct vm_area_struct *vma)
> int refcnt;
> bool detached;
>
> - rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> -
> + __vma_lockdep_release_read(vma);
> detached = __vma_refcount_put(vma, &refcnt);
> /*
> * __vma_enter_locked() may be sleeping waiting for readers to drop
> @@ -203,7 +232,7 @@ static inline bool vma_start_read_locked_nested(struct vm_area_struct *vma, int
> VM_REFCNT_LIMIT)))
> return false;
>
> - rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
> + __vma_lockdep_acquire_read(vma);
> return true;
> }
>
> diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
> index ebacb57e5f16..9563bfb051f4 100644
> --- a/mm/mmap_lock.c
> +++ b/mm/mmap_lock.c
> @@ -72,7 +72,7 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
> if (!refcount_add_not_zero(VM_REFCNT_EXCLUDE_READERS_FLAG, &vma->vm_refcnt))
> return 0;
>
> - rwsem_acquire(&vma->vmlock_dep_map, 0, 0, _RET_IP_);
> + __vma_lockdep_acquire_exclusive(vma);
> err = rcuwait_wait_event(&vma->vm_mm->vma_writer_wait,
> refcount_read(&vma->vm_refcnt) == tgt_refcnt,
> state);
> @@ -85,10 +85,10 @@ static inline int __vma_enter_locked(struct vm_area_struct *vma,
> WARN_ON_ONCE(!detaching);
> err = 0;
> }
> - rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> + __vma_lockdep_release_exclusive(vma);
> return err;
> }
> - lock_acquired(&vma->vmlock_dep_map, _RET_IP_);
> + __vma_lockdep_stat_mark_acquired(vma);
>
> return 1;
> }
> @@ -97,7 +97,7 @@ static inline void __vma_exit_locked(struct vm_area_struct *vma, bool *detached)
> {
> *detached = refcount_sub_and_test(VM_REFCNT_EXCLUDE_READERS_FLAG,
> &vma->vm_refcnt);
> - rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
> + __vma_lockdep_release_exclusive(vma);
> }
>
> int __vma_start_write(struct vm_area_struct *vma, unsigned int mm_lock_seq,
> @@ -199,7 +199,7 @@ static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
> goto err;
> }
>
> - rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
> + __vma_lockdep_acquire_read(vma);
>
> if (unlikely(vma->vm_mm != mm))
> goto err_unstable;
> --
> 2.52.0
>
Thread overview: 12+ messages
2026-01-22 12:50 [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 1/8] mm/vma: rename VMA_LOCK_OFFSET to VM_REFCNT_EXCLUDE_READERS_FLAG Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 2/8] mm/vma: document possible vma->vm_refcnt values and reference comment Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 3/8] mm/vma: rename is_vma_write_only(), separate out shared refcount put Lorenzo Stoakes
2026-01-22 18:07 ` Suren Baghdasaryan
2026-01-22 12:50 ` [PATCH v3 4/8] mm/vma: add+use vma lockdep acquire/release defines Lorenzo Stoakes
2026-01-22 19:25 ` Suren Baghdasaryan
2026-01-22 12:50 ` [PATCH v3 5/8] mm/vma: de-duplicate __vma_enter_locked() error path Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 6/8] mm/vma: clean up __vma_enter/exit_locked() Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 7/8] mm/vma: introduce helper struct + thread through exclusive lock fns Lorenzo Stoakes
2026-01-22 12:50 ` [PATCH v3 8/8] mm/vma: improve and document __is_vma_write_locked() Lorenzo Stoakes
2026-01-22 12:55 ` [PATCH v3 00/10] mm: add and use vma_assert_stabilised() helper Lorenzo Stoakes