From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@kernel.org>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Shakeel Butt <shakeel.butt@linux.dev>,
Jann Horn <jannh@google.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
linux-rt-devel@lists.linux.dev,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>, Will Deacon <will@kernel.org>,
Boqun Feng <boqun.feng@gmail.com>,
Waiman Long <longman@redhat.com>,
Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
Clark Williams <clrkwllms@kernel.org>,
Steven Rostedt <rostedt@goodmis.org>
Subject: [PATCH v4 03/10] mm/vma: rename is_vma_writer_only(), separate out shared refcount put
Date: Fri, 23 Jan 2026 20:12:13 +0000
Message-ID: <32053580bff460eb1092ef780b526cefeb748bad.1769198904.git.lorenzo.stoakes@oracle.com>
In-Reply-To: <cover.1769198904.git.lorenzo.stoakes@oracle.com>
The is_vma_writer_only() function is misnamed - this isn't determining if
there is only a write lock, as it checks for the presence of the
VM_REFCNT_EXCLUDE_READERS_FLAG.

Really, it is checking to see whether readers are excluded, with a
possibility of a false positive in the case of a detachment (there we
expect the vma->vm_refcnt to eventually be set to
VM_REFCNT_EXCLUDE_READERS_FLAG, whereas for an attached VMA we expect it to
eventually be set to VM_REFCNT_EXCLUDE_READERS_FLAG + 1).

Rename the function accordingly.

Relatedly, we use a __refcount_dec_and_test() primitive directly in
vma_refcount_put(), using the old value to determine what the reference
count ought to be after the operation is complete (ignoring racing
reference count adjustments).

Wrap this into a __vma_refcount_put_return() function, which we can then
utilise in vma_mark_detached() and thus keep the refcount primitive usage
abstracted.

This function, as the name implies, returns the value after the reference
count has been updated.

This reduces duplication in the two invocations of this function.

Also adjust comments, removing duplicative comments covered elsewhere and
adding more to aid understanding.

No functional change intended.
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
include/linux/mmap_lock.h | 66 +++++++++++++++++++++++++++++++--------
mm/mmap_lock.c | 17 +++++-----
2 files changed, 63 insertions(+), 20 deletions(-)
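
Illustrative note, not part of the patch: the commit message describes a
check that reports readers as excluded for both the detached target
(VM_REFCNT_EXCLUDE_READERS_FLAG) and the attached target
(VM_REFCNT_EXCLUDE_READERS_FLAG + 1). The userspace sketch below models that
check so the false positive is easier to see. The flag value 0x40000000 and
the names EXCLUDE_READERS_FLAG, readers_excluded() and
readers_excluded_examples() are assumptions made for illustration; the real
definitions live in the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed value, for illustration only; see the kernel headers for the real one. */
#define EXCLUDE_READERS_FLAG 0x40000000

/*
 * Model of the renamed check: the flag must be set, and at most one
 * further reference (the VMA being attached) may be present.
 */
static inline bool readers_excluded(int refcnt)
{
	return (refcnt & EXCLUDE_READERS_FLAG) &&
	       refcnt <= EXCLUDE_READERS_FLAG + 1;
}

/* Walk through the refcount values discussed in the commit message. */
static inline void readers_excluded_examples(void)
{
	/* Attached VMA, exclusive lock pending, no readers left. */
	assert(readers_excluded(EXCLUDE_READERS_FLAG + 1));

	/* Detaching VMA, no readers left. */
	assert(readers_excluded(EXCLUDE_READERS_FLAG));

	/*
	 * Detaching VMA with one reader remaining also yields FLAG + 1,
	 * which is indistinguishable from the attached case above. This
	 * is the false positive the commit message mentions.
	 */
	assert(readers_excluded(EXCLUDE_READERS_FLAG + 1));

	/* Attached VMA with a reader still holding a reference. */
	assert(!readers_excluded(EXCLUDE_READERS_FLAG + 2));

	/* No exclusive lock pending at all: the flag is clear. */
	assert(!readers_excluded(3));
}
```

Calling readers_excluded_examples() from a test harness exercises each case;
the check alone cannot tell the attached and detaching cases apart, which is
why the new function comment documents the ambiguity rather than trying to
resolve it.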
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index a764439d0276..294fb282052d 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -122,15 +122,22 @@ static inline void vma_lock_init(struct vm_area_struct *vma, bool reset_refcnt)
vma->vm_lock_seq = UINT_MAX;
}
-static inline bool is_vma_writer_only(int refcnt)
+/*
+ * This function determines whether the input VMA reference count describes a
+ * VMA which has excluded all VMA read locks.
+ *
+ * In the case of a detached VMA, we may incorrectly indicate that readers are
+ * excluded when one remains, because in that scenario we target a refcount of
+ * VM_REFCNT_EXCLUDE_READERS_FLAG, rather than the attached target of
+ * VM_REFCNT_EXCLUDE_READERS_FLAG + 1.
+ *
+ * However, the race window for that is very small so it is unlikely.
+ *
+ * Returns: true if readers are excluded, false otherwise.
+ */
+static inline bool __vma_are_readers_excluded(int refcnt)
{
/*
- * With a writer and no readers, refcnt is VM_REFCNT_EXCLUDE_READERS_FLAG
- * if the vma is detached and (VM_REFCNT_EXCLUDE_READERS_FLAG + 1) if it is
- * attached. Waiting on a detached vma happens only in
- * vma_mark_detached() and is a rare case, therefore most of the time
- * there will be no unnecessary wakeup.
- *
* See the comment describing the vm_area_struct->vm_refcnt field for
* details of possible refcnt values.
*/
@@ -138,18 +145,51 @@ static inline bool is_vma_writer_only(int refcnt)
refcnt <= VM_REFCNT_EXCLUDE_READERS_FLAG + 1;
}
+/*
+ * Actually decrement the VMA reference count.
+ *
+ * The function returns the reference count as it was immediately after the
+ * decrement took place. If it returns zero, the VMA is now detached.
+ */
+static inline __must_check unsigned int
+__vma_refcount_put_return(struct vm_area_struct *vma)
+{
+ int oldcnt;
+
+ if (__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt))
+ return 0;
+
+ return oldcnt - 1;
+}
+
+/**
+ * vma_refcount_put() - Drop reference count in VMA vm_refcnt field due to a
+ * read-lock being dropped.
+ * @vma: The VMA whose reference count we wish to decrement.
+ *
+ * If we were the last reader, wake up threads waiting to obtain an exclusive
+ * lock.
+ */
static inline void vma_refcount_put(struct vm_area_struct *vma)
{
- /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
+ /* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt. */
struct mm_struct *mm = vma->vm_mm;
- int oldcnt;
+ int newcnt;
rwsem_release(&vma->vmlock_dep_map, _RET_IP_);
- if (!__refcount_dec_and_test(&vma->vm_refcnt, &oldcnt)) {
- if (is_vma_writer_only(oldcnt - 1))
- rcuwait_wake_up(&mm->vma_writer_wait);
- }
+ newcnt = __vma_refcount_put_return(vma);
+ /*
+ * __vma_enter_locked() may be sleeping waiting for readers to drop
+ * their reference count, so wake it up if we were the last reader
+ * blocking it from being acquired.
+ *
+ * We may be raced by other readers temporarily incrementing the
+ * reference count, though the race window is very small, this might
+ * cause spurious wakeups.
+ */
+ if (newcnt && __vma_are_readers_excluded(newcnt))
+ rcuwait_wake_up(&mm->vma_writer_wait);
}
/*
diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 75dc098aea14..6be1bbcde09e 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -134,21 +134,24 @@ void vma_mark_detached(struct vm_area_struct *vma)
vma_assert_attached(vma);
/*
- * We are the only writer, so no need to use vma_refcount_put().
- * The condition below is unlikely because the vma has been already
- * write-locked and readers can increment vm_refcnt only temporarily
- * before they check vm_lock_seq, realize the vma is locked and drop
- * back the vm_refcnt. That is a narrow window for observing a raised
- * vm_refcnt.
+ * This condition - that the VMA is still attached (refcnt > 0) - is
+ * unlikely, because the vma has been already write-locked and readers
+ * can increment vm_refcnt only temporarily before they check
+ * vm_lock_seq, realize the vma is locked and drop back the
+ * vm_refcnt. That is a narrow window for observing a raised vm_refcnt.
*
* See the comment describing the vm_area_struct->vm_refcnt field for
* details of possible refcnt values.
*/
- if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt))) {
+ if (unlikely(__vma_refcount_put_return(vma))) {
/* Wait until vma is detached with no readers. */
if (__vma_enter_locked(vma, true, TASK_UNINTERRUPTIBLE)) {
bool detached;
+ /*
+ * Once this is complete, no readers can increment the
+ * reference count, and the VMA is marked detached.
+ */
__vma_exit_locked(vma, &detached);
WARN_ON_ONCE(!detached);
}
--
2.52.0
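
Illustrative note, not part of the patch: the sketch below models in
userspace how the read-lock release path uses the post-decrement value to
decide whether to wake a waiting writer. It is a simplification under stated
assumptions: C11 atomic_fetch_sub() stands in for __refcount_dec_and_test()
(so the kernel's memory ordering and saturation handling are not modelled),
the flag value 0x40000000 is assumed, and struct vma_model plus all model_*
names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed value, for illustration only. */
#define EXCLUDE_READERS_FLAG 0x40000000

/* Hypothetical stand-in for the vm_refcnt field of a VMA. */
struct vma_model {
	atomic_int refcnt;
};

static inline bool model_readers_excluded(int refcnt)
{
	return (refcnt & EXCLUDE_READERS_FLAG) &&
	       refcnt <= EXCLUDE_READERS_FLAG + 1;
}

/*
 * Drop one reference and return the count as it was immediately after
 * the decrement. Zero means the modelled VMA is now detached.
 */
static inline unsigned int model_refcount_put_return(struct vma_model *vma)
{
	int oldcnt = atomic_fetch_sub(&vma->refcnt, 1);

	return (unsigned int)(oldcnt - 1);
}

/*
 * Model of the read-lock release path: returns true when a waiting
 * writer should be woken, i.e. the VMA is still attached and we were
 * the last reader blocking the exclusive lock.
 */
static inline bool model_reader_put_should_wake(struct vma_model *vma)
{
	unsigned int newcnt = model_refcount_put_return(vma);

	return newcnt && model_readers_excluded((int)newcnt);
}

static inline void model_put_examples(void)
{
	struct vma_model vma;

	/* Attached, writer waiting, one reader: the put should wake. */
	atomic_init(&vma.refcnt, EXCLUDE_READERS_FLAG + 2);
	assert(model_reader_put_should_wake(&vma));

	/* Attached, writer waiting, two readers: not the last reader. */
	atomic_init(&vma.refcnt, EXCLUDE_READERS_FLAG + 3);
	assert(!model_reader_put_should_wake(&vma));

	/* Attached, one reader, no writer: nobody to wake. */
	atomic_init(&vma.refcnt, 2);
	assert(!model_reader_put_should_wake(&vma));

	/* Last reference dropped: count hits zero, no wakeup. */
	atomic_init(&vma.refcnt, 1);
	assert(!model_reader_put_should_wake(&vma));
}
```

The same helper serves the detach path in the patch, where a non-zero return
simply means references remain and the caller has to wait for them to drain.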