[PATCH linux-next] mm/madvise: prefer VMA lock for MADV_REMOVE

From: <wang.yaxin@zte.com.cn>
To: <akpm@linux-foundation.org>, <liam.howlett@oracle.com>,
	<lorenzo.stoakes@oracle.com>, <david@kernel.org>
Cc: <vbabka@suse.cz>, <jannh@google.com>, <linux-mm@kvack.org>,
	<linux-kernel@vger.kernel.org>, <xu.xin16@zte.com.cn>,
	<yang.yang29@zte.com.cn>, <wang.yaxin@zte.com.cn>,
	<fan.yu9@zte.com.cn>, <he.peilin@zte.com.cn>,
	<tu.qiang35@zte.com.cn>, <qiu.yutan@zte.com.cn>,
	<jiang.kun2@zte.com.cn>, <lu.zhongjun@zte.com.cn>
Subject: [PATCH linux-next] mm/madvise: prefer VMA lock for MADV_REMOVE
Date: Wed, 14 Jan 2026 11:24:17 +0800 (CST)
Message-ID: <202601141124178748cM66DJW2fzNea7Uym1mG@zte.com.cn>

From: Jiang Kun <jiang.kun2@zte.com.cn>

MADV_REMOVE currently runs under the mm-wide mmap_read_lock() and
temporarily drops and reacquires it around filesystem hole-punching.
For a range that lies within a single VMA, belongs to the caller's own
mm, and is not armed for userfaultfd, we can safely operate under the
finer-grained per-VMA read lock instead. This reduces mmap_lock
contention and lock hold time while preserving semantics.

This patch:
- Switches MADV_REMOVE to prefer MADVISE_VMA_READ_LOCK via get_lock_mode().
- Adds a branch in madvise_remove():
  * Under VMA lock: avoid mark_mmap_lock_dropped() and mmap lock churn;
    take a file reference and call vfs_fallocate() directly.
  * Under mmap read lock fallback: preserve existing behavior including
    userfaultfd_remove() coordination and temporary mmap_read_unlock/lock
    around vfs_fallocate().
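
The two branches listed above can be modelled in plain userspace C. This
is only an illustrative sketch under stated assumptions, not kernel code:
toy_remove(), struct toy_trace and the TOY_* names are hypothetical
stand-ins, and the counters merely record which locking steps each path
would go through.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's madvise lock modes. */
enum toy_lock_mode { TOY_MMAP_READ_LOCK, TOY_VMA_READ_LOCK };

/* Records which steps a toy MADV_REMOVE call went through. */
struct toy_trace {
	int mmap_unlocks;	/* times the mmap read lock was dropped */
	int mmap_relocks;	/* times it was taken again afterwards */
	int fallocate_calls;	/* punch-hole requests issued */
	bool marked_dropped;	/* models mark_mmap_lock_dropped() */
	bool uffd_notified;	/* models userfaultfd_remove() */
};

static void toy_remove(enum toy_lock_mode mode, struct toy_trace *t)
{
	if (mode == TOY_VMA_READ_LOCK) {
		/* VMA lock path: no mmap lock is held, so nothing to drop. */
		t->fallocate_calls++;
		return;
	}

	/* Fallback path: same order of steps as the existing code. */
	t->marked_dropped = true;
	t->uffd_notified = true;
	t->mmap_unlocks++;
	t->fallocate_calls++;
	t->mmap_relocks++;
}
```

Calling toy_remove() once per mode on a zeroed struct toy_trace shows the
difference: both modes issue one punch-hole request, but only the
fallback mode drops and retakes the mmap lock.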

Constraints and fallback:
- try_vma_read_lock() enforces single VMA, local mm, and userfaultfd
  not armed (userfaultfd_armed(vma) == false). If any condition fails,
  we fall back to mmap_read_lock(mm) and use the original path.
- Semantics are unchanged: permission checks, VM_LOCKED rejection, the
  shared-may-write requirement, and error propagation all remain as before.
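
The fallback rule above can be sketched as a small decision function.
This is a userspace model only: demo_pick_lock(), struct demo_vma and the
DEMO_* names are hypothetical, and the exact checks performed by
try_vma_read_lock() are not part of this diff, so the conditions below
are assumptions taken from the description rather than the kernel's
actual implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock choices, loosely mirroring the madvise lock modes. */
enum demo_lock { DEMO_LOCK_MMAP_READ, DEMO_LOCK_VMA_READ };

/* Hypothetical, heavily reduced view of a VMA. */
struct demo_vma {
	unsigned long start;	/* inclusive */
	unsigned long end;	/* exclusive */
	bool uffd_armed;	/* models userfaultfd_armed(vma) */
};

/*
 * Use the per-VMA read lock only when every stated condition holds;
 * otherwise fall back to the mmap read lock.
 */
static enum demo_lock demo_pick_lock(const struct demo_vma *vma,
				     unsigned long start, unsigned long end,
				     bool local_mm)
{
	if (!local_mm)
		return DEMO_LOCK_MMAP_READ;	/* remote mm */
	if (vma == NULL)
		return DEMO_LOCK_MMAP_READ;	/* no VMA could be locked */
	if (vma->uffd_armed)
		return DEMO_LOCK_MMAP_READ;	/* needs UFFD coordination */
	if (start < vma->start || end > vma->end)
		return DEMO_LOCK_MMAP_READ;	/* not within a single VMA */
	return DEMO_LOCK_VMA_READ;
}
```

With a VMA covering [0x1000, 0x5000), a local request for
[0x1000, 0x3000) selects the VMA lock, while a request extending to
0x6000 falls back because it is not contained in that one VMA.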

Signed-off-by: Jiang Kun <jiang.kun2@zte.com.cn>
Signed-off-by: Yaxin Wang <wang.yaxin@zte.com.cn>
---
 mm/madvise.c | 31 +++++++++++++++++++++++++------
 1 file changed, 25 insertions(+), 6 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 6bf7009fa5ce..279ec5169879 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1015,7 +1015,19 @@ static long madvise_remove(struct madvise_behavior *madv_behavior)
 	unsigned long start = madv_behavior->range.start;
 	unsigned long end = madv_behavior->range.end;

-	mark_mmap_lock_dropped(madv_behavior);
+	/*
+	 * Prefer VMA read lock path: when operating under VMA lock, we avoid
+	 * dropping/reacquiring the mmap lock and directly perform the filesystem
+	 * operation while the VMA is read-locked. We still take and drop a file
+	 * reference to protect against concurrent file changes.
+	 *
+	 * When operating under mmap read lock (fallback), preserve existing
+	 * behaviour: mark lock dropped, coordinate with userfaultfd_remove(),
+	 * temporarily drop mmap_read_lock around vfs_fallocate(), and then
+	 * reacquire it.
+	 */
+	if (madv_behavior->lock_mode == MADVISE_MMAP_READ_LOCK)
+		mark_mmap_lock_dropped(madv_behavior);

 	if (vma->vm_flags & VM_LOCKED)
 		return -EINVAL;
@@ -1033,12 +1045,19 @@ static long madvise_remove(struct madvise_behavior *madv_behavior)
 			+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);

 	/*
-	 * Filesystem's fallocate may need to take i_rwsem.  We need to
-	 * explicitly grab a reference because the vma (and hence the
-	 * vma's reference to the file) can go away as soon as we drop
-	 * mmap_lock.
+	 * Execute filesystem punch-hole under appropriate locking.
+	 * - VMA lock path: no mmap lock held; call vfs_fallocate() directly.
+	 * - mmap lock path: follow existing protocol including UFFD coordination
+	 *   and temporary mmap_read_unlock/lock around the filesystem call.
 	 */
 	get_file(f);
+	if (madv_behavior->lock_mode == MADVISE_VMA_READ_LOCK) {
+		error = vfs_fallocate(f,
+					FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+					offset, end - start);
+		fput(f);
+		return error;
+	}
 	if (userfaultfd_remove(vma, start, end)) {
 		/* mmap_lock was not released by userfaultfd_remove() */
 		mmap_read_unlock(mm);
@@ -1754,7 +1773,6 @@ static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavi
 		return MADVISE_NO_LOCK;

 	switch (madv_behavior->behavior) {
-	case MADV_REMOVE:
 	case MADV_WILLNEED:
 	case MADV_COLD:
 	case MADV_PAGEOUT:
@@ -1762,6 +1780,7 @@ static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavi
 	case MADV_POPULATE_WRITE:
 	case MADV_COLLAPSE:
 		return MADVISE_MMAP_READ_LOCK;
+	case MADV_REMOVE:
 	case MADV_GUARD_INSTALL:
 	case MADV_GUARD_REMOVE:
 	case MADV_DONTNEED:
-- 
2.43.5

Thread overview: 3+ messages
2026-01-14  3:24 wang.yaxin [this message]
2026-01-14  4:18 ` Matthew Wilcox
2026-01-14  4:19 ` Matthew Wilcox