From: <jiang.kun2@zte.com.cn>
To: <akpm@linux-foundation.org>, <liam.howlett@oracle.com>,
<ljs@kernel.org>, <david@kernel.org>, <vbabka@kernel.org>,
<jannh@google.com>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
<xu.xin16@zte.com.cn>, <wang.yaxin@zte.com.cn>,
<jiang.kun2@zte.com.cn>, <lu.zhongjun@zte.com.cn>
Subject: [PATCH v2] mm/madvise: prefer VMA lock for MADV_REMOVE
Date: Fri, 10 Apr 2026 16:02:49 +0800 (CST)
Message-ID: <20260410160249749i98jwNgNLmLMKRNVeoKVe@zte.com.cn>
From: Jiang Kun <jiang.kun2@zte.com.cn>

Make MADV_REMOVE prefer the per-VMA read lock for single-VMA, local-mm,
non-UFFD-armed ranges, so that such ranges avoid mmap_lock contention.

However, calling into the filesystem while holding the VMA lock (vm_lock)
can create lock ordering issues. syzbot reported a possible deadlock in
blkdev_fallocate() when vfs_fallocate() is called under vm_lock.

Avoid this by taking an extra reference to the file and dropping the VMA
lock before invoking vfs_fallocate(). Keep the existing mmap_lock fallback
path and its userfaultfd coordination unchanged.

Repeated benchmark runs show no regression in the uncontended case and a
benefit once mmap_lock contention is introduced.

Link: https://ci.syzbot.org/series/30acb9df-ca55-4cbf-81ed-89b84da8edc1
Link: https://lore.kernel.org/all/aWcZCwz__qwwKbxw@casper.infradead.org/
Signed-off-by: Jiang Kun <jiang.kun2@zte.com.cn>
Signed-off-by: Yaxin Wang <wang.yaxin@zte.com.cn>
---
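Note (not part of the commit message): the ordering used on the VMA-lock
path can be sketched in userspace as below. This is only an illustrative,
single-threaded model. Every model_* name and the vma_read_locks_held
counter are hypothetical stand-ins, not kernel APIs, and the counter merely
records whether a "VMA read lock" is held so that the stand-in for
vfs_fallocate() can refuse to run under it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct model_file { int refcount; int punched; };
struct model_vma { struct model_file *file; };

static int vma_read_locks_held;	/* models "a VMA read lock is held" */

static void model_vma_start_read(struct model_vma *vma)
{
	(void)vma;
	vma_read_locks_held++;
}

static void model_vma_end_read(struct model_vma *vma)
{
	(void)vma;
	assert(vma_read_locks_held > 0);	/* catches a double unlock */
	vma_read_locks_held--;
}

static void model_get_file(struct model_file *f) { f->refcount++; }
static void model_fput(struct model_file *f) { f->refcount--; }

/* Stand-in for vfs_fallocate(): refuses to run under the VMA lock. */
static int model_fallocate(struct model_file *f)
{
	if (vma_read_locks_held)
		return -1;	/* would be the lock ordering problem */
	f->punched = 1;
	return 0;
}

/* Models the new MADVISE_VMA_READ_LOCK branch in madvise_remove(). */
static int model_remove(struct model_vma **vmap)
{
	struct model_file *f = (*vmap)->file;
	int error;

	model_get_file(f);		/* pin the file before unlocking */
	model_vma_end_read(*vmap);	/* drop the lock before calling out */
	*vmap = NULL;			/* tell the caller it is already dropped */
	error = model_fallocate(f);
	model_fput(f);
	return error;
}

/* Models the caller: unlock only if the callee did not already do so. */
int model_walk(struct model_vma *vma)
{
	int error;

	model_vma_start_read(vma);
	error = model_remove(&vma);
	if (vma)
		model_vma_end_read(vma);
	return error;
}
```

Calling model_walk() on a model_vma whose file starts with refcount 1
leaves the refcount at 1, sets punched, and leaves no lock held. Clearing
the pointer is what lets the caller skip its own unlock, mirroring the
madv_behavior->vma check added to madvise_walk_vmas().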
mm/madvise.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index 69708e953cf5..0932579bccb4 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1008,8 +1008,6 @@ static long madvise_remove(struct madvise_behavior *madv_behavior)
unsigned long start = madv_behavior->range.start;
unsigned long end = madv_behavior->range.end;
- mark_mmap_lock_dropped(madv_behavior);
-
if (vma->vm_flags & VM_LOCKED)
return -EINVAL;
@@ -1025,6 +1023,20 @@ static long madvise_remove(struct madvise_behavior *madv_behavior)
offset = (loff_t)(start - vma->vm_start)
+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
+ /* Avoid calling into the filesystem while holding a VMA lock. */
+ if (madv_behavior->lock_mode == MADVISE_VMA_READ_LOCK) {
+ get_file(f);
+ vma_end_read(vma);
+ madv_behavior->vma = NULL;
+ error = vfs_fallocate(f,
+ FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+ offset, end - start);
+ fput(f);
+ return error;
+ }
+
+ mark_mmap_lock_dropped(madv_behavior);
+
/*
* Filesystem's fallocate may need to take i_rwsem. We need to
* explicitly grab a reference because the vma (and hence the
@@ -1677,7 +1689,8 @@ int madvise_walk_vmas(struct madvise_behavior *madv_behavior)
if (madv_behavior->lock_mode == MADVISE_VMA_READ_LOCK &&
try_vma_read_lock(madv_behavior)) {
error = madvise_vma_behavior(madv_behavior);
- vma_end_read(madv_behavior->vma);
+ if (madv_behavior->vma)
+ vma_end_read(madv_behavior->vma);
return error;
}
@@ -1746,7 +1759,6 @@ static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavi
return MADVISE_NO_LOCK;
switch (madv_behavior->behavior) {
- case MADV_REMOVE:
case MADV_WILLNEED:
case MADV_COLD:
case MADV_PAGEOUT:
@@ -1754,6 +1766,7 @@ static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavi
case MADV_POPULATE_WRITE:
case MADV_COLLAPSE:
return MADVISE_MMAP_READ_LOCK;
+ case MADV_REMOVE:
case MADV_GUARD_INSTALL:
case MADV_GUARD_REMOVE:
case MADV_DONTNEED:
--
2.53.0