From: Kiryl Shutsemau <kirill@shutemov.name>
To: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@redhat.com>,
Hugh Dickins <hughd@google.com>,
Matthew Wilcox <willy@infradead.org>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Rik van Riel <riel@surriel.com>,
Harry Yoo <harry.yoo@oracle.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Kiryl Shutsemau <kas@kernel.org>
Subject: [PATCHv3 1/5] mm/rmap: Fix a mlock race condition in folio_referenced_one()
Date: Tue, 23 Sep 2025 12:03:06 +0100 [thread overview]
Message-ID: <20250923110310.689126-2-kirill@shutemov.name> (raw)
In-Reply-To: <20250923110310.689126-1-kirill@shutemov.name>
From: Kiryl Shutsemau <kas@kernel.org>
The mlock_vma_folio() function requires the page table lock to be held
in order to safely mlock the folio. However, folio_referenced_one()
mlocks large folios outside the page_vma_mapped_walk() loop, after
the page table lock has already been dropped.
Rework the mlock logic to use the same code path inside the loop for
both large and small folios.
Use PVMW_PGTABLE_CROSSED to detect when the folio is mapped across a
page table boundary.
Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
---
mm/rmap.c | 59 ++++++++++++++++++++-----------------------------------
1 file changed, 21 insertions(+), 38 deletions(-)
diff --git a/mm/rmap.c b/mm/rmap.c
index 568198e9efc2..3d0235f332de 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -851,34 +851,34 @@ static bool folio_referenced_one(struct folio *folio,
{
struct folio_referenced_arg *pra = arg;
DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
- int referenced = 0;
- unsigned long start = address, ptes = 0;
+ int ptes = 0, referenced = 0;
while (page_vma_mapped_walk(&pvmw)) {
address = pvmw.address;
if (vma->vm_flags & VM_LOCKED) {
- if (!folio_test_large(folio) || !pvmw.pte) {
- /* Restore the mlock which got missed */
- mlock_vma_folio(folio, vma);
- page_vma_mapped_walk_done(&pvmw);
- pra->vm_flags |= VM_LOCKED;
- return false; /* To break the loop */
- }
- /*
- * For large folio fully mapped to VMA, will
- * be handled after the pvmw loop.
- *
- * For large folio cross VMA boundaries, it's
- * expected to be picked by page reclaim. But
- * should skip reference of pages which are in
- * the range of VM_LOCKED vma. As page reclaim
- * should just count the reference of pages out
- * the range of VM_LOCKED vma.
- */
ptes++;
pra->mapcount--;
- continue;
+
+ /* Only mlock fully mapped pages */
+ if (pvmw.pte && ptes != pvmw.nr_pages)
+ continue;
+
+ /*
+ * All PTEs must be protected by the page table lock in
+ * order to mlock the folio.
+ *
+ * If a page table boundary has been crossed, the current
+ * ptl only protects part of the PTEs.
+ */
+ if (pvmw.flags & PVMW_PGTABLE_CROSSED)
+ continue;
+
+ /* Restore the mlock which got missed */
+ mlock_vma_folio(folio, vma);
+ page_vma_mapped_walk_done(&pvmw);
+ pra->vm_flags |= VM_LOCKED;
+ return false; /* To break the loop */
}
/*
@@ -914,23 +914,6 @@ static bool folio_referenced_one(struct folio *folio,
pra->mapcount--;
}
- if ((vma->vm_flags & VM_LOCKED) &&
- folio_test_large(folio) &&
- folio_within_vma(folio, vma)) {
- unsigned long s_align, e_align;
-
- s_align = ALIGN_DOWN(start, PMD_SIZE);
- e_align = ALIGN_DOWN(start + folio_size(folio) - 1, PMD_SIZE);
-
- /* folio doesn't cross page table boundary and fully mapped */
- if ((s_align == e_align) && (ptes == folio_nr_pages(folio))) {
- /* Restore the mlock which got missed */
- mlock_vma_folio(folio, vma);
- pra->vm_flags |= VM_LOCKED;
- return false; /* To break the loop */
- }
- }
-
if (referenced)
folio_clear_idle(folio);
if (folio_test_clear_young(folio))
--
2.50.1
Thread overview: 7+ messages
2025-09-23 11:03 [PATCHv3 0/5] mm: Improve mlock tracking for large folios Kiryl Shutsemau
2025-09-23 11:03 ` Kiryl Shutsemau [this message]
2025-09-23 11:03 ` [PATCHv3 2/5] mm/rmap: mlock large folios in try_to_unmap_one() Kiryl Shutsemau
2025-09-23 11:03 ` [PATCHv3 3/5] mm/fault: Try to map the entire file folio in finish_fault() Kiryl Shutsemau
2025-09-23 11:03 ` [PATCHv3 4/5] mm/filemap: Map entire large folio faultaround Kiryl Shutsemau
2025-09-23 11:03 ` [PATCHv3 5/5] mm/rmap: Improve mlock tracking for large folios Kiryl Shutsemau
2025-09-23 11:05 ` [PATCHv3 0/5] mm: " Kiryl Shutsemau