From: Kiryl Shutsemau <kirill@shutemov.name>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Yin Fengwei <fengwei.yin@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@redhat.com>,
Hugh Dickins <hughd@google.com>,
Matthew Wilcox <willy@infradead.org>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>, Rik van Riel <riel@surriel.com>,
Harry Yoo <harry.yoo@oracle.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] mm/rmap: Improve mlock tracking for large folios
Date: Thu, 18 Sep 2025 14:48:20 +0100
Message-ID: <ndryzvkmrfidmjgj4tl27hk2kmspmb42mxl2smuwgmp5hyedzh@thggle3dhp5j>
In-Reply-To: <429481ef-6527-40f5-b7a0-c9370fd1e374@lucifer.local>
On Thu, Sep 18, 2025 at 02:10:05PM +0100, Lorenzo Stoakes wrote:
> On Thu, Sep 18, 2025 at 12:21:57PM +0100, kirill@shutemov.name wrote:
> > From: Kiryl Shutsemau <kas@kernel.org>
> >
> > The kernel currently does not mlock large folios when adding them to
> > rmap, stating that it is difficult to confirm that the folio is fully
> > mapped and safe to mlock it. However, nowadays the caller passes a
> > number of pages of the folio that are getting mapped, making it easy to
> > check if the entire folio is mapped to the VMA.
> >
> > mlock the folio on rmap if it is fully mapped to the VMA.
> >
> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
>
> The logic looks good to me, so:
>
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
>
> But note the comments below.
>
> > ---
> > mm/rmap.c | 13 ++++---------
> > 1 file changed, 4 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 568198e9efc2..ca8d4ef42c2d 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1478,13 +1478,8 @@ static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> > PageAnonExclusive(cur_page), folio);
> > }
> >
> > - /*
> > - * For large folio, only mlock it if it's fully mapped to VMA. It's
> > - * not easy to check whether the large folio is fully mapped to VMA
> > - * here. Only mlock normal 4K folio and leave page reclaim to handle
> > - * large folio.
> > - */
> > - if (!folio_test_large(folio))
> > + /* Only mlock it if the folio is fully mapped to the VMA */
> > + if (folio_nr_pages(folio) == nr_pages)
>
> OK this is nice, as partially mapped will have folio_nr_pages() != nr_pages. So
> logically this must be correct.
>
> > mlock_vma_folio(folio, vma);
> > }
> >
> > @@ -1620,8 +1615,8 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
> > nr = __folio_add_rmap(folio, page, nr_pages, vma, level, &nr_pmdmapped);
> > __folio_mod_stat(folio, nr, nr_pmdmapped);
> >
> > - /* See comments in folio_add_anon_rmap_*() */
> > - if (!folio_test_large(folio))
> > + /* Only mlock it if the folio is fully mapped to the VMA */
> > + if (folio_nr_pages(folio) == nr_pages)
> > mlock_vma_folio(folio, vma);
> > }
> >
> > --
> > 2.50.1
> >
>
> I see in try_to_unmap_one():
>
> if (!(flags & TTU_IGNORE_MLOCK) &&
> (vma->vm_flags & VM_LOCKED)) {
> /* Restore the mlock which got missed */
> if (!folio_test_large(folio))
> mlock_vma_folio(folio, vma);
>
> Do we care about this?
>
> It seems like folio_referenced_one() does some similar logic:
>
> if (vma->vm_flags & VM_LOCKED) {
> if (!folio_test_large(folio) || !pvmw.pte) {
> /* Restore the mlock which got missed */
> mlock_vma_folio(folio, vma);
> page_vma_mapped_walk_done(&pvmw);
> pra->vm_flags |= VM_LOCKED;
> return false; /* To break the loop */
> }
>
> ...
>
> if ((vma->vm_flags & VM_LOCKED) &&
> folio_test_large(folio) &&
> folio_within_vma(folio, vma)) {
> unsigned long s_align, e_align;
>
> s_align = ALIGN_DOWN(start, PMD_SIZE);
> e_align = ALIGN_DOWN(start + folio_size(folio) - 1, PMD_SIZE);
>
> /* folio doesn't cross page table boundary and fully mapped */
> if ((s_align == e_align) && (ptes == folio_nr_pages(folio))) {
> /* Restore the mlock which got missed */
> mlock_vma_folio(folio, vma);
> pra->vm_flags |= VM_LOCKED;
> return false; /* To break the loop */
> }
> }
>
> So maybe we could do something similar in try_to_unmap_one()?

Hm. This looks buggy to me.

mlock_vma_folio() has to be called with the ptl held, no? By the time we
get to the block after the pvmw loop, it has already been dropped.

+Fengwei.

I think this has to be handled inside the loop, once ptes reaches
folio_nr_pages(folio).

Maybe something like this (untested):

diff --git a/mm/rmap.c b/mm/rmap.c
index ca8d4ef42c2d..719f1c99470c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -858,17 +858,13 @@ static bool folio_referenced_one(struct folio *folio,
address = pvmw.address;
if (vma->vm_flags & VM_LOCKED) {
- if (!folio_test_large(folio) || !pvmw.pte) {
- /* Restore the mlock which got missed */
- mlock_vma_folio(folio, vma);
- page_vma_mapped_walk_done(&pvmw);
- pra->vm_flags |= VM_LOCKED;
- return false; /* To break the loop */
- }
+ unsigned long s_align, e_align;
+
+ /* Small folio or PMD-mapped large folio */
+ if (!folio_test_large(folio) || !pvmw.pte)
+ goto restore_mlock;
+
/*
- * For large folio fully mapped to VMA, will
- * be handled after the pvmw loop.
- *
* For large folio cross VMA boundaries, it's
* expected to be picked by page reclaim. But
* should skip reference of pages which are in
@@ -878,7 +874,23 @@ static bool folio_referenced_one(struct folio *folio,
*/
ptes++;
pra->mapcount--;
- continue;
+
+ /* Folio must be fully mapped to be mlocked */
+ if (ptes != folio_nr_pages(folio))
+ continue;
+
+ s_align = ALIGN_DOWN(start, PMD_SIZE);
+ e_align = ALIGN_DOWN(start + folio_size(folio) - 1, PMD_SIZE);
+
+ /* folio doesn't cross page table */
+ if (s_align != e_align)
+ continue;
+restore_mlock:
+ /* Restore the mlock which got missed */
+ mlock_vma_folio(folio, vma);
+ page_vma_mapped_walk_done(&pvmw);
+ pra->vm_flags |= VM_LOCKED;
+ return false; /* To break the loop */
}
/*
@@ -914,23 +926,6 @@ static bool folio_referenced_one(struct folio *folio,
pra->mapcount--;
}
- if ((vma->vm_flags & VM_LOCKED) &&
- folio_test_large(folio) &&
- folio_within_vma(folio, vma)) {
- unsigned long s_align, e_align;
-
- s_align = ALIGN_DOWN(start, PMD_SIZE);
- e_align = ALIGN_DOWN(start + folio_size(folio) - 1, PMD_SIZE);
-
- /* folio doesn't cross page table boundary and fully mapped */
- if ((s_align == e_align) && (ptes == folio_nr_pages(folio))) {
- /* Restore the mlock which got missed */
- mlock_vma_folio(folio, vma);
- pra->vm_flags |= VM_LOCKED;
- return false; /* To break the loop */
- }
- }
-
if (referenced)
folio_clear_idle(folio);
if (folio_test_clear_young(folio))
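
For illustration only, here is a standalone userspace sketch of the two
conditions the in-loop check above combines: every PTE of the folio has been
counted, and the folio's range does not cross a page table boundary. This is
not kernel code. The sketch_* names, the 4 KiB page size and the 2 MiB PMD
size are assumptions made for the example, and the comment tying the
single-page-table condition to the ptl is one reading of the discussion above.

```c
#include <stdbool.h>

/* Assumed sizes for illustration: 4 KiB pages, 2 MiB PMD (512 PTEs). */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PMD_SIZE  (512UL * SKETCH_PAGE_SIZE)

/* Round addr down to a multiple of align (align must be a power of two). */
static unsigned long sketch_align_down(unsigned long addr, unsigned long align)
{
	return addr & ~(align - 1);
}

/*
 * True if [start, start + nr_pages * PAGE_SIZE) sits under a single PMD
 * entry, i.e. all of its PTEs live in one page table. Presumably that is
 * what lets one held ptl cover the whole folio.
 */
static bool sketch_single_page_table(unsigned long start, unsigned long nr_pages)
{
	unsigned long last = start + nr_pages * SKETCH_PAGE_SIZE - 1;

	return sketch_align_down(start, SKETCH_PMD_SIZE) ==
	       sketch_align_down(last, SKETCH_PMD_SIZE);
}

/*
 * Hypothetical stand-in for the decision made inside the walk loop, after
 * another PTE of the folio has been counted and while the lock is held.
 */
static bool sketch_can_restore_mlock(unsigned long start,
				     unsigned long folio_pages,
				     unsigned long ptes_seen)
{
	/* Folio must be fully mapped to be mlocked */
	if (ptes_seen != folio_pages)
		return false;

	/* Folio must not cross a page table boundary */
	return sketch_single_page_table(start, folio_pages);
}
```

With these assumed sizes, a 16-page folio starting at 0x200000 qualifies once
all 16 PTEs have been counted, while the same folio starting at 0x3f8000
straddles the PMD boundary at 0x400000 and does not.
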
--
Kiryl Shutsemau / Kirill A. Shutemov