Re: [PATCH 2/2] mm/rmap: Improve mlock tracking for large folios

From: Kiryl Shutsemau <kirill@shutemov.name>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	 Yin Fengwei <fengwei.yin@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	 David Hildenbrand <david@redhat.com>,
	Hugh Dickins <hughd@google.com>,
	 Matthew Wilcox <willy@infradead.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	 Vlastimil Babka <vbabka@suse.cz>,
	Mike Rapoport <rppt@kernel.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>, Rik van Riel <riel@surriel.com>,
	 Harry Yoo <harry.yoo@oracle.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Shakeel Butt <shakeel.butt@linux.dev>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/2] mm/rmap: Improve mlock tracking for large folios
Date: Thu, 18 Sep 2025 14:48:20 +0100
Message-ID: <ndryzvkmrfidmjgj4tl27hk2kmspmb42mxl2smuwgmp5hyedzh@thggle3dhp5j>
In-Reply-To: <429481ef-6527-40f5-b7a0-c9370fd1e374@lucifer.local>

On Thu, Sep 18, 2025 at 02:10:05PM +0100, Lorenzo Stoakes wrote:
> On Thu, Sep 18, 2025 at 12:21:57PM +0100, kirill@shutemov.name wrote:
> > From: Kiryl Shutsemau <kas@kernel.org>
> >
> > The kernel currently does not mlock large folios when adding them to
> > rmap, stating that it is difficult to confirm that the folio is fully
> > mapped and safe to mlock it. However, nowadays the caller passes a
> > number of pages of the folio that are getting mapped, making it easy to
> > check if the entire folio is mapped to the VMA.
> >
> > mlock the folio on rmap if it is fully mapped to the VMA.
> >
> > Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
> 
> The logic looks good to me, so:
> 
> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> 
> But note the comments below.
> 
> > ---
> >  mm/rmap.c | 13 ++++---------
> >  1 file changed, 4 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 568198e9efc2..ca8d4ef42c2d 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1478,13 +1478,8 @@ static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> >  				 PageAnonExclusive(cur_page), folio);
> >  	}
> >
> > -	/*
> > -	 * For large folio, only mlock it if it's fully mapped to VMA. It's
> > -	 * not easy to check whether the large folio is fully mapped to VMA
> > -	 * here. Only mlock normal 4K folio and leave page reclaim to handle
> > -	 * large folio.
> > -	 */
> > -	if (!folio_test_large(folio))
> > +	/* Only mlock it if the folio is fully mapped to the VMA */
> > +	if (folio_nr_pages(folio) == nr_pages)
> 
> OK this is nice, as partially mapped will have folio_nr_pages() != nr_pages. So
> logically this must be correct.
> 
> >  		mlock_vma_folio(folio, vma);
> >  }
> >
> > @@ -1620,8 +1615,8 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
> >  	nr = __folio_add_rmap(folio, page, nr_pages, vma, level, &nr_pmdmapped);
> >  	__folio_mod_stat(folio, nr, nr_pmdmapped);
> >
> > -	/* See comments in folio_add_anon_rmap_*() */
> > -	if (!folio_test_large(folio))
> > +	/* Only mlock it if the folio is fully mapped to the VMA */
> > +	if (folio_nr_pages(folio) == nr_pages)
> >  		mlock_vma_folio(folio, vma);
> >  }
> >
> > --
> > 2.50.1
> >
> 
> I see in try_to_unmap_one():
> 
> 		if (!(flags & TTU_IGNORE_MLOCK) &&
> 		    (vma->vm_flags & VM_LOCKED)) {
> 			/* Restore the mlock which got missed */
> 			if (!folio_test_large(folio))
> 				mlock_vma_folio(folio, vma);
> 
> Do we care about this?
> 
> It seems like folio_referenced_one() does some similar logic:
> 
> 		if (vma->vm_flags & VM_LOCKED) {
> 			if (!folio_test_large(folio) || !pvmw.pte) {
> 				/* Restore the mlock which got missed */
> 				mlock_vma_folio(folio, vma);
> 				page_vma_mapped_walk_done(&pvmw);
> 				pra->vm_flags |= VM_LOCKED;
> 				return false; /* To break the loop */
> 			}
> 
> ...
> 
> 	if ((vma->vm_flags & VM_LOCKED) &&
> 			folio_test_large(folio) &&
> 			folio_within_vma(folio, vma)) {
> 		unsigned long s_align, e_align;
> 
> 		s_align = ALIGN_DOWN(start, PMD_SIZE);
> 		e_align = ALIGN_DOWN(start + folio_size(folio) - 1, PMD_SIZE);
> 
> 		/* folio doesn't cross page table boundary and fully mapped */
> 		if ((s_align == e_align) && (ptes == folio_nr_pages(folio))) {
> 			/* Restore the mlock which got missed */
> 			mlock_vma_folio(folio, vma);
> 			pra->vm_flags |= VM_LOCKED;
> 			return false; /* To break the loop */
> 		}
> 	}
> 
> So maybe we could do something similar in try_to_unmap_one()?

Hm. The existing code looks buggy to me.

mlock_vma_folio() has to be called with the ptl held, no? By the time
we get to the check after the pvmw loop, the ptl has already been dropped.

+Fengwei.

I think this has to be handled inside the loop once ptes reaches
folio_nr_pages(folio).

Maybe something like this (untested):

diff --git a/mm/rmap.c b/mm/rmap.c
index ca8d4ef42c2d..719f1c99470c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -858,17 +858,13 @@ static bool folio_referenced_one(struct folio *folio,
 		address = pvmw.address;
 
 		if (vma->vm_flags & VM_LOCKED) {
-			if (!folio_test_large(folio) || !pvmw.pte) {
-				/* Restore the mlock which got missed */
-				mlock_vma_folio(folio, vma);
-				page_vma_mapped_walk_done(&pvmw);
-				pra->vm_flags |= VM_LOCKED;
-				return false; /* To break the loop */
-			}
+			unsigned long s_align, e_align;
+
+			/* Small folio or PMD-mapped large folio */
+			if (!folio_test_large(folio) || !pvmw.pte)
+				goto restore_mlock;
+
 			/*
-			 * For large folio fully mapped to VMA, will
-			 * be handled after the pvmw loop.
-			 *
 			 * For large folio cross VMA boundaries, it's
 			 * expected to be picked  by page reclaim. But
 			 * should skip reference of pages which are in
@@ -878,7 +874,23 @@ static bool folio_referenced_one(struct folio *folio,
 			 */
 			ptes++;
 			pra->mapcount--;
-			continue;
+
+			/* Folio must be fully mapped to be mlocked */
+			if (ptes != folio_nr_pages(folio))
+				continue;
+
+			s_align = ALIGN_DOWN(start, PMD_SIZE);
+			e_align = ALIGN_DOWN(start + folio_size(folio) - 1, PMD_SIZE);
+
+			/* Skip if the folio crosses a page table boundary */
+			if (s_align != e_align)
+				continue;
+restore_mlock:
+			/* Restore the mlock which got missed */
+			mlock_vma_folio(folio, vma);
+			page_vma_mapped_walk_done(&pvmw);
+			pra->vm_flags |= VM_LOCKED;
+			return false; /* To break the loop */
 		}
 
 		/*
@@ -914,23 +926,6 @@ static bool folio_referenced_one(struct folio *folio,
 		pra->mapcount--;
 	}
 
-	if ((vma->vm_flags & VM_LOCKED) &&
-			folio_test_large(folio) &&
-			folio_within_vma(folio, vma)) {
-		unsigned long s_align, e_align;
-
-		s_align = ALIGN_DOWN(start, PMD_SIZE);
-		e_align = ALIGN_DOWN(start + folio_size(folio) - 1, PMD_SIZE);
-
-		/* folio doesn't cross page table boundary and fully mapped */
-		if ((s_align == e_align) && (ptes == folio_nr_pages(folio))) {
-			/* Restore the mlock which got missed */
-			mlock_vma_folio(folio, vma);
-			pra->vm_flags |= VM_LOCKED;
-			return false; /* To break the loop */
-		}
-	}
-
 	if (referenced)
 		folio_clear_idle(folio);
 	if (folio_test_clear_young(folio))
-- 
  Kiryl Shutsemau / Kirill A. Shutemov
