Re: [PATCH v1] mm/khugepaged: replace page_mapcount() check by folio_likely_mapped_shared()

From: John Hubbard <jhubbard@nvidia.com>
To: David Hildenbrand <david@redhat.com>, <linux-kernel@vger.kernel.org>
Cc: <linux-mm@kvack.org>, <linux-doc@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jonathan Corbet <corbet@lwn.net>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Zi Yan <ziy@nvidia.com>, "Yang Shi" <yang.shi@linux.alibaba.com>,
	Ryan Roberts <ryan.roberts@arm.com>
Subject: Re: [PATCH v1] mm/khugepaged: replace page_mapcount() check by folio_likely_mapped_shared()
Date: Wed, 24 Apr 2024 21:00:50 -0700
Message-ID: <73de5556-e574-4ed7-a7fb-c4648e46206b@nvidia.com>
In-Reply-To: <20240424122630.495788-1-david@redhat.com>

On 4/24/24 5:26 AM, David Hildenbrand wrote:

Hi David,

Overall, I think this looks good. There are just a few questions and,
of course, some silly documentation nits.


> We want to limit the use of page_mapcount() to places where absolutely
> required, to prepare for kernel configs where we won't keep track of
> per-page mapcounts in large folios.


Just curious, can you elaborate on the motivation? I probably missed
the discussions that explained why page_mapcount() in large folios
is not desirable. Are we getting rid of a field in struct page/folio?
Some other reason?

...
> To summarize, in the common case, this change is not expected to matter
> much. The more common application of khugepaged operates on

Based on the diffs (and some quick hacks for testing that I ran), I agree.

...
> 
> This really needs the folio_likely_mapped_shared() optimization [1] that
> resides in mm-unstable, I think, to reduce "false negatives".
> 
> The khugepage MM selftests keep working as expected, including:
> 
> 	Run test: collapse_max_ptes_shared (khugepaged:anon)
> 	Allocate huge page... OK
> 	Share huge page over fork()... OK
> 	Trigger CoW on page 255 of 512... OK
> 	Maybe collapse with max_ptes_shared exceeded.... OK
> 	Trigger CoW on page 256 of 512... OK
> 	Collapse with max_ptes_shared PTEs shared.... OK
> 	Check if parent still has huge page... OK

Well, a word of caution! These tests do not (yet) cover either of
the interesting new cases that folio_likely_mapped_shared() presents:
KSM or hugetlbfs interactions. In other words, false positives.


> 
> Where we check that collapsing in the parent behaves as expected after
> COWing a lot of pages in the parent: a sane scenario that is essentially
> unchanged and which does not depend on any action in the child process
> (compared to the cases discussed in (B) above).
> 
> [1] https://lkml.kernel.org/r/20240409192301.907377-6-david@redhat.com
> 
> ---
>   Documentation/admin-guide/mm/transhuge.rst |  3 ++-
>   mm/khugepaged.c                            | 22 +++++++++++++++-------
>   2 files changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index f82300b9193fe..076443cc10a6c 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -278,7 +278,8 @@ collapsed, resulting fewer pages being collapsed into
>   THPs, and lower memory access performance.
>   
>   ``max_ptes_shared`` specifies how many pages can be shared across multiple
> -processes. Exceeding the number would block the collapse::
> +processes. khugepaged might treat pages of THPs as shared if any page of
> +that THP is shared. Exceeding the number would block the collapse::
>   
>   	/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
>   
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 2f73d2aa9ae84..cf518fc440982 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -583,7 +583,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
>   		folio = page_folio(page);
>   		VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
>   
> -		if (page_mapcount(page) > 1) {
> +		/* See hpage_collapse_scan_pmd(). */

Why point to hpage_collapse_scan_pmd()? Because it has an identical
code snippet?

I thought about asking if we should factor that out, just to
keep the policy the same. Thoughts?
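
For illustration only, the factoring-out idea could look roughly like the
untested userspace sketch below. It is not kernel code and not part of the
patch: the mock_* types, the mock_max_ptes_shared variable, and the helper
name mock_shared_limit_exceeded() are all hypothetical stand-ins, and the
sketch assumes both call sites apply the same check that the first hunk
shows (count the PTE, then compare against the limit only when running as
khugepaged).

```c
/* Userspace sketch only: none of these types or names are the kernel's. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct folio; holds only what the sketch needs. */
struct mock_folio {
	/* Stand-in for the result of folio_likely_mapped_shared(folio). */
	bool likely_mapped_shared;
};

/* Hypothetical stand-in for the one collapse_control field used here. */
struct mock_collapse_control {
	bool is_khugepaged;
};

/* Stand-in for khugepaged_max_ptes_shared; the value is arbitrary. */
static unsigned int mock_max_ptes_shared = 256;

/*
 * Hypothetical shared helper: if the folio is considered shared, bump the
 * caller's counter, then report whether the limit is now exceeded.
 * Returns true when the caller should give up on the collapse.
 */
static bool mock_shared_limit_exceeded(const struct mock_folio *folio,
				       const struct mock_collapse_control *cc,
				       unsigned int *shared)
{
	assert(folio != NULL && cc != NULL && shared != NULL);

	if (!folio->likely_mapped_shared)
		return false;

	++*shared;
	/* The limit only applies when running as khugepaged. */
	return cc->is_khugepaged && *shared > mock_max_ptes_shared;
}
```

With something along these lines, each call site would reduce to one call
plus its own error path, so the "what counts as shared" policy lives in a
single place.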

> +		if (folio_likely_mapped_shared(folio)) {
>   			++shared;
>   			if (cc->is_khugepaged &&
>   			    shared > khugepaged_max_ptes_shared) {
> @@ -1317,8 +1318,20 @@ static int hpage_collapse_scan_pmd(struct mm_struct *mm,
>   			result = SCAN_PAGE_NULL;
>   			goto out_unmap;
>   		}
> +		folio = page_folio(page);
>   
> -		if (page_mapcount(page) > 1) {
> +		if (!folio_test_anon(folio)) {
> +			result = SCAN_PAGE_ANON;
> +			goto out_unmap;
> +		}
> +
> +		/*
> +		 * We treat a single page as shared if any part of the THP
> +		 * is shared. "False negatives" from
> +		 * folio_likely_mapped_shared() are not expected to matter
> +		 * much in practice.

Maybe delete that second sentence? It is not really pulling its
weight here. :)


thanks,
-- 
John Hubbard
NVIDIA
