linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@suse.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	James Houghton <jthoughton@google.com>,
	Peter Xu <peterx@redhat.com>, Yang Shi <shy828301@gmail.com>,
	Vishal Moola <vishal.moola@gmail.com>,
	Matthew Wilcox <willy@infradead.org>,
	Muchun Song <songmuchun@bytedance.com>,
	stable@vger.kernel.org
Subject: Re: [PATCH 1/2] mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps
Date: Fri, 3 Feb 2023 14:40:55 +0100	[thread overview]
Message-ID: <Y90O5+UVYaaN1U3y@dhcp22.suse.cz> (raw)
In-Reply-To: <Y9rUHw2kuSwg2ntI@monkey>

On Wed 01-02-23 13:05:35, Mike Kravetz wrote:
> On 02/01/23 08:47, Michal Hocko wrote:
> > On Mon 30-01-23 14:08:47, Mike Kravetz wrote:
> > > On 01/30/23 13:36, Michal Hocko wrote:
> > > > On Fri 27-01-23 17:12:05, Mike Kravetz wrote:
> > > > > On 01/27/23 15:04, Andrew Morton wrote:
> > > > > > On Fri, 27 Jan 2023 17:23:39 +0100 David Hildenbrand <david@redhat.com> wrote:
> > > > > > > On 26.01.23 23:27, Mike Kravetz wrote:
> > > > 
> > > > Yes, this looks simple enough. My only concern would be that this
> > > > special casing might be required on other places which is hard to
> > > > evaluate. I thought PSS reported by smaps would be broken as well but it
> > > > seems pss is not really accounted for hugetlb mappings at all.
> > > > 
> > > > Have you tried to look into {in,de}creasing the map count of the page when
> > > > the the pte is {un}shared for it?
> > > 
> > > A quick thought is that it would not be too difficult.  It would need
> > > to include the following:
> > > - At PMD share time in huge_pmd_share(),
> > >   Go through all entries in the PMD, and increment map and ref count for
> > >   all referenced pages.  huge_pmd_share is just adding another sharing
> > >   process.
> > > - At PMD unshare time in huge_pmd_unshare(),
> > >   Go through all entries in the PMD, and decrement map and ref count for
> > >   all referenced pages.  huge_pmd_unshare is just removing one sharing
> > >   process.
> > > - At page fault time, check if we are adding a new entry to a shared PMD.
> > >   If yes, add 'num_of_sharing__processes - 1' to the ref and map count.
> > > 
> > > In each of the above operations, we are holding the PTL lock (which is
> > > really the split/PMD lock) so synchronization should not be an issue.
> > > 
> > > Although I mention processes sharing the PMD above, it is really mappings/vmas
> > > sharing the PMD.  You could have two mappings of the same object in the same
> > > process sharing PMDs.
> > > 
> > > I'll code this up and see how it looks.
> > 
> > Thanks!
> >  
> > > However, unless you have an objection I would prefer the simple patches
> > > move forward, especially for stable backports.
> > 
> > Yes, the current patch is much simpler and more suitable for stable
> > backports. If the explicit map count modifications are not all that
> > terrible then this would sound like a more appropriate long term plan
> > though.
> 
> The approach mentioned above seems to be simple enough.  Patch is below.
> 
> I 'tested' with the same method and tests used to measure fault scalabilty
> when developing vma based locking [1].  I figured this would be a good stress
> of the share, unshare and fault paths.  With the patch, we are doing more
> with the page table lock held, so I expected to see a little difference
> in scalability, but not as much as actually measured:
> 
> 				next-20230131
> test		instances	unmodified	patched
> --------------------------------------------------------------------------
> Combined faults 24		61888.4		58314.8
> Combined forks  24		  157.3		  130.1

So faults are 6 % slower while forks are hit by 18%. This is quite a
lot and more than I expected. pmd sharing shouldn't really be a common
operation AFAICS. It should only happen with the first mapping in the
pud (so once every 1GB/2MB faults for fully populated mappings).

It would be good to know whether this is purely lock contention based
or the additional work in each #pf and unmapping makes a big impact as
well.

> These tests could seem a bit like a micro-benchmark targeting these code
> paths.  However, I put them together based on the description of a
> customer workload that prompted the vma based locking work.  And, performance
> of these tests seems to reflect performance of their workloads.
> 
> This extra overhead is the cost needed to make shared PMD map counts be
> accurate and in line with what is normal and expected.  I think it is
> worth the cost.  Other opinions?  Of course, the patch below may have
> issues so please take a look.

If 18% slowdown really reflects a real workload then this might just be
too expensive I am afraid.

> [1] https://lore.kernel.org/linux-mm/20220914221810.95771-1-mike.kravetz@oracle.com/
> 
> 
> >From bff5a717521f96b0e5075ac4b5a1ef84a3589b7e Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@oracle.com>
> Date: Mon, 30 Jan 2023 20:14:14 -0800
> Subject: [PATCH] hugetlb: Adjust hugetlbp page ref/map counts for PMD sharing
> 
> When hugetlb PMDS are shared, the sharing code simply adds the shared
> PMD to another processes page table.  It will not update the ref/map
> counts of pages referenced by the shared PMD.  As a result, the ref/map
> count will only reflect when the page was added to the shared PMD.  Even
> though the shared PMD may be in MANY process page tables, ref/map counts
> on the pages will only appear to be that of a single process.
> 
> Update ref/map counts to take PMD sharing into account.  This is done in
> three distinct places:
> 1) At PMD share time in huge_pmd_share(),
>    Go through all entries in the PMD, and increment map and ref count for
>    all referenced pages.  huge_pmd_share is just adding another use and
>    mapping of each page.
> 2) At PMD unshare time in huge_pmd_unshare(),
>    Go through all entries in the PMD, and decrement map and ref count for
>    all referenced pages.  huge_pmd_unshare is just removing one use and
>    mapping of each page.
> 3) When faulting in a new hugetlb page,
>    Check if we are adding a new entry to a shared PMD.  If yes, add
>    'num_of_sharing__processes - 1' to the ref and map count.

Honestly, I didn't really have much time to think about this very deeply
so I might be missing something here. The patch seems correct to me.
adjust_shared_pmd_page_counts's delta parameter is confusing because it
implies a delta adjustments while it justs want to be "bool increase"
instead.

Thanks for looking into this Mike!
[...]
> +static void adjust_shared_pmd_page_counts(pmd_t *pmd_start, int delta)
> +{
> +	struct folio *folio;
> +	struct page *page;
> +	pte_t *ptep, pte;
> +	int i;
> +
> +	for (i= 0; i < PTRS_PER_PMD; i++) {
> +		ptep = (pte_t *)(pmd_start + i);
> +
> +		pte = huge_ptep_get(ptep);
> +		if (huge_pte_none(pte) || !pte_present(pte))
> +			continue;
> +
> +		page = pte_page(pte);
> +		folio = (struct folio *)page;
> +		if (delta > 0) {
> +			folio_get(folio);
> +			atomic_inc(&folio->_entire_mapcount);
> +		} else {
> +			folio_put(folio);
> +			atomic_dec(&folio->_entire_mapcount);
> +		}
> +	}
> +}
[...]

-- 
Michal Hocko
SUSE Labs


  reply	other threads:[~2023-02-03 13:41 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-26 22:27 [PATCH 0/2] Fixes for hugetlb mapcount at most 1 for shared PMDs Mike Kravetz
2023-01-26 22:27 ` [PATCH 1/2] mm: hugetlb: proc: check for hugetlb shared PMD in /proc/PID/smaps Mike Kravetz
2023-01-27 16:23   ` David Hildenbrand
2023-01-27 23:04     ` Andrew Morton
2023-01-28  1:12       ` Mike Kravetz
2023-01-30 12:36         ` Michal Hocko
2023-01-30 22:08           ` Mike Kravetz
2023-02-01  7:47             ` Michal Hocko
2023-02-01 21:05               ` Mike Kravetz
2023-02-03 13:40                 ` Michal Hocko [this message]
2023-02-03 20:16                   ` Mike Kravetz
2023-02-13 18:01                     ` Michal Hocko
2023-02-14  1:40                       ` Mike Kravetz
2023-01-26 22:27 ` [PATCH 2/2] migrate: hugetlb: Check for hugetlb shared PMD in node migration Mike Kravetz
2023-01-27 16:23   ` David Hildenbrand
2023-01-26 22:47 ` [PATCH 0/2] Fixes for hugetlb mapcount at most 1 for shared PMDs Andrew Morton
2023-01-26 22:48 ` Peter Xu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y90O5+UVYaaN1U3y@dhcp22.suse.cz \
    --to=mhocko@suse.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=jthoughton@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=peterx@redhat.com \
    --cc=shy828301@gmail.com \
    --cc=songmuchun@bytedance.com \
    --cc=stable@vger.kernel.org \
    --cc=vishal.moola@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox