From: Ankur Arora <ankur.a.arora@oracle.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com,
hpa@zytor.com, mingo@redhat.com, juri.lelli@redhat.com,
willy@infradead.org, mgorman@suse.de, rostedt@goodmis.org,
tglx@linutronix.de, vincent.guittot@linaro.org,
jon.grimm@amd.com, bharata@amd.com, boris.ostrovsky@oracle.com,
konrad.wilk@oracle.com
Subject: Re: [PATCH 9/9] x86/clear_huge_page: make clear_contig_region() preemptible
Date: Thu, 06 Apr 2023 10:00:40 -0700
Message-ID: <87355dlzd3.fsf@oracle.com>
In-Reply-To: <20230405202704.GF365912@hirez.programming.kicks-ass.net>

Peter Zijlstra <peterz@infradead.org> writes:

> On Sun, Apr 02, 2023 at 10:22:33PM -0700, Ankur Arora wrote:
>> clear_contig_region() can be used to clear up to a huge-page (2MB/1GB)
>> chunk. Allow preemption in the irqentry_exit path to make sure we don't
>> hold on to the CPU for an arbitrarily long period.
>>
>> Performance: vm-scalability/case-anon-w-seq-hugetlb mmaps an anonymous
>> hugetlb-2mb region, and then writes sequentially to the region, demand
>> faulting pages on the way.
>>
>> This test, with a CONFIG_PREEMPT_VOLUNTARY config, shows the effects of
>> this change: stime drops (~18% on Icelakex, ~5% on Milan), while utime
>> goes up (~15% on Icelakex, ~13% on Milan.)
>>
>> *Icelakex* mm/clear_huge_page x86/clear_huge_page change
>> (mem=4GB/task, tasks=128)
>>
>> stime 293.02 +- .49% 239.39 +- .83% -18.30%
>> utime 440.11 +- .28% 508.74 +- .60% +15.59%
>> wall-clock 5.96 +- .33% 6.27 +- 2.23% + 5.20%
>>
>> *Milan* mm/clear_huge_page x86/clear_huge_page change
>> (mem=1GB/task, tasks=512)
>>
>> stime 490.95 +- 3.55% 466.90 +- 4.79% - 4.89%
>> utime 276.43 +- 2.85% 311.97 +- 5.15% +12.85%
>> wall-clock 3.74 +- 6.41% 3.58 +- 7.82% - 4.27%
>>
>> The drop in stime is due to REP; STOS being more efficient for bigger
>> extents. The increase in utime is due to cache effects of that change:
>> mm/clear_huge_page() clears page-at-a-time, narrowing towards the
>> faulting page, while x86/clear_huge_page() only optimizes for cache
>> locality in the local neighbourhood of the faulting address.
>>
>> This effect on utime is visible via the increased L1-dcache-load-misses
>> and LLC-load*, and via increased backend boundedness in perf stat
>> --all-user on Icelakex. The effect is slight, but given the heavy cache
>> pressure generated by the test, it shows up in the drop in user IPC:
>>
>> - 9,455,243,414,829 instructions # 2.75 insn per cycle ( +- 14.14% ) (46.17%)
>> - 2,367,920,864,112 L1-dcache-loads # 1.054 G/sec ( +- 14.14% ) (69.24%)
>> - 42,075,182,813 L1-dcache-load-misses # 2.96% of all L1-dcache accesses ( +- 14.14% ) (69.24%)
>> - 20,365,688 LLC-loads # 9.064 K/sec ( +- 13.98% ) (69.24%)
>> - 890,382 LLC-load-misses # 7.18% of all LL-cache accesses ( +- 14.91% ) (69.24%)
>>
>> + 9,467,796,660,698 instructions # 2.37 insn per cycle ( +- 14.14% ) (46.16%)
>> + 2,369,973,307,561 L1-dcache-loads # 1.027 G/sec ( +- 14.14% ) (69.24%)
>> + 42,155,621,201 L1-dcache-load-misses # 2.96% of all L1-dcache accesses ( +- 14.14% ) (69.24%)
>> + 22,116,300 LLC-loads # 9.588 K/sec ( +- 14.20% ) (69.24%)
>> + 1,355,607 LLC-load-misses # 10.29% of all LL-cache accesses ( +- 15.49% ) (69.25%)
>>
>> Given that the stime improves for all loads using this path, while the
>> utime regression is load dependent, add this change.
>
> Either I really need sleep, or *NONE* of the above is actually relevant
> to what the patch below actually does!
Yeah, you are right about the relevance.

I wanted to provide two sets of stats: the raw memory BW numbers and the
vm-scalability workload. The vm-scalability workload needs a more
reasonable scheduling model than what exists before this patch, which is
why it seemed to make sense to put it here.

But yeah, it doesn't really fit in this commit message.
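
(For context, the clearing-order difference behind the utime numbers
above, as a rough sketch -- the names and the ordering helper are
illustrative, not the actual mm/ or x86 code:)

        /*
         * Generic mm/ path (sketch): clear one base page at a time,
         * ordered so the pages nearest the faulting address go last
         * and stay cache-hot when the fault returns to userspace.
         */
        for (i = 0; i < npages; i++) {
                /* towards_fault() is a hypothetical ordering helper */
                j = towards_fault(i, fault_page, npages);
                clear_user_highpage(page + j, addr + j * PAGE_SIZE);
                cond_resched();
        }

        /*
         * x86 multi-page path (sketch): one contiguous REP;STOS over
         * the whole extent; cheaper in stime, but cache-friendly only
         * in the neighbourhood of the faulting address.
         */
        clear_user_pages(page_address(page), vaddr, page, npages);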
> The above talks about the glories of using large clears, while the patch
> allows reschedules which are about latency.
Yeah, let me find a more reasonable way to present these.
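
(Restating the latency side in code, since that is what this patch is
really about -- a sketch of the intent only, with resched_allowed() as
a stand-in name; the exact guards live in patches 7 and 8:)

        allow_resched();        /* set TIF_ALLOW_RESCHED for the long clear */
        clear_user_pages(page_address(page), vaddr, page, npages);
        disallow_resched();

        /*
         * And in the irqentry exit path, roughly: even without full
         * preemption, reschedule if the interrupted task opted in.
         */
        if (resched_allowed() && need_resched())
                preempt_schedule_irq();
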
Ankur
>> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com>
>> ---
>> arch/x86/mm/hugetlbpage.c | 10 ++++++++++
>> 1 file changed, 10 insertions(+)
>>
>> diff --git a/arch/x86/mm/hugetlbpage.c b/arch/x86/mm/hugetlbpage.c
>> index 4294b77c4f18..c8564b0552e5 100644
>> --- a/arch/x86/mm/hugetlbpage.c
>> +++ b/arch/x86/mm/hugetlbpage.c
>> @@ -158,7 +158,17 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
>> static void clear_contig_region(struct page *page, unsigned long vaddr,
>> unsigned int npages)
>> {
>> + might_sleep();
>> +
>> + /*
>> + * We might be clearing a large region.
>> + * Allow rescheduling.
>> + */
>> + allow_resched();
>> clear_user_pages(page_address(page), vaddr, page, npages);
>> + disallow_resched();
>> +
>> + cond_resched();
>> }
>>
>> void clear_huge_page(struct page *page,
>> --
>> 2.31.1
>>
--
ankur