linux-mm.kvack.org archive mirror
From: Ankur Arora <ankur.a.arora@oracle.com>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Ankur Arora <ankur.a.arora@oracle.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	akpm@linux-foundation.org, luto@kernel.org, bp@alien8.de,
	dave.hansen@linux.intel.com, hpa@zytor.com, mingo@redhat.com,
	juri.lelli@redhat.com, vincent.guittot@linaro.org,
	willy@infradead.org, mgorman@suse.de, peterz@infradead.org,
	rostedt@goodmis.org, tglx@linutronix.de, jon.grimm@amd.com,
	bharata@amd.com, raghavendra.kt@amd.com,
	boris.ostrovsky@oracle.com, konrad.wilk@oracle.com
Subject: Re: [PATCH v2 0/9] x86/clear_huge_page: multi-page clearing
Date: Tue, 05 Sep 2023 15:14:59 -0700
Message-ID: <8734zsb730.fsf@oracle.com>
In-Reply-To: <20230903081404.hmkhnrk243h2nuoa@f>


Mateusz Guzik <mjguzik@gmail.com> writes:

> On Wed, Aug 30, 2023 at 11:49:49AM -0700, Ankur Arora wrote:
>> This series adds a multi-page clearing primitive, clear_pages(),
>> which enables more effective use of x86 string instructions by
>> advertising the real region-size to be cleared.
>>
>> Region-size can be used as a hint by uarchs to optimize the
>> clearing.
>>
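For illustration, here is a minimal userspace sketch of the idea. It assumes x86-64, GCC/Clang extended asm and 4 KiB pages, and the names sketch_clear_pages() and SKETCH_PAGE_SIZE are hypothetical: this is not the series' clear_pages(). The point is only that one REP STOSQ over the whole region puts the real region size in RCX, whereas the page-at-a-time loop never shows the CPU more than one page at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL /* assumption: 4 KiB base pages */

/*
 * Hypothetical userspace analogue of clear_pages(): zero npages
 * contiguous pages with a single REP STOSQ so that RCX carries the
 * full region size, instead of issuing one REP STOSQ per page.
 */
static void sketch_clear_pages(void *addr, unsigned long npages)
{
	size_t len = npages * SKETCH_PAGE_SIZE;

	assert(len % 8 == 0);
#if defined(__x86_64__)
	size_t qwords = len / 8;

	/* RDI = destination, RCX = number of 8-byte stores, RAX = value. */
	__asm__ volatile("rep stosq"
			 : "+D"(addr), "+c"(qwords)
			 : "a"(0UL)
			 : "memory");
#else
	/* Portable fallback so the sketch still runs on other arches. */
	memset(addr, 0, len);
#endif
}

/* Page-at-a-time baseline: each call only ever covers one page. */
static void sketch_clear_pages_one_by_one(void *addr, unsigned long npages)
{
	for (unsigned long i = 0; i < npages; i++)
		sketch_clear_pages((char *)addr + i * SKETCH_PAGE_SIZE, 1);
}
```

Both variants produce the same zeroed memory; they differ only in how much of the region the CPU is told about per instruction.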
>> Also add allow_resched() which marks a code-section as allowing
>> rescheduling in the irqentry_exit path. This allows clear_pages()
>> to get by without having to call cond_resched() periodically.
>> (preempt_model_full() already handles this via
>> irqentry_exit_cond_resched(), so we handle this similarly for
>> preempt_model_none() and preempt_model_voluntary().)
>>
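A toy model of the rescheduling decision described above. All names (sketch_*) and flag values are hypothetical, and preempt_disabled is a stand-in for a non-zero preempt_count(); the real irqentry_exit() path checks more state, so treat this as a sketch of the intent rather than the implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: names mirror the series, but this is not kernel code. */
enum preempt_model { PREEMPT_NONE, PREEMPT_VOLUNTARY, PREEMPT_FULL };

#define SKETCH_TIF_NEED_RESCHED  (1UL << 0)
#define SKETCH_TIF_ALLOW_RESCHED (1UL << 1)

struct sketch_task {
	unsigned long flags;
	bool preempt_disabled; /* stand-in for preempt_count() != 0 */
};

/* Mark the start of a section that may be rescheduled on irq exit. */
static void sketch_allow_resched(struct sketch_task *t)
{
	t->flags |= SKETCH_TIF_ALLOW_RESCHED;
}

/* Mark the end of that section. */
static void sketch_disallow_resched(struct sketch_task *t)
{
	t->flags &= ~SKETCH_TIF_ALLOW_RESCHED;
}

/* On return from an interrupt into kernel code: call the scheduler? */
static bool sketch_irqentry_exit_should_resched(const struct sketch_task *t,
						 enum preempt_model model)
{
	if (!(t->flags & SKETCH_TIF_NEED_RESCHED) || t->preempt_disabled)
		return false;
	if (model == PREEMPT_FULL)
		return true; /* what irqentry_exit_cond_resched() already covers */
	/* none/voluntary: only inside a section marked by allow_resched() */
	return (t->flags & SKETCH_TIF_ALLOW_RESCHED) != 0;
}
```

In this model a clearing routine would bracket its long REP STOS with sketch_allow_resched()/sketch_disallow_resched(), so a timer interrupt landing mid-clear can reschedule even under PREEMPT_NONE, without the routine calling cond_resched() itself.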
>> Performance
>> ==
>>
>> With this, demand-fault performance gets a decent increase:
>>
>>   *Milan*     mm/clear_huge_page   x86/clear_huge_page   change
>>                           (GB/s)                (GB/s)
>>
>>   pg-sz=2MB                14.55                 19.29    +32.5%
>>   pg-sz=1GB                19.34                 49.60   +156.4%
>>
>> Milan (and some other AMD Zen uarchs tested) take advantage of the
>> hint to elide cacheline allocation for pg-sz=1GB. The cut-off for
>> this optimization seems to be at around region-size > LLC-size so
>> the pg-sz=2MB load still allocates cachelines.
>>
>
> Have you benchmarked clzero? It is an AMD-specific instruction issuing
> non-temporal stores. It is definitely something to try out for 1G pages.

Thanks for the suggestion. Been a little while, but see the numbers here:
https://lore.kernel.org/linux-mm/20220606203725.1313715-15-ankur.a.arora@oracle.com/

> One would think rep stosq has to be at least not worse since the CPU is
> explicitly told what to do and is free to optimize it however it sees
> fit, but the rep prefix has a long history of underperforming.

I agree that historically REP variants have been all over the place.
But, if you look at the numbers, REP; STOS and CLZERO are pretty close,
at least for the current generation of AMD uarchs.

Now, current uarch performance is no guarantee of future uarch
performance, but if the kernel uses REP; STOS in performance-critical
paths, then hopefully those paths will also show up in internal CPU
regression benchmarks, which might mean that the high performance persists.

That said, I think using CLZERO/MOVNT is a good idea, either as a
fallback option or where it is better to send an explicit hint: say,
while clearing a 2MB region, which is below the apparent LLC-size
cut-off and so still allocates cachelines with REP; STOS.
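A minimal sketch of such a non-temporal fallback, assuming x86-64 and the SSE2 intrinsic _mm_stream_si64(), which compiles to MOVNTI. CLZERO is AMD-specific and would need inline asm, so it is not shown; sketch_clear_region_nt() is a hypothetical name. Because non-temporal stores are weakly ordered, the sketch ends with an SFENCE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

/*
 * Hypothetical fallback: zero len bytes with non-temporal (MOVNTI)
 * stores, which avoid allocating the written lines in the cache.
 */
static void sketch_clear_region_nt(void *addr, size_t len)
{
	assert(len % 8 == 0 && (uintptr_t)addr % 8 == 0);
#if defined(__x86_64__)
	long long *p = addr;

	for (size_t i = 0; i < len / 8; i++)
		_mm_stream_si64(p + i, 0); /* MOVNTI: 8-byte NT store */
	/* NT stores are weakly ordered: fence before others rely on them. */
	_mm_sfence();
#else
	/* Portable fallback so the sketch still runs on other arches. */
	memset(addr, 0, len);
#endif
}
```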


Thanks
Ankur

> I'm not saying it is going to be better, but that this should be tested,
> albeit one can easily argue this can be done at a later date.
>
>
> I would do it myself but my access to AMD CPUs is limited.
>
>>
>>   *Icelakex*  mm/clear_huge_page   x86/clear_huge_page   change
>>                           (GB/s)                (GB/s)
>>
>>   pg-sz=2MB                 9.19                 12.94   +40.8%
>>   pg-sz=1GB                 9.36                 12.97   +38.5%
>>
>> Icelakex sees a decent improvement in performance but continues to
>> allocate cachelines for both region-sizes.
>>
>>
>> Negative: there is a downside to clearing in larger chunks. The
>> current approach clears page-at-a-time, narrowing towards
>> the faulting subpage. This has better cache characteristics for
>> some sequential access workloads where subpages near the faulting
>> page have a greater likelihood of access.
>>
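The page-at-a-time ordering can be sketched as below. It is modeled on the ordering used by process_huge_page() in mm/memory.c and should be treated as an approximation; sketch_clear_order() is a hypothetical helper that only computes the order of subpage indices. Subpages far from the faulting one are cleared first, and the final clears converge on the target from both sides so that its cachelines are the most recently written.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill order[] (npages entries) with subpage indices in the order they
 * would be cleared, ending with the faulting subpage "target".
 */
static void sketch_clear_order(int npages, int target, int *order)
{
	int k = 0, base, l, i;

	assert(target >= 0 && target < npages);
	if (2 * target <= npages) {
		/* Target in the first half: clear the far tail first. */
		base = 0;
		l = target;
		for (i = npages - 1; i >= 2 * target; i--)
			order[k++] = i;
	} else {
		/* Target in the second half: clear the far head first. */
		base = npages - 2 * (npages - target);
		l = npages - target;
		for (i = 0; i < base; i++)
			order[k++] = i;
	}
	/* Converge on the target, alternating left and right. */
	for (i = 0; i < l; i++) {
		order[k++] = base + i;
		order[k++] = base + 2 * l - 1 - i;
	}
}
```

For example, with 8 subpages and a fault on subpage 2 this yields 7, 6, 5, 4, 0, 3, 1, 2. A single multi-page clear loses this "target last" property, which is the cache effect described above.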
>> I'm not sure if there are real cases which care about this workload
>> but one example is the vm-scalability/case-anon-w-seq-hugetlb test.
>> This test starts a process for each online CPU, with each process
>> writing sequentially to its set of hugepages.
>>
>> The bottleneck here is the memory pipe and so the improvement in
>> stime is limited, and because the clearing is less cache-optimal
>> now, utime suffers from worse user cache misses.
>>
>>   *Icelakex*               mm/clear_huge_page  x86/clear_huge_page  change
>>   (tasks=128, mem=4GB/task)
>>
>>   stime                        286.8 +- 3.6%      243.9 +- 4.1%     -14.9%
>>   utime                        497.7 +- 4.1%      553.5 +- 2.0%     +11.2%
>>   wall-clock                     6.9 +- 2.8%        7.0 +- 1.4%     + 1.4%
>>
>>
>>   *Milan*                  mm/clear_huge_page  x86/clear_huge_page  change
>>   (tasks=512, mem=1GB/task)
>>
>>   stime                        501.3 +- 1.4%      498.0 +- 0.9%      -0.5%
>>   utime                        298.7 +- 1.1%      335.0 +- 2.2%     +12.1%
>>   wall-clock                     3.5 +- 2.8%        3.8 +- 2.6%      +8.5%
>>
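The change column is the relative difference from the mm/clear_huge_page baseline. For stime, utime and wall-clock lower is better, so a negative change is an improvement. For example, the Icelakex utime row:

```latex
\Delta = \frac{x_{\text{x86}} - x_{\text{mm}}}{x_{\text{mm}}} \times 100\%,
\qquad
\Delta_{\text{utime}} = \frac{553.5 - 497.7}{497.7} \times 100\% \approx +11.2\%
```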
>> The same test performs better if we have a smaller number of processes,
>> since there is more backend BW available, and thus the improved stime
>> compensates for the worse utime.
>>
>> This could be improved by using more circuitous chunking (somewhat
>> like this:
>> https://lore.kernel.org/lkml/20220606203725.1313715-1-ankur.a.arora@oracle.com/).
>> But I'm not sure if it is worth doing. Opinions?
>>
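One possible shape for such chunking, purely hypothetical and not necessarily what the linked series does: clear subpages far from the faulting one with a couple of large multi-page calls (keeping the region-size hint), then clear a small window around the faulting subpage one page at a time, finishing on it. sketch_plan_hybrid() and the window size win are illustrative assumptions; the function only plans (start, npages) calls and clears nothing.

```c
#include <assert.h>

struct sketch_clear_call {
	int start;	/* first subpage index */
	int npages;	/* pages covered by one clear_pages()-style call */
};

/*
 * Plan a hybrid clear: bulk calls outside [target - win, target + win],
 * then single pages converging on target. calls[] needs room for
 * 2 + 2 * win + 1 entries. Returns the number of planned calls.
 */
static int sketch_plan_hybrid(int npages, int target, int win,
			      struct sketch_clear_call *calls)
{
	int lo = target - win < 0 ? 0 : target - win;
	int hi = target + win >= npages ? npages - 1 : target + win;
	int n = 0, l, r;

	assert(target >= 0 && target < npages && win >= 0);
	/* Far regions: one multi-page call each, keeping the size hint. */
	if (lo > 0)
		calls[n++] = (struct sketch_clear_call){ 0, lo };
	if (hi < npages - 1)
		calls[n++] = (struct sketch_clear_call){ hi + 1, npages - 1 - hi };
	/* Near window: page-at-a-time from both edges towards target. */
	for (l = lo, r = hi; l < target || r > target; ) {
		if (l < target)
			calls[n++] = (struct sketch_clear_call){ l++, 1 };
		if (r > target)
			calls[n++] = (struct sketch_clear_call){ r--, 1 };
	}
	calls[n++] = (struct sketch_clear_call){ target, 1 };
	return n;
}
```

For a 512-page (2MB) huge page faulting at subpage 100 with win = 2, this plans two bulk calls covering pages 0-97 and 103-511, then 98, 102, 99, 101 and finally 100.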
>> Patches
>> ==
>>
>> Patches 1, 2, 3:
>>   "mm/clear_huge_page: allow arch override for clear_huge_page()",
>>   "mm/huge_page: separate clear_huge_page() and copy_huge_page()",
>>   "mm/huge_page: cleanup clear_/copy_subpage()"
>> are minor. The first one allows clear_huge_page() to have an
>> arch specific version and the other two are mechanical cleanup
>> patches.
>>
>> Patches 4, 5, 6:
>>   "x86/clear_page: extend clear_page*() for multi-page clearing",
>>   "x86/clear_page: add clear_pages()",
>>   "x86/clear_huge_page: multi-page clearing"
>> define the x86 specific clear_pages() and clear_huge_pages().
>>
>> Patches 7, 8:
>>   "sched: define TIF_ALLOW_RESCHED",
>>   "irqentry: define irqentry_exit_allow_resched()"
>> which define allow_resched() to demarcate preemptible sections.
>>
>> This gets used in patch 9:
>>   "x86/clear_huge_page: make clear_contig_region() preemptible".
>>
>> Changelog:
>>
>> v2:
>>   - Addressed review comments from peterz, tglx.
>>   - Removed clear_user_pages(), and CONFIG_X86_32:clear_pages()
>>   - General code cleanup
>>
>> Also at:
>>   github.com/terminus/linux clear-pages.v2
>>
>> Comments appreciated!
>>
>> Ankur Arora (9):
>>   mm/clear_huge_page: allow arch override for clear_huge_page()
>>   mm/huge_page: separate clear_huge_page() and copy_huge_page()
>>   mm/huge_page: cleanup clear_/copy_subpage()
>>   x86/clear_page: extend clear_page*() for multi-page clearing
>>   x86/clear_page: add clear_pages()
>>   x86/clear_huge_page: multi-page clearing
>>   sched: define TIF_ALLOW_RESCHED
>>   irqentry: define irqentry_exit_allow_resched()
>>   x86/clear_huge_page: make clear_contig_region() preemptible
>>
>>  arch/x86/include/asm/page_64.h     |  27 +++--
>>  arch/x86/include/asm/thread_info.h |   2 +
>>  arch/x86/lib/clear_page_64.S       |  52 ++++++---
>>  arch/x86/mm/hugetlbpage.c          |  59 ++++++++++
>>  include/linux/entry-common.h       |  13 +++
>>  include/linux/sched.h              |  30 +++++
>>  kernel/entry/common.c              |  13 ++-
>>  kernel/sched/core.c                |  32 ++---
>>  mm/memory.c                        | 181 +++++++++++++++++------------
>>  9 files changed, 297 insertions(+), 112 deletions(-)
>>
>> --
>> 2.31.1
>>
>>


--
ankur



Thread overview: 152+ messages
2023-08-30 18:49 Ankur Arora
2023-08-30 18:49 ` [PATCH v2 1/9] mm/clear_huge_page: allow arch override for clear_huge_page() Ankur Arora
2023-08-30 18:49 ` [PATCH v2 2/9] mm/huge_page: separate clear_huge_page() and copy_huge_page() Ankur Arora
2023-08-30 18:49 ` [PATCH v2 3/9] mm/huge_page: cleanup clear_/copy_subpage() Ankur Arora
2023-09-08 13:09   ` Matthew Wilcox
2023-09-11 17:22     ` Ankur Arora
2023-08-30 18:49 ` [PATCH v2 4/9] x86/clear_page: extend clear_page*() for multi-page clearing Ankur Arora
2023-09-08 13:11   ` Matthew Wilcox
2023-08-30 18:49 ` [PATCH v2 5/9] x86/clear_page: add clear_pages() Ankur Arora
2023-08-30 18:49 ` [PATCH v2 6/9] x86/clear_huge_page: multi-page clearing Ankur Arora
2023-08-31 18:26   ` kernel test robot
2023-09-08 12:38   ` Peter Zijlstra
2023-09-13  6:43   ` Raghavendra K T
2023-08-30 18:49 ` [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED Ankur Arora
2023-09-08  7:02   ` Peter Zijlstra
2023-09-08 17:15     ` Linus Torvalds
2023-09-08 22:50       ` Peter Zijlstra
2023-09-09  5:15         ` Linus Torvalds
2023-09-09  6:39           ` Ankur Arora
2023-09-09  9:11             ` Peter Zijlstra
2023-09-09 20:04               ` Ankur Arora
2023-09-09  5:30       ` Ankur Arora
2023-09-09  9:12         ` Peter Zijlstra
2023-09-09 20:15     ` Ankur Arora
2023-09-09 21:16       ` Linus Torvalds
2023-09-10  3:48         ` Ankur Arora
2023-09-10  4:35           ` Linus Torvalds
2023-09-10 10:01             ` Ankur Arora
2023-09-10 18:32               ` Linus Torvalds
2023-09-11 15:04                 ` Peter Zijlstra
2023-09-11 16:29                   ` andrew.cooper3
2023-09-11 17:04                   ` Ankur Arora
2023-09-12  8:26                     ` Peter Zijlstra
2023-09-12 12:24                       ` Phil Auld
2023-09-12 12:33                       ` Matthew Wilcox
2023-09-18 23:42                       ` Thomas Gleixner
2023-09-19  1:57                         ` Linus Torvalds
2023-09-19  8:03                           ` Ingo Molnar
2023-09-19  8:43                             ` Ingo Molnar
2023-09-19 13:43                               ` Thomas Gleixner
2023-09-19 13:25                             ` Thomas Gleixner
2023-09-19 12:30                           ` Thomas Gleixner
2023-09-19 13:00                             ` Arches that don't support PREEMPT Matthew Wilcox
2023-09-19 13:34                               ` Geert Uytterhoeven
2023-09-19 13:37                               ` John Paul Adrian Glaubitz
2023-09-19 13:42                                 ` Peter Zijlstra
2023-09-19 13:48                                   ` John Paul Adrian Glaubitz
2023-09-19 14:16                                     ` Peter Zijlstra
2023-09-19 14:24                                       ` John Paul Adrian Glaubitz
2023-09-19 14:32                                         ` Matthew Wilcox
2023-09-19 15:31                                           ` Steven Rostedt
2023-09-20 14:38                                       ` Anton Ivanov
2023-09-21 12:20                                       ` Arnd Bergmann
2023-09-19 14:17                                     ` Thomas Gleixner
2023-09-19 14:50                                       ` H. Peter Anvin
2023-09-19 14:57                                         ` Matt Turner
2023-09-19 17:09                                         ` Ulrich Teichert
2023-09-19 17:25                                     ` Linus Torvalds
2023-09-19 17:58                                       ` John Paul Adrian Glaubitz
2023-09-19 18:31                                       ` Thomas Gleixner
2023-09-19 18:38                                         ` Steven Rostedt
2023-09-19 18:52                                           ` Linus Torvalds
2023-09-19 19:53                                             ` Thomas Gleixner
2023-09-20  7:32                                           ` Ingo Molnar
2023-09-20  7:29                                         ` Ingo Molnar
2023-09-20  8:26                                       ` Thomas Gleixner
2023-09-20 10:37                                       ` David Laight
2023-09-19 14:21                                   ` Anton Ivanov
2023-09-19 15:17                                     ` Thomas Gleixner
2023-09-19 15:21                                       ` Anton Ivanov
2023-09-19 16:22                                         ` Richard Weinberger
2023-09-19 16:41                                           ` Anton Ivanov
2023-09-19 17:33                                             ` Thomas Gleixner
2023-10-06 14:51                               ` Geert Uytterhoeven
2023-09-20 14:22                             ` [PATCH v2 7/9] sched: define TIF_ALLOW_RESCHED Ankur Arora
2023-09-20 20:51                               ` Thomas Gleixner
2023-09-21  0:14                                 ` Thomas Gleixner
2023-09-21  0:58                                 ` Ankur Arora
2023-09-21  2:12                                   ` Thomas Gleixner
2023-09-20 23:58                             ` Thomas Gleixner
2023-09-21  0:57                               ` Ankur Arora
2023-09-21  2:02                                 ` Thomas Gleixner
2023-09-21  4:16                                   ` Ankur Arora
2023-09-21 13:59                                     ` Steven Rostedt
2023-09-21 16:00                               ` Linus Torvalds
2023-09-21 22:55                                 ` Thomas Gleixner
2023-09-23  1:11                                   ` Thomas Gleixner
2023-10-02 14:15                                     ` Steven Rostedt
2023-10-02 16:13                                       ` Thomas Gleixner
2023-10-18  1:03                                     ` Paul E. McKenney
2023-10-18 12:09                                       ` Ankur Arora
2023-10-18 17:51                                         ` Paul E. McKenney
2023-10-18 22:53                                           ` Thomas Gleixner
2023-10-18 23:25                                             ` Paul E. McKenney
2023-10-18 13:16                                       ` Thomas Gleixner
2023-10-18 14:31                                         ` Steven Rostedt
2023-10-18 17:55                                           ` Paul E. McKenney
2023-10-18 18:00                                             ` Steven Rostedt
2023-10-18 18:13                                               ` Paul E. McKenney
2023-10-19 12:37                                                 ` Daniel Bristot de Oliveira
2023-10-19 17:08                                                   ` Paul E. McKenney
2023-10-18 17:19                                         ` Paul E. McKenney
2023-10-18 17:41                                           ` Steven Rostedt
2023-10-18 17:59                                             ` Paul E. McKenney
2023-10-18 20:15                                           ` Ankur Arora
2023-10-18 20:42                                             ` Paul E. McKenney
2023-10-19  0:21                                           ` Thomas Gleixner
2023-10-19 19:13                                             ` Paul E. McKenney
2023-10-20 21:59                                               ` Paul E. McKenney
2023-10-20 22:56                                               ` Ankur Arora
2023-10-20 23:36                                                 ` Paul E. McKenney
2023-10-21  1:05                                                   ` Ankur Arora
2023-10-21  2:08                                                     ` Paul E. McKenney
2023-10-24 12:15                                               ` Thomas Gleixner
2023-10-24 18:59                                                 ` Paul E. McKenney
2023-09-23 22:50                             ` Thomas Gleixner
2023-09-24  0:10                               ` Thomas Gleixner
2023-09-24  7:19                               ` Matthew Wilcox
2023-09-24  7:55                                 ` Thomas Gleixner
2023-09-24 10:29                                   ` Matthew Wilcox
2023-09-25  0:13                               ` Ankur Arora
2023-10-06 13:01                             ` Geert Uytterhoeven
2023-09-19  7:21                         ` Ingo Molnar
2023-09-19 19:05                         ` Ankur Arora
2023-10-24 14:34                         ` Steven Rostedt
2023-10-25  1:49                           ` Steven Rostedt
2023-10-26  7:50                           ` Sergey Senozhatsky
2023-10-26 12:48                             ` Steven Rostedt
2023-09-11 16:48             ` Steven Rostedt
2023-09-11 20:50               ` Linus Torvalds
2023-09-11 21:16                 ` Linus Torvalds
2023-09-12  7:20                   ` Peter Zijlstra
2023-09-12  7:38                     ` Ingo Molnar
2023-09-11 22:20                 ` Steven Rostedt
2023-09-11 23:10                   ` Ankur Arora
2023-09-11 23:16                     ` Steven Rostedt
2023-09-12 16:30                   ` Linus Torvalds
2023-09-12  3:27                 ` Matthew Wilcox
2023-09-12 16:20                   ` Linus Torvalds
2023-09-19  3:21   ` Andy Lutomirski
2023-09-19  9:20     ` Thomas Gleixner
2023-09-19  9:49       ` Ingo Molnar
2023-08-30 18:49 ` [PATCH v2 8/9] irqentry: define irqentry_exit_allow_resched() Ankur Arora
2023-09-08 12:42   ` Peter Zijlstra
2023-09-11 17:24     ` Ankur Arora
2023-08-30 18:49 ` [PATCH v2 9/9] x86/clear_huge_page: make clear_contig_region() preemptible Ankur Arora
2023-09-08 12:45   ` Peter Zijlstra
2023-09-03  8:14 ` [PATCH v2 0/9] x86/clear_huge_page: multi-page clearing Mateusz Guzik
2023-09-05 22:14   ` Ankur Arora [this message]
2023-09-08  2:18   ` Raghavendra K T
2023-09-05  1:06 ` Raghavendra K T
2023-09-05 19:36   ` Ankur Arora
