[PATCH 0/9] x86/clear_huge_page: multi-page clearing

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Ankur Arora <ankur.a.arora@oracle.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	luto@kernel.org, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, mingo@redhat.com, juri.lelli@redhat.com,
	willy@infradead.org, mgorman@suse.de, peterz@infradead.org,
	rostedt@goodmis.org, tglx@linutronix.de,
	vincent.guittot@linaro.org, jon.grimm@amd.com, bharata@amd.com,
	boris.ostrovsky@oracle.com, konrad.wilk@oracle.com,
	ankur.a.arora@oracle.com
Subject: [PATCH 0/9] x86/clear_huge_page: multi-page clearing
Date: Sun,  2 Apr 2023 22:22:24 -0700	[thread overview]
Message-ID: <20230403052233.1880567-1-ankur.a.arora@oracle.com> (raw)

This series introduces multi-page clearing for hugepages. 

This is a follow up of some of the ideas discussed at:
  https://lore.kernel.org/lkml/CAHk-=wj9En-BC4t7J9xFZOws5ShwaR9yor7FxHZr8CTVyEP_+Q@mail.gmail.com/

On x86 page clearing is typically done via string intructions. These,
unlike a MOV loop, allow us to explicitly advertise the region-size to
the processor, which could serve as a hint to current (and/or
future) uarchs to elide cacheline allocation.

In current generation processors, Milan (and presumably other Zen
variants) use the hint to elide cacheline allocation (for
region-size > LLC-size.)

An additional reason for doing this is that string instructions are typically
microcoded, and clearing in bigger chunks than the current page-at-a-
time logic amortizes some of the cost.

All uarchs tested (Milan, Icelakex, Skylakex) showed improved performance.

There are, however, some problems:

1. extended zeroing periods means there's an increased latency due to
   the now missing preemption points.

   That's handled in patches 7, 8, 9:
     "sched: define TIF_ALLOW_RESCHED"
     "irqentry: define irqentry_exit_allow_resched()"
     "x86/clear_huge_page: make clear_contig_region() preemptible"
   by the context marking itself reschedulable, and rescheduling in
   irqexit context if needed (for PREEMPTION_NONE/_VOLUNTARY.)

2. the current page-at-a-time clearing logic does left-right narrowing
   towards the faulting page which benefits workloads by maintaining
   cache locality for workloads which have a sequential pattern. Clearing
   in large chunks loses that.

   Some (but not all) of that could be ameliorated by something like
   this patch:
   https://lore.kernel.org/lkml/20220606203725.1313715-1-ankur.a.arora@oracle.com/

   But, before doing that I'd like some comments on whether that is
   worth doing for this specific use case?

Rest of the series:
  Patches 1, 2, 3:
    "huge_pages: get rid of process_huge_page()"
    "huge_page: get rid of {clear,copy}_subpage()"
    "huge_page: allow arch override for clear/copy_huge_page()"
  are mechanical and they simplify some of the current clear_huge_page()
  logic.

  Patches 4, 5:
  "x86/clear_page: parameterize clear_page*() to specify length"
  "x86/clear_pages: add clear_pages()"

  add clear_pages() and helpers.

  Patch 6: "mm/clear_huge_page: use multi-page clearing" adds the
  chunked x86 clear_huge_page() implementation.


Performance
==

Demand fault performance gets a decent boost:

  *Icelakex*  mm/clear_huge_page   x86/clear_huge_page   change   
                          (GB/s)                (GB/s)            
                                                                  
  pg-sz=2MB                 8.76                 11.82   +34.93%  
  pg-sz=1GB                 8.99                 12.18   +35.48%  


  *Milan*     mm/clear_huge_page   x86/clear_huge_page   change    
                          (GB/s)                (GB/s)             
                                                                   
  pg-sz=2MB                12.24                 17.54    +43.30%  
  pg-sz=1GB                17.98                 37.24   +107.11%  


vm-scalability/case-anon-w-seq-hugetlb, gains in stime but performs
worse when user space tries to touch those pages:

  *Icelakex*                  mm/clear_huge_page   x86/clear_huge_page   change
  (mem=4GB/task, tasks=128)

  stime                           293.02 +- .49%        239.39 +- .83%   -18.30%
  utime                           440.11 +- .28%        508.74 +- .60%   +15.59%
  wall-clock                        5.96 +- .33%          6.27 +-2.23%   + 5.20%


  *Milan*                     mm/clear_huge_page   x86/clear_huge_page   change
  (mem=1GB/task, tasks=512)

  stime                          490.95 +- 3.55%       466.90 +- 4.79%   - 4.89%
  utime                          276.43 +- 2.85%       311.97 +- 5.15%   +12.85%
  wall-clock                       3.74 +- 6.41%         3.58 +- 7.82%   - 4.27%

Also at:
  github.com/terminus/linux clear-pages.v1

Comments appreciated!

Ankur Arora (9):
  huge_pages: get rid of process_huge_page()
  huge_page: get rid of {clear,copy}_subpage()
  huge_page: allow arch override for clear/copy_huge_page()
  x86/clear_page: parameterize clear_page*() to specify length
  x86/clear_pages: add clear_pages()
  mm/clear_huge_page: use multi-page clearing
  sched: define TIF_ALLOW_RESCHED
  irqentry: define irqentry_exit_allow_resched()
  x86/clear_huge_page: make clear_contig_region() preemptible

 arch/x86/include/asm/page.h        |   6 +
 arch/x86/include/asm/page_32.h     |   6 +
 arch/x86/include/asm/page_64.h     |  25 +++--
 arch/x86/include/asm/thread_info.h |   2 +
 arch/x86/lib/clear_page_64.S       |  45 ++++++--
 arch/x86/mm/hugetlbpage.c          |  59 ++++++++++
 include/linux/sched.h              |  29 +++++
 kernel/entry/common.c              |   8 ++
 kernel/sched/core.c                |  36 +++---
 mm/memory.c                        | 174 +++++++++++++++--------------
 10 files changed, 270 insertions(+), 120 deletions(-)

-- 
2.31.1

next             reply	other threads:[~2023-04-03  5:23 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-03  5:22 Ankur Arora [this message]
2023-04-03  5:22 ` [PATCH 1/9] huge_pages: get rid of process_huge_page() Ankur Arora
2023-04-03  5:22 ` [PATCH 2/9] huge_page: get rid of {clear,copy}_subpage() Ankur Arora
2023-04-03  5:22 ` [PATCH 3/9] huge_page: allow arch override for clear/copy_huge_page() Ankur Arora
2023-04-03  5:22 ` [PATCH 4/9] x86/clear_page: parameterize clear_page*() to specify length Ankur Arora
2023-04-06  8:19   ` Peter Zijlstra
2023-04-07  3:03     ` Ankur Arora
2023-04-03  5:22 ` [PATCH 5/9] x86/clear_pages: add clear_pages() Ankur Arora
2023-04-06  8:23   ` Peter Zijlstra
2023-04-07  0:50     ` Ankur Arora
2023-04-07 10:34       ` Peter Zijlstra
2023-04-09 13:26         ` Matthew Wilcox
2023-04-03  5:22 ` [PATCH 6/9] mm/clear_huge_page: use multi-page clearing Ankur Arora
2023-04-03  5:22 ` [PATCH 7/9] sched: define TIF_ALLOW_RESCHED Ankur Arora
2023-04-05 20:07   ` Peter Zijlstra
2023-04-03  5:22 ` [PATCH 8/9] irqentry: define irqentry_exit_allow_resched() Ankur Arora
2023-04-04  9:38   ` Thomas Gleixner
2023-04-05  5:29     ` Ankur Arora
2023-04-05 20:22   ` Peter Zijlstra
2023-04-06 16:56     ` Ankur Arora
2023-04-06 20:13       ` Peter Zijlstra
2023-04-06 20:16         ` Peter Zijlstra
2023-04-07  2:29         ` Ankur Arora
2023-04-07 10:23           ` Peter Zijlstra
2023-04-03  5:22 ` [PATCH 9/9] x86/clear_huge_page: make clear_contig_region() preemptible Ankur Arora
2023-04-05 20:27   ` Peter Zijlstra
2023-04-06 17:00     ` Ankur Arora
2023-04-05 19:48 ` [PATCH 0/9] x86/clear_huge_page: multi-page clearing Raghavendra K T
2023-04-08 22:46   ` Ankur Arora
2023-04-10  6:26     ` Raghavendra K T

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230403052233.1880567-1-ankur.a.arora@oracle.com \
    --to=ankur.a.arora@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=bharata@amd.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jon.grimm@amd.com \
    --cc=juri.lelli@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=vincent.guittot@linaro.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox