Re: [v3 2/3] mm: Defer TLB flush by keeping both src and dst folios at migration

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Hildenbrand <david@redhat.com>
To: Byungchul Park <byungchul@sk.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: kernel_team@skhynix.com, akpm@linux-foundation.org,
	ying.huang@intel.com, namit@vmware.com, xhao@linux.alibaba.com,
	mgorman@techsingularity.net, hughd@google.com,
	willy@infradead.org, peterz@infradead.org, luto@kernel.org,
	tglx@linutronix.de, mingo@redhat.com, bp@alien8.de,
	dave.hansen@linux.intel.com
Subject: Re: [v3 2/3] mm: Defer TLB flush by keeping both src and dst folios at migration
Date: Mon, 30 Oct 2023 09:00:56 +0100	[thread overview]
Message-ID: <a8337371-50ed-4618-b48e-78b96d18810f@redhat.com> (raw)
In-Reply-To: <20231030072540.38631-3-byungchul@sk.com>

On 30.10.23 08:25, Byungchul Park wrote:
> Implementation of CONFIG_MIGRC that stands for 'Migration Read Copy'.
> We always face the migration overhead at either promotion or demotion,
> while working with tiered memory e.g. CXL memory and found out TLB
> shootdown is a quite big one that is needed to get rid of if possible.
> 
> Fortunately, TLB flush can be defered or even skipped if both source and
> destination of folios during migration are kept until all TLB flushes
> required will have been done, of course, only if the target PTE entries
> have read only permission, more precisely speaking, don't have write
> permission. Otherwise, no doubt the folio might get messed up.
> 
> To achieve that:
> 
>     1. For the folios that map only to non-writable TLB entries, prevent
>        TLB flush at migration by keeping both source and destination
>        folios, which will be handled later at a better time.
> 
>     2. When any non-writable TLB entry changes to writable e.g. through
>        fault handler, give up CONFIG_MIGRC mechanism so as to perform
>        TLB flush required right away.
> 
>     3. Temporarily stop migrc from working when the system is in very
>        high memory pressure e.g. direct reclaim needed.
> 
> The measurement result:
> 
>     Architecture - x86_64
>     QEMU - kvm enabled, host cpu
>     Numa - 2 nodes (16 CPUs 1GB, no CPUs 8GB)
>     Linux Kernel - v6.6-rc5, numa balancing tiering on, demotion enabled
>     Benchmark - XSBench -p 50000000 (-p option makes the runtime longer)
> 
>     run 'perf stat' using events:
>        1) itlb.itlb_flush
>        2) tlb_flush.dtlb_thread
>        3) tlb_flush.stlb_any
>        4) dTLB-load-misses
>        5) dTLB-store-misses
>        6) iTLB-load-misses
> 
>     run 'cat /proc/vmstat' and pick:
>        1) numa_pages_migrated
>        2) pgmigrate_success
>        3) nr_tlb_remote_flush
>        4) nr_tlb_remote_flush_received
>        5) nr_tlb_local_flush_all
>        6) nr_tlb_local_flush_one
> 
>     BEFORE - mainline v6.6-rc5
>     ------------------------------------------
>     $ perf stat -a \
> 	   -e itlb.itlb_flush \
> 	   -e tlb_flush.dtlb_thread \
> 	   -e tlb_flush.stlb_any \
> 	   -e dTLB-load-misses \
> 	   -e dTLB-store-misses \
> 	   -e iTLB-load-misses \
> 	   ./XSBench -p 50000000
> 
>     Performance counter stats for 'system wide':
> 
>        20953405     itlb.itlb_flush
>        114886593    tlb_flush.dtlb_thread
>        88267015     tlb_flush.stlb_any
>        115304095543 dTLB-load-misses
>        163904743    dTLB-store-misses
>        608486259	   iTLB-load-misses
> 
>     556.787113849 seconds time elapsed
> 
>     $ cat /proc/vmstat
> 
>     ...
>     numa_pages_migrated 3378748
>     pgmigrate_success 7720310
>     nr_tlb_remote_flush 751464
>     nr_tlb_remote_flush_received 10742115
>     nr_tlb_local_flush_all 21899
>     nr_tlb_local_flush_one 740157
>     ...
> 
>     AFTER - mainline v6.6-rc5 + CONFIG_MIGRC
>     ------------------------------------------
>     $ perf stat -a \
> 	   -e itlb.itlb_flush \
> 	   -e tlb_flush.dtlb_thread \
> 	   -e tlb_flush.stlb_any \
> 	   -e dTLB-load-misses \
> 	   -e dTLB-store-misses \
> 	   -e iTLB-load-misses \
> 	   ./XSBench -p 50000000
> 
>     Performance counter stats for 'system wide':
> 
>        4353555      itlb.itlb_flush
>        72482780     tlb_flush.dtlb_thread
>        68226458     tlb_flush.stlb_any
>        114331610808 dTLB-load-misses
>        116084771    dTLB-store-misses
>        377180518    iTLB-load-misses
> 
>     552.667718220 seconds time elapsed
> 
>     $ cat /proc/vmstat
> 

So, an improvement of 0.74% ? How stable are the results? Serious 
question: worth the churn?

Or did I get the numbers wrong?

>   #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 5c02720c53a5..1ca2ac91aa14 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -135,6 +135,9 @@ enum pageflags {
>   #ifdef CONFIG_ARCH_USES_PG_ARCH_X
>   	PG_arch_2,
>   	PG_arch_3,
> +#endif
> +#ifdef CONFIG_MIGRC
> +	PG_migrc,		/* Page has its copy under migrc's control */
>   #endif
>   	__NR_PAGEFLAGS,
>   
> @@ -589,6 +592,10 @@ TESTCLEARFLAG(Young, young, PF_ANY)
>   PAGEFLAG(Idle, idle, PF_ANY)
>   #endif
>   
> +#ifdef CONFIG_MIGRC
> +PAGEFLAG(Migrc, migrc, PF_ANY)
> +#endif

I assume you know this: new pageflags are frowned upon.

-- 
Cheers,

David / dhildenb

next prev parent reply	other threads:[~2023-10-30  8:01 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-30  7:25 [v3 0/3] Reduce TLB flushes under some specific conditions Byungchul Park
2023-10-30  7:25 ` [v3 1/3] mm/rmap: Recognize non-writable TLB entries during TLB batch flush Byungchul Park
2023-10-30  7:52   ` Nadav Amit
2023-10-30 10:26     ` Byungchul Park
2023-10-30  7:25 ` [v3 2/3] mm: Defer TLB flush by keeping both src and dst folios at migration Byungchul Park
2023-10-30  8:00   ` David Hildenbrand [this message]
2023-10-30  9:58     ` Byungchul Park
2023-11-01  3:06       ` Huang, Ying
2023-10-30  8:50   ` Nadav Amit
2023-10-30 12:51     ` Byungchul Park
2023-10-30 15:58       ` Nadav Amit
2023-10-30 22:40         ` Byungchul Park
2023-11-08  4:12       ` Byungchul Park
2023-11-09 10:16         ` Nadav Amit
2023-11-10  1:02           ` Byungchul Park
2023-11-10  3:13             ` Byungchul Park
2023-11-10 22:18               ` Nadav Amit
2023-11-15  5:48                 ` Byungchul Park
2023-11-09  5:35       ` Byungchul Park
2023-10-30  7:25 ` [v3 3/3] mm, migrc: Add a sysctl knob to enable/disable MIGRC mechanism Byungchul Park
2023-10-30  8:51   ` Nadav Amit
2023-10-30 10:36     ` Byungchul Park
2023-10-30 17:55 ` [v3 0/3] Reduce TLB flushes under some specific conditions Dave Hansen
2023-10-30 18:32   ` Nadav Amit
2023-10-30 22:55   ` Byungchul Park
2023-10-31  8:46     ` David Hildenbrand
2023-10-31  2:37   ` Byungchul Park

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a8337371-50ed-4618-b48e-78b96d18810f@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=byungchul@sk.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hughd@google.com \
    --cc=kernel_team@skhynix.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@redhat.com \
    --cc=namit@vmware.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=willy@infradead.org \
    --cc=xhao@linux.alibaba.com \
    --cc=ying.huang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox