From: Raghavendra K T <raghavendra.kt@amd.com>
To: AneeshKumar.KizhakeVeetil@arm.com, Hasan.Maruf@amd.com,
Michael.Day@amd.com, akpm@linux-foundation.org, bharata@amd.com,
dave.hansen@intel.com, david@redhat.com,
dongjoo.linux.dev@gmail.com, feng.tang@intel.com,
gourry@gourry.net, hannes@cmpxchg.org, honggyu.kim@sk.com,
hughd@google.com, jhubbard@nvidia.com, jon.grimm@amd.com,
k.shutemov@gmail.com, kbusch@meta.com, kmanaouil.dev@gmail.com,
leesuyeon0506@gmail.com, leillc@google.com,
liam.howlett@oracle.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, mgorman@techsingularity.net,
mingo@redhat.com, nadav.amit@gmail.com, nphamcs@gmail.com,
peterz@infradead.org, riel@surriel.com, rientjes@google.com,
rppt@kernel.org, santosh.shukla@amd.com, shivankg@amd.com,
shy828301@gmail.com, sj@kernel.org, vbabka@suse.cz,
weixugc@google.com, willy@infradead.org,
ying.huang@linux.alibaba.com, ziy@nvidia.com,
Jonathan.Cameron@huawei.com, alok.rathore@samsung.com,
kinseyho@google.com, yuanchu@google.com
Subject: Re: [RFC PATCH V1 00/13] mm: slowtier page promotion based on PTE A bit
Date: Tue, 25 Mar 2025 12:06:39 +0530 [thread overview]
Message-ID: <ff53d70a-7d59-4f0d-aad0-03628f9d8b67@amd.com> (raw)
In-Reply-To: <20250321203555.4n6byk6vmnkmpewi@offworld>
+kinseyho and yuanchu
On 3/22/2025 2:05 AM, Davidlohr Bueso wrote:
> On Fri, 21 Mar 2025, Raghavendra K T wrote:
>
>>> But a longer running/ more memory workload may make more difference.
>>> I will comeback with that number.
>>
>> base NUMAB=2 Patched NUMAB=0
>> time in sec time in sec
>> ===================================================
>> 8G: 134.33 (0.19) 119.88 ( 0.25)
>> 16G: 292.24 (0.60) 325.06 (11.11)
>> 32G: 585.06 (0.24) 546.15 ( 0.50)
>> 64G: 1278.98 (0.27) 1221.41 ( 1.54)
>>
>> We can see that the numbers have not changed much between NUMAB=1 and
>> NUMAB=0 in the patched case.
>
> Thanks. Since this might vary across workloads, another important metric
> here is numa hit/misses statistics.
Hello David, sorry for getting back late.
Yes, I did collect some of the other stats along with this (posting for
8GB only). I did not see much difference in total numa_hit, but there
are differences in numa_local etc. (not pasted here).
#grep -A2 completed abench_cxl_6.14.0-rc6-kmmscand+_8G.log
abench_cxl_6.14.0-rc6-cxlfix+_numab2_8G.log
abench_cxl_6.14.0-rc6-kmmscand+_8G.log:Benchmark completed in
120292376.0 us, Total thread execution time 7490922681.0 us
abench_cxl_6.14.0-rc6-kmmscand+_8G.log-numa_hit 6376927
abench_cxl_6.14.0-rc6-kmmscand+_8G.log-numa_miss 0
--
abench_cxl_6.14.0-rc6-kmmscand+_8G.log:Benchmark completed in
119583939.0 us, Total thread execution time 7461705291.0 us
abench_cxl_6.14.0-rc6-kmmscand+_8G.log-numa_hit 6373409
abench_cxl_6.14.0-rc6-kmmscand+_8G.log-numa_miss 0
--
abench_cxl_6.14.0-rc6-kmmscand+_8G.log:Benchmark completed in
119784117.0 us, Total thread execution time 7482710944.0 us
abench_cxl_6.14.0-rc6-kmmscand+_8G.log-numa_hit 6378384
abench_cxl_6.14.0-rc6-kmmscand+_8G.log-numa_miss 0
--
abench_cxl_6.14.0-rc6-cxlfix+_numab2_8G.log:Benchmark completed in
134481344.0 us, Total thread execution time 8409840511.0 us
abench_cxl_6.14.0-rc6-cxlfix+_numab2_8G.log-numa_hit 6303300
abench_cxl_6.14.0-rc6-cxlfix+_numab2_8G.log-numa_miss 0
--
abench_cxl_6.14.0-rc6-cxlfix+_numab2_8G.log:Benchmark completed in
133967260.0 us, Total thread execution time 8352886349.0 us
abench_cxl_6.14.0-rc6-cxlfix+_numab2_8G.log-numa_hit 6304063
abench_cxl_6.14.0-rc6-cxlfix+_numab2_8G.log-numa_miss 0
--
abench_cxl_6.14.0-rc6-cxlfix+_numab2_8G.log:Benchmark completed in
134554911.0 us, Total thread execution time 8444951713.0 us
abench_cxl_6.14.0-rc6-cxlfix+_numab2_8G.log-numa_hit 6302506
abench_cxl_6.14.0-rc6-cxlfix+_numab2_8G.log-numa_miss 0
>
> fyi I have also been trying this series to get some numbers as well, but
> noticed overnight things went south (so no chance before LSFMM):
>
This issue looks to be a different one. Could you please let me know how
to reproduce it?
I had tested with perf bench numa mem and did not find anything.
The issue I know of currently is:

kmmscand:
    for_each_mm
        for_each_vma
            scan_vma and collect accessed_folio_list
            add to migration_list()  // does not check for duplicates

kmmmigrated:
    for_each_folio in migration_list
        migrate_misplaced_folio()

There is also cleanup_migration_list() in mm teardown.
The migration_list is protected by a single lock, and kmmscand is
aggressive enough that it can potentially flood the migration_list (a
practical workload may generate fewer pages, though). That results in a
non-fatal softlockup, which will be fixed with mmslot as I noted
elsewhere.
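As an aside, the duplicate-insertion problem can be sketched in
userspace Python (purely illustrative; MigrationQueue and the pfn-set
are hypothetical names, and the kernel code would use an mm-slot hash
rather than anything like this):

```python
# Illustrative sketch: de-duplicate entries before queueing them,
# so the migration list does not grow with repeated entries for the
# same folio on every scan pass. 'pfn' stands in for a folio id.

class MigrationQueue:
    def __init__(self):
        self.queued = set()        # folios already on the list
        self.migration_list = []

    def add(self, pfn):
        # Skip folios already queued; this is the check the current
        # kmmscand pseudocode is missing.
        if pfn in self.queued:
            return False
        self.queued.add(pfn)
        self.migration_list.append(pfn)
        return True

    def drain(self):
        # kmmmigrated side: take the whole batch at once so any lock
        # (implicit here) is held only briefly.
        batch, self.migration_list = self.migration_list, []
        self.queued.clear()
        return batch

q = MigrationQueue()
for pfn in [1, 2, 2, 3, 1]:
    q.add(pfn)
print(q.drain())  # [1, 2, 3]
```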
But now the main challenge to solve in kmmscand is that it generates:

t1 -> migration_list1 (of recently accessed folios)
t2 -> migration_list2

How do I get the union of migration_list1 and migration_list2 so that,
instead of migrating on first access, we can pick a hotter page to
promote?
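To make the idea concrete, an illustrative userspace sketch (Python,
not kernel code; hot_folios and min_hits are hypothetical names): count
per-folio appearances across scan passes and promote only folios that
show up in at least two lists.

```python
from collections import Counter

def hot_folios(scan_lists, min_hits=2):
    """Return folios seen in at least min_hits scan passes.

    scan_lists: per-pass accessed-folio lists, e.g.
    [migration_list1, migration_list2]. Counting each folio at most
    once per pass approximates 'hotness' instead of promoting on
    first access.
    """
    hits = Counter()
    for folios in scan_lists:
        for pfn in set(folios):   # count once per pass
            hits[pfn] += 1
    return sorted(p for p, n in hits.items() if n >= min_hits)

list1 = [10, 11, 12]     # t1 -> migration_list1
list2 = [11, 12, 13]     # t2 -> migration_list2
print(hot_folios([list1, list2]))  # [11, 12]
```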
I had a few solutions in mind (that I wanted to get opinions/suggestions
on from experts during LSFMM):

1. Reuse DAMON VA scanning; scanning params are controlled by KMMSCAND
   (current heuristics).
2. Can we use LRU information to filter the access list (LRU active /
   folio is in the (n-1) generation)?
   (I do see Kinseyho just posted an LRU-based approach.)
3. Can we split the address range into 2MB units to monitor, i.e.
   PMD-level access monitoring?
4. Any possible ways of using bloom filters for list1, list2?
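For option 4, a bloom filter built over list1 gives a cheap,
probabilistic membership test when filtering list2 (illustrative Python
sketch with hypothetical sizing parameters; a false positive would only
promote a cold page early, a genuinely hot page is never missed):

```python
import hashlib

class BloomFilter:
    def __init__(self, nbits=1024, nhashes=3):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = 0            # big-int used as a bit array

    def _positions(self, item):
        # Derive nhashes bit positions from a salted hash of the item.
        for i in range(self.nhashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all(self.bits >> pos & 1 for pos in self._positions(item))

# Record t1's accessed folios, then filter t2's list: a folio that
# (probably) appeared in both passes is a hotter promotion candidate.
bf = BloomFilter()
for pfn in [10, 11, 12]:                            # migration_list1
    bf.add(pfn)
hot = [pfn for pfn in [11, 12, 13] if pfn in bf]    # filter list2
print(hot)
```

The appeal for the kernel case would be the fixed, small memory
footprint per scan pass; the cost is that list1 itself cannot be
enumerated back out of the filter.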
- Raghu
[snip...]
Thread overview: 30+ messages
2025-03-19 19:30 Raghavendra K T
2025-03-19 19:30 ` [RFC PATCH V1 01/13] mm: Add kmmscand kernel daemon Raghavendra K T
2025-03-21 16:06 ` Jonathan Cameron
2025-03-24 15:09 ` Raghavendra K T
2025-03-19 19:30 ` [RFC PATCH V1 02/13] mm: Maintain mm_struct list in the system Raghavendra K T
2025-03-19 19:30 ` [RFC PATCH V1 03/13] mm: Scan the mm and create a migration list Raghavendra K T
2025-03-19 19:30 ` [RFC PATCH V1 04/13] mm: Create a separate kernel thread for migration Raghavendra K T
2025-03-21 17:29 ` Jonathan Cameron
2025-03-24 15:17 ` Raghavendra K T
2025-03-19 19:30 ` [RFC PATCH V1 05/13] mm/migration: Migrate accessed folios to toptier node Raghavendra K T
2025-03-19 19:30 ` [RFC PATCH V1 06/13] mm: Add throttling of mm scanning using scan_period Raghavendra K T
2025-03-19 19:30 ` [RFC PATCH V1 07/13] mm: Add throttling of mm scanning using scan_size Raghavendra K T
2025-03-19 19:30 ` [RFC PATCH V1 08/13] mm: Add initial scan delay Raghavendra K T
2025-03-19 19:30 ` [RFC PATCH V1 09/13] mm: Add heuristic to calculate target node Raghavendra K T
2025-03-21 17:42 ` Jonathan Cameron
2025-03-24 16:17 ` Raghavendra K T
2025-03-19 19:30 ` [RFC PATCH V1 10/13] sysfs: Add sysfs support to tune scanning Raghavendra K T
2025-03-19 19:30 ` [RFC PATCH V1 11/13] vmstat: Add vmstat counters Raghavendra K T
2025-03-19 19:30 ` [RFC PATCH V1 12/13] trace/kmmscand: Add tracing of scanning and migration Raghavendra K T
2025-03-19 19:30 ` [RFC PATCH V1 13/13] prctl: Introduce new prctl to control scanning Raghavendra K T
2025-03-19 23:00 ` [RFC PATCH V1 00/13] mm: slowtier page promotion based on PTE A bit Davidlohr Bueso
2025-03-20 8:51 ` Raghavendra K T
2025-03-20 19:11 ` Raghavendra K T
2025-03-21 20:35 ` Davidlohr Bueso
2025-03-25 6:36 ` Raghavendra K T [this message]
2025-03-20 21:50 ` Davidlohr Bueso
2025-03-21 6:48 ` Raghavendra K T
2025-03-21 15:52 ` Jonathan Cameron
[not found] ` <20250321105309.3521-1-hdanton@sina.com>
2025-03-23 18:14 ` [RFC PATCH V1 09/13] mm: Add heuristic to calculate target node Raghavendra K T
[not found] ` <20250324110543.3599-1-hdanton@sina.com>
2025-03-24 14:54 ` Raghavendra K T