Re: [RFC PATCH v1 0/7] A subsystem for hot page detection and promotion


From: Bharata B Rao <bharata@amd.com>
To: Balbir Singh <balbirs@nvidia.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Jonathan.Cameron@huawei.com, dave.hansen@intel.com,
	gourry@gourry.net, hannes@cmpxchg.org,
	mgorman@techsingularity.net, mingo@redhat.com,
	peterz@infradead.org, raghavendra.kt@amd.com, riel@surriel.com,
	rientjes@google.com, sj@kernel.org, weixugc@google.com,
	willy@infradead.org, ying.huang@linux.alibaba.com,
	ziy@nvidia.com, dave@stgolabs.net, nifan.cxl@gmail.com,
	xuezhengchu@huawei.com, yiannis@zptcorp.com,
	akpm@linux-foundation.org, david@redhat.com, byungchul@sk.com,
	kinseyho@google.com, joshua.hahnjy@gmail.com, yuanchu@google.com
Subject: Re: [RFC PATCH v1 0/7] A subsystem for hot page detection and promotion
Date: Fri, 15 Aug 2025 21:05:32 +0530
Message-ID: <14359326-bdc2-4d9a-b243-b5ffcad0716b@amd.com>
In-Reply-To: <fa0690e8-ad88-4ffc-9c63-c1d8f3d60f47@nvidia.com>

On 15-Aug-25 5:29 PM, Balbir Singh wrote:
> On 8/14/25 23:48, Bharata B Rao wrote:
>> Hi,
>>
>> This patchset adds a dedicated sub-system for maintaining hot page
>> information from the lower tiers and promoting the hot pages to the
>> top tiers. It exposes an API that other sub-systems which detect
>> accesses can use to report those accesses for further processing.
>> Further processing includes system-wide accumulation of memory access
>> info at PFN granularity, classification of PFNs as hot, and promotion
>> of hot pages using per-node kernel threads. This is a continuation of
>> the earlier kpromoted work [1] that I posted a while back.
>>
>> Kernel thread based async batch migration [2] was an off-shoot of
>> this effort that attempted to batch the migrations from NUMA
>> balancing by creating a separate kernel thread for migration.
>> Per-page hotness information was stored as part of extended page
>> flags. The kernel thread then scanned the entire PFN space to pick
>> the PFNs that are classified as hot.
>>
>> The challenges observed with the previous approaches were:
>>
>> 1. Too many PFNs need to be scanned to identify the hot PFNs in
>>    approach [2].
>> 2. Hot page records stored in hash lists become unwieldy for
>>    extracting the required hot pages in approach [1].
>> 3. Dynamic allocation vs static availability of space to store
>>    per-page hotness information.
>>
>> This series tries to address challenges 1 and 2 by maintaining
>> the hot page records in hash lists for quick lookup and maintaining
>> a separate per-target-node max heap for storing ready-to-migrate
>> hot page records. Records in the heap are priority-ordered by the
>> "hotness" of the page.
>>
> 
> Could you elaborate on when/how a page is considered hot? Is it based
> on how often a page has been scanned?

There are multiple sub-systems within the kernel which detect and
act upon page accesses. NUMA balancing (via hint faults) and MGLRU
(via page table scanning for the PTE Accessed bit) are two examples.
The idea behind this patchset is to consolidate such access
information within a new dedicated sub-system for hot page promotion
that maintains hotness data for accessed pages and promotes them
when a threshold is reached.
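To make the consolidation idea concrete, here is a minimal userspace
sketch of a single reporting entry point that every access source
(hint faults, page table scans, hardware sampling) could feed.
Everything in it is hypothetical: report_access(), find_record(),
struct hot_record, enum access_src and HOT_THRESHOLD are illustrative
names, not the API from the patchset, and "hot" here simply means a
plain access count reaching a threshold. Records live in PFN-hashed
chains, and the function tells the caller when a page first reaches
the threshold.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical names and values throughout; not the patchset's API. */
#define HOT_HASH_BITS 10
#define HOT_HASH_SIZE (1u << HOT_HASH_BITS)
#define HOT_THRESHOLD 4u /* assumed: accesses needed to call a page hot */

/* Illustrative access sources that could feed the common API. */
enum access_src { SRC_HINT_FAULT, SRC_PGTABLE_SCAN, SRC_HW_SAMPLE };

struct hot_record {
	uint64_t pfn;
	int target_nid;          /* node the page should be promoted to */
	unsigned int freq;       /* accesses reported so far */
	struct hot_record *next; /* hash-chain link */
};

static struct hot_record *hot_hash[HOT_HASH_SIZE];

/* Same multiply-and-shift scheme as the kernel's hash_64(). */
static unsigned int pfn_hash(uint64_t pfn)
{
	return (unsigned int)((pfn * 0x61C8864680B583EBull) >> (64 - HOT_HASH_BITS));
}

/* Walk the PFN's hash chain; NULL if the page has never been reported. */
static struct hot_record *find_record(uint64_t pfn)
{
	struct hot_record *r;

	for (r = hot_hash[pfn_hash(pfn)]; r; r = r->next)
		if (r->pfn == pfn)
			return r;
	return NULL;
}

/*
 * Single entry point for all access sources. Returns true exactly when
 * the page's access count reaches HOT_THRESHOLD.
 */
static bool report_access(uint64_t pfn, int target_nid, enum access_src src)
{
	struct hot_record *r;
	unsigned int b;

	assert(target_nid >= 0);
	(void)src; /* a real design might weight accesses by source */

	r = find_record(pfn);
	if (!r) {
		r = calloc(1, sizeof(*r));
		if (!r)
			return false;
		r->pfn = pfn;
		b = pfn_hash(pfn);
		r->next = hot_hash[b];
		hot_hash[b] = r;
	}
	r->target_nid = target_nid;
	r->freq++;
	return r->freq == HOT_THRESHOLD;
}
```

A caller such as a hint-fault handler would invoke
report_access(pfn, nid, SRC_HINT_FAULT) and, on a true return, queue
the page for promotion. Locking, freeing of records and limits on
memory used for records are deliberately left out.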

Currently I consider only the number of accesses as an indicator of
page hotness. The time of access needs to be considered too, and both
should contribute to the eventual "hotness" indicator. Something
analogous to how memory tiering derives the abstract distance
(adistance) value from bandwidth and latency could be tried.
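As one illustration of folding the time of access into hotness, and
purely as an assumption since the formula is still open, the score
below is an access count that is halved for every elapsed aging
period before the new access is added. struct hotness,
hotness_update() and HOT_DECAY_MS (an arbitrary 1000 ms) are
hypothetical names and values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed aging period: the score halves for each HOT_DECAY_MS without an access. */
#define HOT_DECAY_MS 1000u

struct hotness {
	uint32_t score;          /* access count, decayed by age */
	uint64_t last_access_ms; /* time of the most recent access */
};

/*
 * Record one access at now_ms (assumed monotonic): first age the old
 * score by the idle time, then count the new access.
 */
static void hotness_update(struct hotness *h, uint64_t now_ms)
{
	uint64_t periods;

	assert(now_ms >= h->last_access_ms);
	periods = (now_ms - h->last_access_ms) / HOT_DECAY_MS;

	/* Shifting a 32-bit value by 32 or more is undefined, so clamp to 0. */
	h->score = periods >= 32 ? 0 : h->score >> periods;
	h->score++;
	h->last_access_ms = now_ms;
}
```

With this scheme a page touched four times in quick succession scores
4, but after two idle periods that history counts as 1 before the next
access is added, so it scores 2. A weighted combination of frequency
and recency, closer in spirit to the adistance derivation, would be an
alternative.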

> 
>> The API for reporting the page access remains unchanged from [1].
>> When a page access is recorded, the hotness data of the page is
>> updated, and if it crosses a threshold, the page is tracked in the
>> heap as well. These heaps are per-target-node, and the corresponding
>> migrate threads periodically extract the top records from them and
>> do batch migration.
>>
> 
> I don't quite follow the heaps and tracking in the heap; could
> you please clarify?

When different sub-systems report page accesses via the API
introduced by this new sub-system, a record for each such page
is stored in hash lists (hashed by PFN). In addition to the PFN
and target_nid, the hotness record includes parameters like
frequency and time of access, from which the hotness is derived.
Repeated reports of accesses to the same PFN update its hotness
information. When the hotness of a record (as updated while
reporting an access) crosses a threshold, the record becomes part
of a max heap. Records in the max heap are ordered by hotness, so
the top elements of the heap correspond to the hottest pages.
There is one such heap for each toptier node, so that each
per-toptier-node kpromoted thread can easily extract the top N
records from its own heap and perform batched migration.
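A minimal userspace sketch of the per-target-node heap side: a record
is pushed into its node's max heap (ordered by hotness) once it
crosses the threshold, and a per-node pass pops up to N of the hottest
PFNs for one batched migration. MAX_NODES, HEAP_CAP, struct hot_entry,
heap_push(), heap_pop() and extract_top_n() are hypothetical; the hash
lists, locking and the actual migration are left out. A kernel version
might instead build on include/linux/min_heap.h with the comparison
inverted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 4   /* assumed number of toptier nodes */
#define HEAP_CAP 1024 /* assumed per-node heap capacity */

struct hot_entry {
	uint64_t pfn;
	uint32_t hotness; /* snapshot taken when the record was queued */
};

struct node_heap {
	struct hot_entry e[HEAP_CAP];
	size_t n;
};

/* One max heap per target (toptier) node. */
static struct node_heap heaps[MAX_NODES];

static void swap_entries(struct hot_entry *a, struct hot_entry *b)
{
	struct hot_entry t = *a;

	*a = *b;
	*b = t;
}

/* Called when a record's hotness crosses the threshold. */
static void heap_push(struct node_heap *h, uint64_t pfn, uint32_t hotness)
{
	size_t i;

	assert(h->n < HEAP_CAP);
	i = h->n++;
	h->e[i] = (struct hot_entry){ .pfn = pfn, .hotness = hotness };
	/* Sift up while the parent is cooler than the new entry. */
	while (i > 0 && h->e[(i - 1) / 2].hotness < h->e[i].hotness) {
		swap_entries(&h->e[(i - 1) / 2], &h->e[i]);
		i = (i - 1) / 2;
	}
}

/* Remove and return the hottest entry; the heap must be non-empty. */
static struct hot_entry heap_pop(struct node_heap *h)
{
	struct hot_entry top;
	size_t i = 0;

	assert(h->n > 0);
	top = h->e[0];
	h->e[0] = h->e[--h->n];
	for (;;) { /* sift down toward the hotter child */
		size_t l = 2 * i + 1, r = l + 1, max = i;

		if (l < h->n && h->e[l].hotness > h->e[max].hotness)
			max = l;
		if (r < h->n && h->e[r].hotness > h->e[max].hotness)
			max = r;
		if (max == i)
			break;
		swap_entries(&h->e[i], &h->e[max]);
		i = max;
	}
	return top;
}

/* One pass of a per-node promotion thread: collect up to nr hottest PFNs. */
static size_t extract_top_n(int nid, uint64_t *pfns, size_t nr)
{
	struct node_heap *h;
	size_t got = 0;

	assert(nid >= 0 && nid < MAX_NODES);
	h = &heaps[nid];
	while (got < nr && h->n > 0)
		pfns[got++] = heap_pop(h).pfn;
	return got; /* these PFNs would then be migrated as one batch */
}
```

One design consequence worth noting: the heap holds a snapshot of the
hotness at insertion time, so if a queued record keeps heating up, its
heap position would need adjusting (a sift-up), which this sketch does
not do.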

Hope this clarifies.

Regards,
Bharata.
