From: Alok Rathore <alok.rathore@samsung.com>
To: Bharata B Rao <bharata@amd.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
Jonathan.Cameron@huawei.com, dave.hansen@intel.com,
gourry@gourry.net, mgorman@techsingularity.net, mingo@redhat.com,
peterz@infradead.org, raghavendra.kt@amd.com, riel@surriel.com,
rientjes@google.com, sj@kernel.org, weixugc@google.com,
willy@infradead.org, ying.huang@linux.alibaba.com,
ziy@nvidia.com, dave@stgolabs.net, nifan.cxl@gmail.com,
xuezhengchu@huawei.com, yiannis@zptcorp.com,
akpm@linux-foundation.org, david@redhat.com, byungchul@sk.com,
kinseyho@google.com, joshua.hahnjy@gmail.com, yuanchu@google.com,
balbirs@nvidia.com, shivankg@amd.com, alokrathore20@gmail.com,
gost.dev@samsung.com, cpgs@samsung.com
Subject: Re: [RFC PATCH v4 3/9] mm: Hot page tracking and promotion
Date: Mon, 22 Dec 2025 15:47:21 +0530
Message-ID: <158453976.61766398803526.JavaMail.epsvc@epcpadp1new>
In-Reply-To: <20251206101423.5004-4-bharata@amd.com>
On 06/12/25 03:44PM, Bharata B Rao wrote:
>This introduces a sub-system for collecting memory access
>information from different sources. It maintains the hotness
>information based on the access history and time of access.
>
>Additionally, it provides per-lowertier-node kernel threads
>(named kmigrated) that periodically promote the pages that
>are eligible for promotion.
>
>Sub-systems that generate hot page access info can report that
>using this API:
>
>int pghot_record_access(unsigned long pfn, int nid, int src,
> unsigned long time)
>
>@pfn: The PFN of the memory accessed
>@nid: The accessing NUMA node ID
>@src: The temperature source (sub-system) that generated the
> access info
>@time: The access time in jiffies
>
>Some temperature sources may not provide the nid from which
>the page was accessed. This is true for sources that use
>page table scanning for PTE Accessed bit. For such sources,
>the default toptier node to which such pages should be promoted
>is hard coded.
>
>The hotness information is stored for every page of lower
>tier memory in an unsigned long variable that is part of
>mem_section data structure.
>
>kmigrated is a per-lowertier-node kernel thread that migrates
>the folios marked for migration in batches. Each kmigrated
>thread walks the PFN range spanning its node and checks
>for potential migration candidates.
>
>A bunch of tunables for enabling different hotness sources,
>setting target_nid, frequency threshold are provided in debugfs.
>
>Signed-off-by: Bharata B Rao <bharata@amd.com>
<snip>
>+++ b/include/linux/pghot.h
>@@ -0,0 +1,71 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+#ifndef _LINUX_PGHOT_H
>+#define _LINUX_PGHOT_H
>+
>+/* Page hotness temperature sources */
>+enum pghot_src {
>+ PGHOT_HW_HINTS,
>+ PGHOT_PGTABLE_SCAN,
>+ PGHOT_HINT_FAULT,
>+};
>+
>+#ifdef CONFIG_PGHOT
>+/*
>+ * Bit positions to enable individual sources in pghot/records_enabled
>+ * of debugfs.
>+ */
>+enum pghot_src_enabed {
>+ PGHOT_HWHINTS_BIT = 0,
>+ PGHOT_PGTSCAN_BIT,
>+ PGHOT_HINTFAULT_BIT,
>+ PGHOT_MAX_BIT
>+};
>+
>+#define PGHOT_HWHINTS_ENABLED BIT(PGHOT_HWHINTS_BIT)
>+#define PGHOT_PGTSCAN_ENABLED BIT(PGHOT_PGTSCAN_BIT)
>+#define PGHOT_HINTFAULT_ENABLED BIT(PGHOT_HINTFAULT_BIT)
>+#define PGHOT_SRC_ENABLED_MASK GENMASK(PGHOT_MAX_BIT - 1, 0)
>+
>+#define PGHOT_DEFAULT_FREQ_WINDOW (5 * MSEC_PER_SEC)
>+#define PGHOT_DEFAULT_FREQ_THRESHOLD 2
>+
>+#define KMIGRATED_DEFAULT_SLEEP_MS 100
>+#define KMIGRATED_DEFAULT_BATCH_NR 512
>+
>+#define PGHOT_DEFAULT_NODE 0
>+
>+/*
>+ * Bits 0-31 are used to store nid, frequency and time.
>+ * Bits 32-62 are unused now.
>+ * Bit 63 is used to indicate the page is ready for migration.
>+ */
>+#define PGHOT_MIGRATE_READY 63
>+
>+#define PGHOT_NID_WIDTH 10
>+#define PGHOT_FREQ_WIDTH 3
>+/* time is stored in 19 bits which can represent up to 8.73s with HZ=1000 */
With HZ=1000, a 19-bit time field covers 2^19 = 524288 jiffies, i.e. about 8.73 minutes, not 8.73 seconds. I think the comment says seconds by mistake.
Suggestion:
If we are targeting promotion within ~8 seconds, 13 bits would be enough. That way we can track hotness using 32 bits per PFN instead of 64 bits:
#define PGHOT_MIGRATE_READY 31
#define PGHOT_NID_WIDTH 10
#define PGHOT_FREQ_WIDTH 3
/* time is stored in 13 bits which can represent up to 8.19s with HZ=1000 */
#define PGHOT_TIME_WIDTH 13
>+#define PGHOT_TIME_WIDTH 19
>+
>+#define PGHOT_NID_SHIFT 0
>+#define PGHOT_FREQ_SHIFT (PGHOT_NID_SHIFT + PGHOT_NID_WIDTH)
>+#define PGHOT_TIME_SHIFT (PGHOT_FREQ_SHIFT + PGHOT_FREQ_WIDTH)
>+
>+#define PGHOT_NID_MASK ((1UL << PGHOT_NID_SHIFT) - 1)
>+#define PGHOT_FREQ_MASK ((1UL << PGHOT_FREQ_SHIFT) - 1)
>+#define PGHOT_TIME_MASK ((1UL << PGHOT_TIME_SHIFT) - 1)
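The mask generation for nid, freq and time does not look correct: the masks are derived from the shifts instead of the widths (for example, since PGHOT_NID_SHIFT is 0, PGHOT_NID_MASK evaluates to 0). It should be: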
#define PGHOT_NID_MASK ((1UL << PGHOT_NID_WIDTH) - 1)
#define PGHOT_FREQ_MASK ((1UL << PGHOT_FREQ_WIDTH) - 1)
#define PGHOT_TIME_MASK ((1UL << PGHOT_TIME_WIDTH) - 1)
Can you please have a look?
Regards,
Alok Rathore
Thread overview: 12+ messages
2025-12-06 10:14 [RFC PATCH v4 0/9] mm: Hot page tracking and promotion infrastructure Bharata B Rao
2025-12-06 10:14 ` [RFC PATCH v4 1/9] mm: migrate: Allow misplaced migration without VMA too Bharata B Rao
2025-12-06 10:14 ` [RFC PATCH v4 2/9] migrate: implement migrate_misplaced_folios_batch Bharata B Rao
2025-12-06 10:14 ` [RFC PATCH v4 3/9] mm: Hot page tracking and promotion Bharata B Rao
2025-12-22 10:17 ` Alok Rathore [this message]
2025-12-06 10:14 ` [RFC PATCH v4 4/9] x86: ibs: In-kernel IBS driver for memory access profiling Bharata B Rao
2025-12-06 10:14 ` [RFC PATCH v4 5/9] x86: ibs: Enable IBS profiling for memory accesses Bharata B Rao
2025-12-06 10:14 ` [RFC PATCH v4 6/9] mm: mglru: generalize page table walk Bharata B Rao
2025-12-06 10:14 ` [RFC PATCH v4 7/9] mm: klruscand: use mglru scanning for page promotion Bharata B Rao
2025-12-06 10:14 ` [RFC PATCH v4 8/9] mm: sched: Move hot page promotion from NUMAB=2 to pghot tracking Bharata B Rao
2025-12-22 10:26 ` Alok Rathore
2025-12-06 10:14 ` [RFC PATCH v4 9/9] mm: pghot: Add folio_mark_accessed() as hotness source Bharata B Rao