On 06/12/25 03:44PM, Bharata B Rao wrote:
>This introduces a sub-system for collecting memory access
>information from different sources. It maintains the hotness
>information based on the access history and time of access.
>
>Additionally, it provides per-lowertier-node kernel threads
>(named kmigrated) that periodically promote the pages that
>are eligible for promotion.
>
>Sub-systems that generate hot page access info can report that
>using this API:
>
>int pghot_record_access(unsigned long pfn, int nid, int src,
>                        unsigned long time)
>
>@pfn: The PFN of the memory accessed
>@nid: The accessing NUMA node ID
>@src: The temperature source (sub-system) that generated the
>      access info
>@time: The access time in jiffies
>
>Some temperature sources may not provide the nid from which
>the page was accessed. This is true for sources that use
>page table scanning for PTE Accessed bit. For such sources,
>the default toptier node to which such pages should be promoted
>is hard coded.
>
>The hotness information is stored for every page of lower
>tier memory in an unsigned long variable that is part of
>mem_section data structure.
>
>kmigrated is a per-lowertier-node kernel thread that migrates
>the folios marked for migration in batches. Each kmigrated
>thread walks the PFN range spanning its node and checks
>for potential migration candidates.
>
>A bunch of tunables for enabling different hotness sources,
>setting target_nid, frequency threshold are provided in debugfs.
>
>Signed-off-by: Bharata B Rao
>+++ b/include/linux/pghot.h
>@@ -0,0 +1,71 @@
>+/* SPDX-License-Identifier: GPL-2.0 */
>+#ifndef _LINUX_PGHOT_H
>+#define _LINUX_PGHOT_H
>+
>+/* Page hotness temperature sources */
>+enum pghot_src {
>+	PGHOT_HW_HINTS,
>+	PGHOT_PGTABLE_SCAN,
>+	PGHOT_HINT_FAULT,
>+};
>+
>+#ifdef CONFIG_PGHOT
>+/*
>+ * Bit positions to enable individual sources in pghot/records_enabled
>+ * of debugfs.
>+ */
>+enum pghot_src_enabed {
>+	PGHOT_HWHINTS_BIT = 0,
>+	PGHOT_PGTSCAN_BIT,
>+	PGHOT_HINTFAULT_BIT,
>+	PGHOT_MAX_BIT
>+};
>+
>+#define PGHOT_HWHINTS_ENABLED	BIT(PGHOT_HWHINTS_BIT)
>+#define PGHOT_PGTSCAN_ENABLED	BIT(PGHOT_PGTSCAN_BIT)
>+#define PGHOT_HINTFAULT_ENABLED	BIT(PGHOT_HINTFAULT_BIT)
>+#define PGHOT_SRC_ENABLED_MASK	GENMASK(PGHOT_MAX_BIT - 1, 0)
>+
>+#define PGHOT_DEFAULT_FREQ_WINDOW	(5 * MSEC_PER_SEC)
>+#define PGHOT_DEFAULT_FREQ_THRESHOLD	2
>+
>+#define KMIGRATED_DEFAULT_SLEEP_MS	100
>+#define KMIGRATED_DEFAULT_BATCH_NR	512
>+
>+#define PGHOT_DEFAULT_NODE	0
>+
>+/*
>+ * Bits 0-31 are used to store nid, frequency and time.
>+ * Bits 32-62 are unused now.
>+ * Bit 63 is used to indicate the page is ready for migration.
>+ */
>+#define PGHOT_MIGRATE_READY	63
>+
>+#define PGHOT_NID_WIDTH	10
>+#define PGHOT_FREQ_WIDTH	3
>+/* time is stored in 19 bits which can represent up to 8.73s with HZ=1000 */

With HZ=1000, a 19-bit time field covers 2^19 jiffies = 524288 ms,
which is about 8.73 minutes, not 8.73 seconds. I think the comment says
seconds by mistake.

Suggestion: if the goal is to promote a page within ~8 seconds, then
13 bits would be enough. That way the hotness can be stored in 32 bits
per PFN instead of 64 bits:

#define PGHOT_MIGRATE_READY	31

#define PGHOT_NID_WIDTH		10
#define PGHOT_FREQ_WIDTH	3
/* time is stored in 13 bits which can represent up to 8.19s with HZ=1000 */
#define PGHOT_TIME_WIDTH	13

>+#define PGHOT_TIME_WIDTH	19
>+
>+#define PGHOT_NID_SHIFT	0
>+#define PGHOT_FREQ_SHIFT	(PGHOT_NID_SHIFT + PGHOT_NID_WIDTH)
>+#define PGHOT_TIME_SHIFT	(PGHOT_FREQ_SHIFT + PGHOT_FREQ_WIDTH)
>+
>+#define PGHOT_NID_MASK		((1UL << PGHOT_NID_SHIFT) - 1)
>+#define PGHOT_FREQ_MASK	((1UL << PGHOT_FREQ_SHIFT) - 1)
>+#define PGHOT_TIME_MASK	((1UL << PGHOT_TIME_SHIFT) - 1)

The mask definitions for nid, freq and time do not look correct.
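As written, the masks are derived from the shift values rather than the
field widths, so PGHOT_NID_MASK, for example, evaluates to 0. It should be:

#define PGHOT_NID_MASK		((1UL << PGHOT_NID_WIDTH) - 1)
#define PGHOT_FREQ_MASK		((1UL << PGHOT_FREQ_WIDTH) - 1)
#define PGHOT_TIME_MASK		((1UL << PGHOT_TIME_WIDTH) - 1)

Can you please have a look?

Regards,
Alok Rathore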
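
For reference, a minimal user-space sketch of the width-based masks and
the time-range arithmetic discussed above. The helpers pghot_pack(),
pghot_nid(), pghot_freq(), pghot_time() and pghot_time_range_ms() are
hypothetical names used only for illustration and are not part of the
patch. The sketch assumes the suggested 32-bit layout with a 13-bit time
field, and that each mask is applied after shifting its field down to
bit 0.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths; 13 is the suggested time width (the patch uses 19). */
#define PGHOT_NID_WIDTH		10
#define PGHOT_FREQ_WIDTH	3
#define PGHOT_TIME_WIDTH	13
#define PGHOT_MIGRATE_READY	31

#define PGHOT_NID_SHIFT		0
#define PGHOT_FREQ_SHIFT	(PGHOT_NID_SHIFT + PGHOT_NID_WIDTH)
#define PGHOT_TIME_SHIFT	(PGHOT_FREQ_SHIFT + PGHOT_FREQ_WIDTH)

/* Masks built from the widths, so each one covers exactly its field. */
#define PGHOT_NID_MASK		((1UL << PGHOT_NID_WIDTH) - 1)
#define PGHOT_FREQ_MASK		((1UL << PGHOT_FREQ_WIDTH) - 1)
#define PGHOT_TIME_MASK		((1UL << PGHOT_TIME_WIDTH) - 1)

/* The three fields must stay below the migrate-ready bit. */
static_assert(PGHOT_TIME_SHIFT + PGHOT_TIME_WIDTH <= PGHOT_MIGRATE_READY,
	      "hotness fields overlap the migrate-ready bit");

/* Hypothetical helper: pack nid, freq and time into one 32-bit record. */
static inline uint32_t pghot_pack(unsigned int nid, unsigned int freq,
				  unsigned int time)
{
	return ((uint32_t)(nid & PGHOT_NID_MASK) << PGHOT_NID_SHIFT) |
	       ((uint32_t)(freq & PGHOT_FREQ_MASK) << PGHOT_FREQ_SHIFT) |
	       ((uint32_t)(time & PGHOT_TIME_MASK) << PGHOT_TIME_SHIFT);
}

/* Hypothetical accessors: shift the field down, then apply its mask. */
static inline unsigned int pghot_nid(uint32_t v)
{
	return (v >> PGHOT_NID_SHIFT) & PGHOT_NID_MASK;
}

static inline unsigned int pghot_freq(uint32_t v)
{
	return (v >> PGHOT_FREQ_SHIFT) & PGHOT_FREQ_MASK;
}

static inline unsigned int pghot_time(uint32_t v)
{
	return (v >> PGHOT_TIME_SHIFT) & PGHOT_TIME_MASK;
}

/* Interval in ms covered by `width` bits of jiffies at `hz` ticks/sec. */
static inline unsigned long long pghot_time_range_ms(unsigned int width,
						     unsigned int hz)
{
	return (1ULL << width) * 1000ULL / hz;
}
```

With these definitions, pghot_time_range_ms(19, 1000) gives 524288 ms
(about 8.73 minutes) and pghot_time_range_ms(13, 1000) gives 8192 ms
(about 8.19 seconds), while the shift-based form of the NID mask,
(1UL << PGHOT_NID_SHIFT) - 1, is 0.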