* [PATCH v2 1/3] memory tiering: read last_cpupid correctly in do_huge_pmd_numa_page()
@ 2024-07-22 17:29 Zi Yan
2024-07-22 17:29 ` [PATCH v2 2/3] memory tiering: introduce folio_has_cpupid() check Zi Yan
2024-07-22 17:29 ` [PATCH v2 3/3] memory tiering: count PGPROMOTE_SUCCESS when mem tiering is enabled Zi Yan
0 siblings, 2 replies; 13+ messages in thread
From: Zi Yan @ 2024-07-22 17:29 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: David Hildenbrand, Huang, Ying, Baolin Wang, Kefeng Wang,
linux-kernel, Zi Yan
last_cpupid is only available when memory tiering is off or the folio
is in a toptier node. Complete the check so that last_cpupid is read
whenever it is available.
Before the fix, the default last_cpupid is used instead of the actual
value even if memory tiering mode is turned off at runtime. This can
prevent task_numa_fault() from getting the right NUMA fault stats, but
should not cause any crash. Users might see performance changes after
the fix.
Reported-by: David Hildenbrand <david@redhat.com>
Closes: https://lore.kernel.org/linux-mm/9af34a6b-ca56-4a64-8aa6-ade65f109288@redhat.com/
Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
mm/huge_memory.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f4be468e06a4..825317aee88e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1712,7 +1712,8 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
* For memory tiering mode, cpupid of slow memory page is used
* to record page access time. So use default value.
*/
- if (node_is_toptier(nid))
+ if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) ||
+ node_is_toptier(nid))
last_cpupid = folio_last_cpupid(folio);
target_nid = numa_migrate_prep(folio, vmf, haddr, nid, &flags);
if (target_nid == NUMA_NO_NODE)
--
2.43.0
^ permalink raw reply [flat|nested] 13+ messages in thread
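For readers following the condition change in patch 1/3, a minimal user-space sketch (not kernel code) may help. It assumes the two inputs can be modelled as plain booleans: tiering_on stands in for sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING, and toptier for node_is_toptier(nid). All function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Old check: last_cpupid is read only when the node is toptier. */
static bool reads_cpupid_before(bool tiering_on, bool toptier)
{
	(void)tiering_on;	/* the old check never looked at the mode */
	return toptier;
}

/* Fixed check: last_cpupid is also read whenever tiering is off. */
static bool reads_cpupid_after(bool tiering_on, bool toptier)
{
	return !tiering_on || toptier;
}

/* Number of (tiering_on, toptier) combinations where the checks differ. */
int count_changed_cases(void)
{
	int changed = 0;

	for (int t = 0; t <= 1; t++)
		for (int top = 0; top <= 1; top++)
			if (reads_cpupid_before(t, top) != reads_cpupid_after(t, top))
				changed++;

	/* The only changed case is tiering off on a non-toptier node. */
	assert(!reads_cpupid_before(false, false));
	assert(reads_cpupid_after(false, false));
	return changed;
}
```

Under these assumptions the two checks differ in exactly one case, which matches the commit message: with tiering turned off, a folio on a non-toptier node used to get the default value.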
* [PATCH v2 2/3] memory tiering: introduce folio_has_cpupid() check
2024-07-22 17:29 [PATCH v2 1/3] memory tiering: read last_cpupid correctly in do_huge_pmd_numa_page() Zi Yan
@ 2024-07-22 17:29 ` Zi Yan
2024-07-23 5:54 ` Lorenzo Stoakes
2024-07-22 17:29 ` [PATCH v2 3/3] memory tiering: count PGPROMOTE_SUCCESS when mem tiering is enabled Zi Yan
1 sibling, 1 reply; 13+ messages in thread
From: Zi Yan @ 2024-07-22 17:29 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: David Hildenbrand, Huang, Ying, Baolin Wang, Kefeng Wang,
linux-kernel, Zi Yan
Instead of open-coding the check for whether memory tiering mode is on and
a folio is in top-tier memory, use a function to encapsulate the check.
Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
---
include/linux/mm.h | 6 ++++++
kernel/sched/fair.c | 3 +--
mm/huge_memory.c | 6 ++----
mm/memory-tiers.c | 17 +++++++++++++++++
mm/memory.c | 3 +--
mm/mprotect.c | 3 +--
6 files changed, 28 insertions(+), 10 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c227f22ba810..048b2a56d8a3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1738,6 +1738,8 @@ static inline void vma_set_access_pid_bit(struct vm_area_struct *vma)
__set_bit(pid_bit, &vma->numab_state->pids_active[1]);
}
}
+
+bool folio_has_cpupid(struct folio *folio);
#else /* !CONFIG_NUMA_BALANCING */
static inline int folio_xchg_last_cpupid(struct folio *folio, int cpupid)
{
@@ -1791,6 +1793,10 @@ static inline bool cpupid_match_pid(struct task_struct *task, int cpupid)
static inline void vma_set_access_pid_bit(struct vm_area_struct *vma)
{
}
+static inline bool folio_has_cpupid(struct folio *folio)
+{
+ return true;
+}
#endif /* CONFIG_NUMA_BALANCING */
#if defined(CONFIG_KASAN_SW_TAGS) || defined(CONFIG_KASAN_HW_TAGS)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a5b1ae0aa55..03de808cb3cc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1840,8 +1840,7 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
* The pages in slow memory node should be migrated according
* to hot/cold instead of private/shared.
*/
- if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING &&
- !node_is_toptier(src_nid)) {
+ if (!folio_has_cpupid(folio)) {
struct pglist_data *pgdat;
unsigned long rate_limit;
unsigned int latency, th, def_th;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 825317aee88e..d925a93bb9ed 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1712,8 +1712,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
* For memory tiering mode, cpupid of slow memory page is used
* to record page access time. So use default value.
*/
- if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) ||
- node_is_toptier(nid))
+ if (folio_has_cpupid(folio))
last_cpupid = folio_last_cpupid(folio);
target_nid = numa_migrate_prep(folio, vmf, haddr, nid, &flags);
if (target_nid == NUMA_NO_NODE)
@@ -2066,8 +2065,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
toptier)
goto unlock;
- if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING &&
- !toptier)
+ if (!folio_has_cpupid(folio))
folio_xchg_access_time(folio,
jiffies_to_msecs(jiffies));
}
diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
index 4775b3a3dabe..7f0360d4e3a0 100644
--- a/mm/memory-tiers.c
+++ b/mm/memory-tiers.c
@@ -6,6 +6,7 @@
#include <linux/memory.h>
#include <linux/memory-tiers.h>
#include <linux/notifier.h>
+#include <linux/sched/sysctl.h>
#include "internal.h"
@@ -50,6 +51,22 @@ static const struct bus_type memory_tier_subsys = {
.dev_name = "memory_tier",
};
+/**
+ * folio_has_cpupid - check if a folio has cpupid information
+ * @folio: folio to check
+ *
+ * folio's _last_cpupid field is repurposed by memory tiering. In memory
+ * tiering mode, cpupid of slow memory folio (not toptier memory) is used to
+ * record page access time.
+ *
+ * Return: the folio _last_cpupid is used as cpupid
+ */
+bool folio_has_cpupid(struct folio *folio)
+{
+ return !(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) ||
+ node_is_toptier(folio_nid(folio));
+}
+
#ifdef CONFIG_MIGRATION
static int top_tier_adistance;
/*
diff --git a/mm/memory.c b/mm/memory.c
index 802d0d8a40f9..105e1a0157dd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5337,8 +5337,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
* For memory tiering mode, cpupid of slow memory page is used
* to record page access time. So use default value.
*/
- if ((sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
- !node_is_toptier(nid))
+ if (!folio_has_cpupid(folio))
last_cpupid = (-1 & LAST_CPUPID_MASK);
else
last_cpupid = folio_last_cpupid(folio);
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 222ab434da54..787c3c2bf1b6 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -161,8 +161,7 @@ static long change_pte_range(struct mmu_gather *tlb,
if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
toptier)
continue;
- if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING &&
- !toptier)
+ if (!folio_has_cpupid(folio))
folio_xchg_access_time(folio,
jiffies_to_msecs(jiffies));
}
--
2.43.0
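The replacements in patch 2/3 rely on the open-coded form and the negated helper being the same predicate (De Morgan). A small user-space sketch of that equivalence follows. It is a model only: the inputs are plain booleans, the model_ names are hypothetical, and it assumes src_nid in should_numa_migrate_memory() equals folio_nid(folio), which the sketch does not check.

```c
#include <assert.h>
#include <stdbool.h>

/* Helper as introduced by the patch, with its two inputs passed in directly. */
static bool model_folio_has_cpupid(bool tiering_on, bool toptier)
{
	return !tiering_on || toptier;
}

/* Open-coded form that the patch removes from the call sites. */
static bool model_open_coded(bool tiering_on, bool toptier)
{
	return tiering_on && !toptier;
}

/* Count the input combinations on which the two forms disagree. */
int count_mismatches(void)
{
	int mismatches = 0;

	for (int t = 0; t <= 1; t++)
		for (int top = 0; top <= 1; top++)
			if (model_open_coded(t, top) != !model_folio_has_cpupid(t, top))
				mismatches++;

	/* Spot check: tiering on, slow node, so the field holds access time. */
	assert(model_open_coded(true, false));
	return mismatches;
}
```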
* [PATCH v2 3/3] memory tiering: count PGPROMOTE_SUCCESS when mem tiering is enabled.
2024-07-22 17:29 [PATCH v2 1/3] memory tiering: read last_cpupid correctly in do_huge_pmd_numa_page() Zi Yan
2024-07-22 17:29 ` [PATCH v2 2/3] memory tiering: introduce folio_has_cpupid() check Zi Yan
@ 2024-07-22 17:29 ` Zi Yan
2024-07-23 1:48 ` Kefeng Wang
1 sibling, 1 reply; 13+ messages in thread
From: Zi Yan @ 2024-07-22 17:29 UTC (permalink / raw)
To: Andrew Morton, linux-mm
Cc: David Hildenbrand, Huang, Ying, Baolin Wang, Kefeng Wang,
linux-kernel, Zi Yan
Memory tiering can be enabled or disabled at runtime, and
sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING is used to check
it. In migrate_misplaced_folio(), the check is missing when
PGPROMOTE_SUCCESS is incremented. Add the missing check.
Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Closes: https://lore.kernel.org/linux-mm/f4ae2c9c-fe40-4807-bdb2-64cf2d716c1a@huawei.com/
Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
mm/migrate.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index bdbb5bb04c91..b819809da470 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2630,7 +2630,9 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
putback_movable_pages(&migratepages);
if (nr_succeeded) {
count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
- if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
+ if ((sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
+ && !node_is_toptier(folio_nid(folio))
+ && node_is_toptier(node))
mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
nr_succeeded);
}
--
2.43.0
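The accounting rule in patch 3/3 can be sketched in user space as follows. This is a simplified model under stated assumptions, not the kernel implementation: promote_success is a hypothetical stand-in for the per-node PGPROMOTE_SUCCESS counter, and the tiering mode and the toptier status of the source and target nodes are passed in as booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-node PGPROMOTE_SUCCESS counter. */
static long promote_success;

/* Condition from the patch: tiering on, and a move from slow to toptier. */
static bool is_promotion(bool tiering_on, bool src_toptier, bool dst_toptier)
{
	return tiering_on && !src_toptier && dst_toptier;
}

/* Model of the accounting step after nr_succeeded pages were migrated. */
long account_migration(bool tiering_on, bool src_toptier, bool dst_toptier,
		       long nr_succeeded)
{
	assert(nr_succeeded >= 0);
	if (nr_succeeded && is_promotion(tiering_on, src_toptier, dst_toptier))
		promote_success += nr_succeeded;
	return promote_success;
}
```

In this model a slow-to-toptier migration with tiering off leaves the counter unchanged, which is the behaviour the patch adds.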
* Re: [PATCH v2 3/3] memory tiering: count PGPROMOTE_SUCCESS when mem tiering is enabled.
2024-07-22 17:29 ` [PATCH v2 3/3] memory tiering: count PGPROMOTE_SUCCESS when mem tiering is enabled Zi Yan
@ 2024-07-23 1:48 ` Kefeng Wang
2024-07-23 1:54 ` Zi Yan
0 siblings, 1 reply; 13+ messages in thread
From: Kefeng Wang @ 2024-07-23 1:48 UTC (permalink / raw)
To: Zi Yan, Andrew Morton, linux-mm
Cc: David Hildenbrand, Huang, Ying, Baolin Wang, linux-kernel
On 2024/7/23 1:29, Zi Yan wrote:
> memory tiering can be enabled/disabled at runtime and
> sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING is used to check
> it. In migrate_misplaced_folio(), the check is missing when
> PGPROMOTE_SUCCESS is incremented. Add the missing check.
>
> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> Closes: https://lore.kernel.org/linux-mm/f4ae2c9c-fe40-4807-bdb2-64cf2d716c1a@huawei.com/
> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
> Signed-off-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> mm/migrate.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index bdbb5bb04c91..b819809da470 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2630,7 +2630,9 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
> putback_movable_pages(&migratepages);
> if (nr_succeeded) {
> count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
> - if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
> + if ((sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
> + && !node_is_toptier(folio_nid(folio))
> + && node_is_toptier(node))
> mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
> nr_succeeded);
This should come before patch 2, and the check above should then be changed
to use the folio_has_cpupid() helper too.
> }
* Re: [PATCH v2 3/3] memory tiering: count PGPROMOTE_SUCCESS when mem tiering is enabled.
2024-07-23 1:48 ` Kefeng Wang
@ 2024-07-23 1:54 ` Zi Yan
2024-07-23 3:24 ` Kefeng Wang
0 siblings, 1 reply; 13+ messages in thread
From: Zi Yan @ 2024-07-23 1:54 UTC (permalink / raw)
To: Kefeng Wang, Andrew Morton, linux-mm
Cc: David Hildenbrand, Huang, Ying, Baolin Wang, linux-kernel
On Mon Jul 22, 2024 at 9:48 PM EDT, Kefeng Wang wrote:
>
>
> On 2024/7/23 1:29, Zi Yan wrote:
> > memory tiering can be enabled/disabled at runtime and
> > sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING is used to check
> > it. In migrate_misplaced_folio(), the check is missing when
> > PGPROMOTE_SUCCESS is incremented. Add the missing check.
> >
> > Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> > Closes: https://lore.kernel.org/linux-mm/f4ae2c9c-fe40-4807-bdb2-64cf2d716c1a@huawei.com/
> > Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
> > Signed-off-by: Zi Yan <ziy@nvidia.com>
>
> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>
Thanks.
> > ---
> > mm/migrate.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index bdbb5bb04c91..b819809da470 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -2630,7 +2630,9 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
> > putback_movable_pages(&migratepages);
> > if (nr_succeeded) {
> > count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
> > - if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
> > + if ((sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
> > + && !node_is_toptier(folio_nid(folio))
> > + && node_is_toptier(node))
> > mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
> > nr_succeeded);
>
> The should be in advance of patch2, and change above to use
> folio_has_cpupid() helper() too.
It shares the same logic as !folio_has_cpupid(), but it might be confusing to
put !folio_has_cpupid(folio) && node_is_toptier(node) here. The folio's
cpupid has nothing to do with the stats here, so I did not use the
function.
--
Best Regards,
Yan, Zi
* Re: [PATCH v2 3/3] memory tiering: count PGPROMOTE_SUCCESS when mem tiering is enabled.
2024-07-23 1:54 ` Zi Yan
@ 2024-07-23 3:24 ` Kefeng Wang
2024-07-23 5:46 ` Huang, Ying
2024-07-23 10:17 ` David Hildenbrand
0 siblings, 2 replies; 13+ messages in thread
From: Kefeng Wang @ 2024-07-23 3:24 UTC (permalink / raw)
To: Zi Yan, Andrew Morton, linux-mm
Cc: David Hildenbrand, Huang, Ying, Baolin Wang, linux-kernel
On 2024/7/23 9:54, Zi Yan wrote:
> On Mon Jul 22, 2024 at 9:48 PM EDT, Kefeng Wang wrote:
>>
>>
>> On 2024/7/23 1:29, Zi Yan wrote:
>>> memory tiering can be enabled/disabled at runtime and
>>> sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING is used to check
>>> it. In migrate_misplaced_folio(), the check is missing when
>>> PGPROMOTE_SUCCESS is incremented. Add the missing check.
>>>
>>> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>> Closes: https://lore.kernel.org/linux-mm/f4ae2c9c-fe40-4807-bdb2-64cf2d716c1a@huawei.com/
>>> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>
>> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>
> Thanks.
>
>>> ---
>>> mm/migrate.c | 4 +++-
>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index bdbb5bb04c91..b819809da470 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -2630,7 +2630,9 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>>> putback_movable_pages(&migratepages);
>>> if (nr_succeeded) {
>>> count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
>>> - if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
>>> + if ((sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
>>> + && !node_is_toptier(folio_nid(folio))
>>> + && node_is_toptier(node))
>>> mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
>>> nr_succeeded);
>>
>> The should be in advance of patch2, and change above to use
>> folio_has_cpupid() helper() too.
>
> It shares the same logic of !folio_has_cpupid() but it might be confusing to
> put !folio_has_cpupid(folio) && node_is_toptier(node) here. folio's
> cpupid has nothing to do with the stats here, thus I did not use the
> function.
If the folio doesn't include an access time, we do migrate it, but it isn't
a promotion, so don't count it. Any other comments?
PS: Could we rename folio_has_cpupid() to folio_has_access_time()? Even
without memory tiering, we still have cpupid in the folio, right?
>
* Re: [PATCH v2 3/3] memory tiering: count PGPROMOTE_SUCCESS when mem tiering is enabled.
2024-07-23 3:24 ` Kefeng Wang
@ 2024-07-23 5:46 ` Huang, Ying
2024-07-23 10:17 ` David Hildenbrand
1 sibling, 0 replies; 13+ messages in thread
From: Huang, Ying @ 2024-07-23 5:46 UTC (permalink / raw)
To: Kefeng Wang
Cc: Zi Yan, Andrew Morton, linux-mm, David Hildenbrand, Baolin Wang,
linux-kernel
Kefeng Wang <wangkefeng.wang@huawei.com> writes:
> On 2024/7/23 9:54, Zi Yan wrote:
>> On Mon Jul 22, 2024 at 9:48 PM EDT, Kefeng Wang wrote:
>>>
>>>
>>> On 2024/7/23 1:29, Zi Yan wrote:
>>>> memory tiering can be enabled/disabled at runtime and
>>>> sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING is used to check
>>>> it. In migrate_misplaced_folio(), the check is missing when
>>>> PGPROMOTE_SUCCESS is incremented. Add the missing check.
>>>>
>>>> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>> Closes: https://lore.kernel.org/linux-mm/f4ae2c9c-fe40-4807-bdb2-64cf2d716c1a@huawei.com/
>>>> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>
>>> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>
>> Thanks.
>>
>>>> ---
>>>> mm/migrate.c | 4 +++-
>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index bdbb5bb04c91..b819809da470 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -2630,7 +2630,9 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>>>> putback_movable_pages(&migratepages);
>>>> if (nr_succeeded) {
>>>> count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
>>>> - if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
>>>> + if ((sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
>>>> + && !node_is_toptier(folio_nid(folio))
>>>> + && node_is_toptier(node))
>>>> mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
>>>> nr_succeeded);
>>>
>>> The should be in advance of patch2, and change above to use
>>> folio_has_cpupid() helper() too.
>> It shares the same logic of !folio_has_cpupid() but it might be
>> confusing to
>> put !folio_has_cpupid(folio) && node_is_toptier(node) here. folio's
>> cpupid has nothing to do with the stats here, thus I did not use the
>> function.
>
> If folio don't include access time, we do migrate it but it isn't a
> promotion, so don't count it, other comments?
Personally, I prefer to use !node_is_toptier() && node_is_toptier()
here. That sounds more natural to me.
> PS: Could we rename folio_has_cpupid() to folio_has_access_time(),
> even without memory_tiering, we still have cpupid in folio, right?
--
Best Regards,
Huang, Ying
* Re: [PATCH v2 2/3] memory tiering: introduce folio_has_cpupid() check
2024-07-22 17:29 ` [PATCH v2 2/3] memory tiering: introduce folio_has_cpupid() check Zi Yan
@ 2024-07-23 5:54 ` Lorenzo Stoakes
2024-07-23 10:14 ` David Hildenbrand
0 siblings, 1 reply; 13+ messages in thread
From: Lorenzo Stoakes @ 2024-07-23 5:54 UTC (permalink / raw)
To: Zi Yan
Cc: Andrew Morton, linux-mm, David Hildenbrand, Huang, Ying,
Baolin Wang, Kefeng Wang, linux-kernel
On Mon, Jul 22, 2024 at 01:29:16PM GMT, Zi Yan wrote:
> Instead of open coded check for if memory tiering mode is on and a folio
> is in the top tier memory, use a function to encapsulate the check.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Reviewed-by: "Huang, Ying" <ying.huang@intel.com>
> ---
> include/linux/mm.h | 6 ++++++
> kernel/sched/fair.c | 3 +--
> mm/huge_memory.c | 6 ++----
> mm/memory-tiers.c | 17 +++++++++++++++++
> mm/memory.c | 3 +--
> mm/mprotect.c | 3 +--
> 6 files changed, 28 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index c227f22ba810..048b2a56d8a3 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1738,6 +1738,8 @@ static inline void vma_set_access_pid_bit(struct vm_area_struct *vma)
> __set_bit(pid_bit, &vma->numab_state->pids_active[1]);
> }
> }
> +
> +bool folio_has_cpupid(struct folio *folio);
> #else /* !CONFIG_NUMA_BALANCING */
> static inline int folio_xchg_last_cpupid(struct folio *folio, int cpupid)
> {
> @@ -1791,6 +1793,10 @@ static inline bool cpupid_match_pid(struct task_struct *task, int cpupid)
> static inline void vma_set_access_pid_bit(struct vm_area_struct *vma)
> {
> }
> +static inline bool folio_has_cpupid(struct folio *folio)
> +{
> + return true;
> +}
> #endif /* CONFIG_NUMA_BALANCING */
>
> #if defined(CONFIG_KASAN_SW_TAGS) || defined(CONFIG_KASAN_HW_TAGS)
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a5b1ae0aa55..03de808cb3cc 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -1840,8 +1840,7 @@ bool should_numa_migrate_memory(struct task_struct *p, struct folio *folio,
> * The pages in slow memory node should be migrated according
> * to hot/cold instead of private/shared.
> */
> - if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING &&
> - !node_is_toptier(src_nid)) {
> + if (!folio_has_cpupid(folio)) {
> struct pglist_data *pgdat;
> unsigned long rate_limit;
> unsigned int latency, th, def_th;
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 825317aee88e..d925a93bb9ed 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1712,8 +1712,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
> * For memory tiering mode, cpupid of slow memory page is used
> * to record page access time. So use default value.
> */
> - if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) ||
> - node_is_toptier(nid))
> + if (folio_has_cpupid(folio))
> last_cpupid = folio_last_cpupid(folio);
> target_nid = numa_migrate_prep(folio, vmf, haddr, nid, &flags);
> if (target_nid == NUMA_NO_NODE)
> @@ -2066,8 +2065,7 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> toptier)
> goto unlock;
>
> - if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING &&
> - !toptier)
> + if (!folio_has_cpupid(folio))
> folio_xchg_access_time(folio,
> jiffies_to_msecs(jiffies));
> }
> diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> index 4775b3a3dabe..7f0360d4e3a0 100644
> --- a/mm/memory-tiers.c
> +++ b/mm/memory-tiers.c
> @@ -6,6 +6,7 @@
> #include <linux/memory.h>
> #include <linux/memory-tiers.h>
> #include <linux/notifier.h>
> +#include <linux/sched/sysctl.h>
>
> #include "internal.h"
>
> @@ -50,6 +51,22 @@ static const struct bus_type memory_tier_subsys = {
> .dev_name = "memory_tier",
> };
>
> +/**
> + * folio_has_cpupid - check if a folio has cpupid information
> + * @folio: folio to check
> + *
> + * folio's _last_cpupid field is repurposed by memory tiering. In memory
> + * tiering mode, cpupid of slow memory folio (not toptier memory) is used to
> + * record page access time.
> + *
> + * Return: the folio _last_cpupid is used as cpupid
> + */
> +bool folio_has_cpupid(struct folio *folio)
> +{
> + return !(sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) ||
> + node_is_toptier(folio_nid(folio));
> +}
> +
The static inline version of folio_has_cpupid() is defined in include/linux/mm.h
if !CONFIG_NUMA_BALANCING, but you define the function unconditionally in
memory-tiers.c, a file that is compiled whenever CONFIG_NUMA is set.
So a config with CONFIG_NUMA set but !CONFIG_NUMA_BALANCING results in a
compilation error (I just hit it this morning in mm-unstable).
A minimal fix for this is to wrap the definition in:
#ifdef CONFIG_NUMA_BALANCING
...
#endif
I've tried this locally and it resolves the issue.
> #ifdef CONFIG_MIGRATION
> static int top_tier_adistance;
> /*
> diff --git a/mm/memory.c b/mm/memory.c
> index 802d0d8a40f9..105e1a0157dd 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5337,8 +5337,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
> * For memory tiering mode, cpupid of slow memory page is used
> * to record page access time. So use default value.
> */
> - if ((sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
> - !node_is_toptier(nid))
> + if (!folio_has_cpupid(folio))
> last_cpupid = (-1 & LAST_CPUPID_MASK);
> else
> last_cpupid = folio_last_cpupid(folio);
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 222ab434da54..787c3c2bf1b6 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -161,8 +161,7 @@ static long change_pte_range(struct mmu_gather *tlb,
> if (!(sysctl_numa_balancing_mode & NUMA_BALANCING_NORMAL) &&
> toptier)
> continue;
> - if (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING &&
> - !toptier)
> + if (!folio_has_cpupid(folio))
> folio_xchg_access_time(folio,
> jiffies_to_msecs(jiffies));
> }
> --
> 2.43.0
>
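The build failure reported in this reply comes from a header stub and an out-of-line definition both existing in the same configuration. A single-file sketch of the guarded pattern follows. MODEL_NUMA_BALANCING is a hypothetical stand-in for CONFIG_NUMA_BALANCING, and the model_ function takes its inputs as booleans; the real code lives in include/linux/mm.h and mm/memory-tiers.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for CONFIG_NUMA_BALANCING. Remove this line to
 * model the !CONFIG_NUMA_BALANCING build. */
#define MODEL_NUMA_BALANCING 1

/* What the header provides: a prototype, or an always-true stub. */
#ifdef MODEL_NUMA_BALANCING
bool model_folio_has_cpupid(bool tiering_on, bool toptier);
#else
static inline bool model_folio_has_cpupid(bool tiering_on, bool toptier)
{
	(void)tiering_on;
	(void)toptier;
	return true;
}
#endif

/* What the .c file provides. Without this guard, the definition would
 * collide with the static inline stub when the option is off. */
#ifdef MODEL_NUMA_BALANCING
bool model_folio_has_cpupid(bool tiering_on, bool toptier)
{
	return !tiering_on || toptier;
}
#endif

/* Returns 1 when the out-of-line helper is in use, 0 for the stub. */
int uses_real_helper(void)
{
	/* Both variants agree that cpupid is present when tiering is off. */
	assert(model_folio_has_cpupid(false, false));
	/* The stub always returns true; the real helper returns false here. */
	return !model_folio_has_cpupid(true, false);
}
```

With the macro defined, only the prototype and the out-of-line definition are seen; with it removed, only the stub is seen, so the two never meet in one build.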
* Re: [PATCH v2 2/3] memory tiering: introduce folio_has_cpupid() check
2024-07-23 5:54 ` Lorenzo Stoakes
@ 2024-07-23 10:14 ` David Hildenbrand
2024-07-23 12:55 ` Zi Yan
0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2024-07-23 10:14 UTC (permalink / raw)
To: Lorenzo Stoakes, Zi Yan
Cc: Andrew Morton, linux-mm, Huang, Ying, Baolin Wang, Kefeng Wang,
linux-kernel
>
> The static version of folio_has_cpupid() is defined in include/linux/mm.h
> if !CONFIG_NUMA_BALANCING but you define the function in memory-tiers.c
> unconditionally, a file that is compiled predicated on CONFIG_NUMA.
>
> So a config with !CONFIG_NUMA_BALANCING but CONFIG_NUMA set results in a
> compilation error (I just hit it this morning in mm-unstable).
>
> A minimal fix for this is to wrap the declaration in:
>
> #ifdef CONFIG_NUMA_BALANCING
> ...
> #endif
>
> I've tried this locally and it resolves the issue.
Agreed, with that
Acked-by: David Hildenbrand <david@redhat.com>
--
Cheers,
David / dhildenb
* Re: [PATCH v2 3/3] memory tiering: count PGPROMOTE_SUCCESS when mem tiering is enabled.
2024-07-23 3:24 ` Kefeng Wang
2024-07-23 5:46 ` Huang, Ying
@ 2024-07-23 10:17 ` David Hildenbrand
2024-07-23 13:03 ` Zi Yan
1 sibling, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2024-07-23 10:17 UTC (permalink / raw)
To: Kefeng Wang, Zi Yan, Andrew Morton, linux-mm
Cc: Huang, Ying, Baolin Wang, linux-kernel
On 23.07.24 05:24, Kefeng Wang wrote:
>
>
> On 2024/7/23 9:54, Zi Yan wrote:
>> On Mon Jul 22, 2024 at 9:48 PM EDT, Kefeng Wang wrote:
>>>
>>>
>>> On 2024/7/23 1:29, Zi Yan wrote:
>>>> memory tiering can be enabled/disabled at runtime and
>>>> sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING is used to check
>>>> it. In migrate_misplaced_folio(), the check is missing when
>>>> PGPROMOTE_SUCCESS is incremented. Add the missing check.
>>>>
>>>> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>> Closes: https://lore.kernel.org/linux-mm/f4ae2c9c-fe40-4807-bdb2-64cf2d716c1a@huawei.com/
>>>> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>
>>> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>
>> Thanks.
>>
>>>> ---
>>>> mm/migrate.c | 4 +++-
>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index bdbb5bb04c91..b819809da470 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -2630,7 +2630,9 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>>>> putback_movable_pages(&migratepages);
>>>> if (nr_succeeded) {
>>>> count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
>>>> - if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
>>>> + if ((sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
>>>> + && !node_is_toptier(folio_nid(folio))
>>>> + && node_is_toptier(node))
>>>> mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
>>>> nr_succeeded);
>>>
>>> The should be in advance of patch2, and change above to use
>>> folio_has_cpupid() helper() too.
>>
>> It shares the same logic of !folio_has_cpupid() but it might be confusing to
>> put !folio_has_cpupid(folio) && node_is_toptier(node) here. folio's
>> cpupid has nothing to do with the stats here, thus I did not use the
>> function.
>
> If folio don't include access time, we do migrate it but it isn't a
> promotion, so don't count it, other comments?
>
> PS: Could we rename folio_has_cpupid() to folio_has_access_time(), even
> without memory_tiering, we still have cpupid in folio, right?
Maybe call it "folio_use_cpupid()" or something like that? The "has" is a bit
misleading, because the folio has a cpupid in any case, no?
--
Cheers,
David / dhildenb
* Re: [PATCH v2 2/3] memory tiering: introduce folio_has_cpupid() check
2024-07-23 10:14 ` David Hildenbrand
@ 2024-07-23 12:55 ` Zi Yan
0 siblings, 0 replies; 13+ messages in thread
From: Zi Yan @ 2024-07-23 12:55 UTC (permalink / raw)
To: David Hildenbrand, Lorenzo Stoakes
Cc: Andrew Morton, linux-mm, Huang, Ying, Baolin Wang, Kefeng Wang,
linux-kernel
On Tue Jul 23, 2024 at 6:14 AM EDT, David Hildenbrand wrote:
> >
> > The static version of folio_has_cpupid() is defined in include/linux/mm.h
> > if !CONFIG_NUMA_BALANCING but you define the function in memory-tiers.c
> > unconditionally, a file that is compiled predicated on CONFIG_NUMA.
> >
> > So a config with !CONFIG_NUMA_BALANCING but CONFIG_NUMA set results in a
> > compilation error (I just hit it this morning in mm-unstable).
> >
> > A minimal fix for this is to wrap the declaration in:
> >
> > #ifdef CONFIG_NUMA_BALANCING
> > ...
> > #endif
> >
> > I've tried this locally and it resolves the issue.
Will fix it. Thanks.
>
> Agreed, with that
>
> Acked-by: David Hildenbrand <david@redhat.com>
Thanks.
--
Best Regards,
Yan, Zi
* Re: [PATCH v2 3/3] memory tiering: count PGPROMOTE_SUCCESS when mem tiering is enabled.
2024-07-23 10:17 ` David Hildenbrand
@ 2024-07-23 13:03 ` Zi Yan
2024-07-24 1:22 ` Kefeng Wang
0 siblings, 1 reply; 13+ messages in thread
From: Zi Yan @ 2024-07-23 13:03 UTC (permalink / raw)
To: David Hildenbrand, Kefeng Wang, Andrew Morton, linux-mm
Cc: Huang, Ying, Baolin Wang, linux-kernel
On Tue Jul 23, 2024 at 6:17 AM EDT, David Hildenbrand wrote:
> On 23.07.24 05:24, Kefeng Wang wrote:
> >
> >
> > On 2024/7/23 9:54, Zi Yan wrote:
> >> On Mon Jul 22, 2024 at 9:48 PM EDT, Kefeng Wang wrote:
> >>>
> >>>
> >>> On 2024/7/23 1:29, Zi Yan wrote:
> >>>> memory tiering can be enabled/disabled at runtime and
> >>>> sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING is used to check
> >>>> it. In migrate_misplaced_folio(), the check is missing when
> >>>> PGPROMOTE_SUCCESS is incremented. Add the missing check.
> >>>>
> >>>> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> >>>> Closes: https://lore.kernel.org/linux-mm/f4ae2c9c-fe40-4807-bdb2-64cf2d716c1a@huawei.com/
> >>>> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
> >>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
> >>>
> >>> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> >>>
> >> Thanks.
> >>
> >>>> ---
> >>>> mm/migrate.c | 4 +++-
> >>>> 1 file changed, 3 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/mm/migrate.c b/mm/migrate.c
> >>>> index bdbb5bb04c91..b819809da470 100644
> >>>> --- a/mm/migrate.c
> >>>> +++ b/mm/migrate.c
> >>>> @@ -2630,7 +2630,9 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
> >>>> putback_movable_pages(&migratepages);
> >>>> if (nr_succeeded) {
> >>>> count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
> >>>> - if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
> >>>> + if ((sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
> >>>> + && !node_is_toptier(folio_nid(folio))
> >>>> + && node_is_toptier(node))
> >>>> mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
> >>>> nr_succeeded);
> >>>
> >>> This should come before patch 2, and the change above should use the
> >>> folio_has_cpupid() helper too.
> >>
> >> It shares the same logic of !folio_has_cpupid() but it might be confusing to
> >> put !folio_has_cpupid(folio) && node_is_toptier(node) here. folio's
> >> cpupid has nothing to do with the stats here, thus I did not use the
> >> function.
> >
> > If the folio doesn't include an access time, we do migrate it, but it isn't a
> > promotion, so don't count it. Other comments?
> >
> > PS: Could we rename folio_has_cpupid() to folio_has_access_time(), even
> > without memory_tiering, we still have cpupid in folio, right?
folio_has_access_time() would be the opposite of folio_has_cpupid().
If memory tiering is off (either at compile time or dynamically), a
folio has cpupid all the time.
>
> Maybe call it "folio_use_cpupid()" or sth like that? The "has" is a bit
> misleading, because the folio has a cpupid in any case, no?
The folio's cpupid field is reused to record page access time, when the folio
is !node_is_toptier() and memory tiering mode is on.
In sum, using folio_use_access_time() as !folio_has_cpupid() seems
better to me, since it covers the special use of folio's cpupid field.
Let me know your thoughts.
--
Best Regards,
Yan, Zi
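The naming question above may be easier to follow with the two predicates written out. What follows is a minimal user-space sketch, not kernel code. The condition in folio_use_access_time() and the promotion check are taken from the discussion and from the patch 3/3 hunk quoted above. Everything else is an assumption made for illustration: the cut-down struct folio, the model_toptier[] table, the MODEL_MAX_NODES limit, the counts_as_promotion() helper name, and the value 0x2 for NUMA_BALANCING_MEMORY_TIERING.

```c
/* User-space model only; bodies are illustrative assumptions, not kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Assumed flag value for this model. */
#define NUMA_BALANCING_MEMORY_TIERING 0x2

/* Hypothetical node layout: nodes 0-1 are top tier, nodes 2-3 are slow memory. */
#define MODEL_MAX_NODES 4
bool model_toptier[MODEL_MAX_NODES] = { true, true, false, false };

/* Stand-in for the runtime sysctl; 0 means memory tiering is off. */
int sysctl_numa_balancing_mode;

/* Hypothetical cut-down folio that only remembers its node id. */
struct folio {
	int nid;
};

bool node_is_toptier(int nid)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	return model_toptier[nid];
}

int folio_nid(const struct folio *folio)
{
	return folio->nid;
}

/*
 * The cpupid field is reused to record page access time only when memory
 * tiering mode is on and the folio sits on a node that is not top tier.
 */
bool folio_use_access_time(const struct folio *folio)
{
	return (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
	       !node_is_toptier(folio_nid(folio));
}

/* The v2 helper name expresses exactly the opposite condition. */
bool folio_has_cpupid(const struct folio *folio)
{
	return !folio_use_access_time(folio);
}

/*
 * Promotion accounting as in the patch 3/3 hunk: tiering on, source node
 * not top tier, target node top tier. Spelled out in full here rather than
 * via the folio helper, since the folio's cpupid is unrelated to the stats.
 */
bool counts_as_promotion(const struct folio *folio, int target_nid)
{
	return (sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING) &&
	       !node_is_toptier(folio_nid(folio)) &&
	       node_is_toptier(target_nid);
}
```

With tiering off in this model, every folio reports folio_has_cpupid() and no migration counts as a promotion. With tiering on, only a move from a slow node to a top-tier node counts.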
* Re: [PATCH v2 3/3] memory tiering: count PGPROMOTE_SUCCESS when mem tiering is enabled.
2024-07-23 13:03 ` Zi Yan
@ 2024-07-24 1:22 ` Kefeng Wang
0 siblings, 0 replies; 13+ messages in thread
From: Kefeng Wang @ 2024-07-24 1:22 UTC (permalink / raw)
To: Zi Yan, David Hildenbrand, Andrew Morton, linux-mm
Cc: Huang, Ying, Baolin Wang, linux-kernel
On 2024/7/23 21:03, Zi Yan wrote:
> On Tue Jul 23, 2024 at 6:17 AM EDT, David Hildenbrand wrote:
>> On 23.07.24 05:24, Kefeng Wang wrote:
>>>
>>>
>>> On 2024/7/23 9:54, Zi Yan wrote:
>>>> On Mon Jul 22, 2024 at 9:48 PM EDT, Kefeng Wang wrote:
>>>>>
>>>>>
>>>>> On 2024/7/23 1:29, Zi Yan wrote:
>>>>>> memory tiering can be enabled/disabled at runtime and
>>>>>> sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING is used to check
>>>>>> it. In migrate_misplaced_folio(), the check is missing when
>>>>>> PGPROMOTE_SUCCESS is incremented. Add the missing check.
>>>>>>
>>>>>> Reported-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>>> Closes: https://lore.kernel.org/linux-mm/f4ae2c9c-fe40-4807-bdb2-64cf2d716c1a@huawei.com/
>>>>>> Fixes: 33024536bafd ("memory tiering: hot page selection with hint page fault latency")
>>>>>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>>>>>
>>>>> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>>
>>>> Thanks.
>>>>
>>>>>> ---
>>>>>> mm/migrate.c | 4 +++-
>>>>>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>>>> index bdbb5bb04c91..b819809da470 100644
>>>>>> --- a/mm/migrate.c
>>>>>> +++ b/mm/migrate.c
>>>>>> @@ -2630,7 +2630,9 @@ int migrate_misplaced_folio(struct folio *folio, struct vm_area_struct *vma,
>>>>>> putback_movable_pages(&migratepages);
>>>>>> if (nr_succeeded) {
>>>>>> count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
>>>>>> - if (!node_is_toptier(folio_nid(folio)) && node_is_toptier(node))
>>>>>> + if ((sysctl_numa_balancing_mode & NUMA_BALANCING_MEMORY_TIERING)
>>>>>> + && !node_is_toptier(folio_nid(folio))
>>>>>> + && node_is_toptier(node))
>>>>>> mod_node_page_state(pgdat, PGPROMOTE_SUCCESS,
>>>>>> nr_succeeded);
>>>>>
>>>>> This should come before patch 2, and the change above should use the
>>>>> folio_has_cpupid() helper too.
>>>>
>>>> It shares the same logic of !folio_has_cpupid() but it might be confusing to
>>>> put !folio_has_cpupid(folio) && node_is_toptier(node) here. folio's
>>>> cpupid has nothing to do with the stats here, thus I did not use the
>>>> function.
>>>
>>> If the folio doesn't include an access time, we do migrate it, but it isn't a
>>> promotion, so don't count it. Other comments?
>>>
>>> PS: Could we rename folio_has_cpupid() to folio_has_access_time(), even
>>> without memory_tiering, we still have cpupid in folio, right?
>
> folio_has_access_time() would be the opposite of folio_has_cpupid().
> If memory tiering is off (either at compile time or dynamically), a
> folio has cpupid all the time.
>
>>
>> Maybe call it "folio_use_cpupid()" or sth like that? The "has" is a bit
>> misleading, because the folio has a cpupid in any case, no?
>
> The folio's cpupid field is reused to record page access time, when the folio
> is !node_is_toptier() and memory tiering mode is on.
>
> In sum, using folio_use_access_time() as !folio_has_cpupid() seems
> better to me, since it covers the special use of folio's cpupid field.
>
It sounds good, thanks.
> Let me know your thoughts.
>
Thread overview: 13+ messages
2024-07-22 17:29 [PATCH v2 1/3] memory tiering: read last_cpupid correctly in do_huge_pmd_numa_page() Zi Yan
2024-07-22 17:29 ` [PATCH v2 2/3] memory tiering: introduce folio_has_cpupid() check Zi Yan
2024-07-23 5:54 ` Lorenzo Stoakes
2024-07-23 10:14 ` David Hildenbrand
2024-07-23 12:55 ` Zi Yan
2024-07-22 17:29 ` [PATCH v2 3/3] memory tiering: count PGPROMOTE_SUCCESS when mem tiering is enabled Zi Yan
2024-07-23 1:48 ` Kefeng Wang
2024-07-23 1:54 ` Zi Yan
2024-07-23 3:24 ` Kefeng Wang
2024-07-23 5:46 ` Huang, Ying
2024-07-23 10:17 ` David Hildenbrand
2024-07-23 13:03 ` Zi Yan
2024-07-24 1:22 ` Kefeng Wang