From: Donet Tom <donettom@linux.ibm.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Aneesh Kumar <aneesh.kumar@kernel.org>,
Michal Hocko <mhocko@kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Mel Gorman <mgorman@suse.de>, Feng Tang <feng.tang@intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>, Rik van Riel <riel@surriel.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Matthew Wilcox <willy@infradead.org>,
Vlastimil Babka <vbabka@suse.cz>,
Dan Williams <dan.j.williams@intel.com>,
Hugh Dickins <hughd@google.com>,
Kefeng Wang <wangkefeng.wang@huawei.com>,
Suren Baghdasaryan <surenb@google.com>
Subject: Re: [PATCH v3 2/2] mm/numa_balancing:Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
Date: Mon, 25 Mar 2024 10:32:18 +0530
Message-ID: <61054afa-9f18-45f1-987d-e6f242012096@linux.ibm.com>
In-Reply-To: <875xxbqb51.fsf@yhuang6-desk2.ccr.corp.intel.com>
On 3/25/24 08:18, Huang, Ying wrote:
> Donet Tom <donettom@linux.ibm.com> writes:
>
>> On 3/22/24 14:02, Huang, Ying wrote:
>>> Donet Tom <donettom@linux.ibm.com> writes:
>>>
>>>> commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
>>>> nodes") added support for migrate on protnone reference with MPOL_BIND
>>>> memory policy. This allowed numa fault migration when the executing node
>>>> is part of the policy mask for MPOL_BIND. This patch extends migration
>>>> support to MPOL_PREFERRED_MANY policy.
>>>>
>>>> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
>>>> MPOL_F_NUMA_BALANCING. This causes issues when we want to use
>>>> NUMA_BALANCING_MEMORY_TIERING. To effectively use the slow memory tier,
>>>> the kernel should not allocate pages from the slower memory tier via
>>>> allocation control zonelist fallback. Instead, we should move cold pages
>>>> from the faster memory node via memory demotion. For a page allocation,
>>>> kswapd is only woken up after we try to allocate pages from all nodes in
>>>> the allocation zone list. This implies that, without using memory
>>>> policies, we will end up allocating hot pages in the slower memory tier.
>>>>
>>>> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
>>>> MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
>>>> allocation control when we have memory tiers in the system. With
>>>> MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
>>>> of faster memory nodes. When we fail to allocate pages from the faster
>>>> memory node, kswapd would be woken up, allowing demotion of cold pages
>>>> to slower memory nodes.
>>>>
>>>> With the current kernel, such usage of memory policies implies we can't
>>>> do page promotion from a slower memory tier to a faster memory tier
>>>> using numa fault. This patch fixes this issue.
>>>>
>>>> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
>>>> mask, we allow numa migration to the executing nodes. If the executing
>>>> node is not in the policy node mask, we do not allow numa migration.
>>> Can we provide more information about this? I suggest using an
>>> example, for instance, pages may be distributed among multiple sockets
>>> unexpectedly.
>> Thank you for your suggestion. However, this commit message explains all the scenarios.
> Yes. The commit message is correct and covers many cases. What I
> suggested is to describe why we do that. An example cannot cover all
> possibilities, but it is easy to understand. For example, something
> like the following?
>
> For example, on a 2-socket system, there are N0, N1, N2 in socket 0, N3
> in socket 1. N0, N1, N3 have fast memory and CPU, while N2 has slow
> memory and no CPU. For a workload, we may use MPOL_PREFERRED_MANY with
> a nodemask with N0 and N1 set because the workload runs on CPUs of
> socket 0 most of the time. Then, even if the workload runs on CPUs of N3
> occasionally, we will not try to migrate the workload pages from N2 to
> N3 because users may want to avoid cross-socket access as much as
> possible in the long term.
>
>> For example, consider a system with 3 NUMA nodes (N0, N1 and N6).
>> N0 and N1 are tier 1 DRAM nodes and N6 is a tier 2 PMEM node.
>>
>> Scenario 1: The process is executing on N1,
>> If the executing node is in the policy node mask,
>> Curr Loc Pages - The NUMA node where the page is present (folio node)
>> ====================================================================
>> Process   Policy      Curr Loc Pages   Observations
>> --------------------------------------------------------------------
>> N1        N0 N1 N6    N0               Pages Migrated from N0 to N1
>> N1        N0 N1 N6    N6               Pages Migrated from N6 to N1
>> N1        N0 N1       N1               Pages Migrated from N1 to N6
> Pages are not Migrating?
Sorry, this is a mistake. In this case, pages are not migrating.
Thanks
Donet.
>
>> N1        N0 N1       N6               Pages Migrated from N6 to N1
>> --------------------------------------------------------------------
>> Scenario 2: The process is executing on N1,
>> If the executing node is NOT in the policy node mask,
>> Curr Loc Pages - The NUMA node where the page is present (folio node)
>> ====================================================================
>> Process   Policy      Curr Loc Pages   Observations
>> --------------------------------------------------------------------
>> N1        N0 N6       N0               Pages are not Migrating
>> N1        N0 N6       N6               Pages are not Migrating
>> N1        N0          N0               Pages are not Migrating
>> --------------------------------------------------------------------
>>
>> Scenario 3: The process is executing on N1,
>> If the executing node and folio nodes are NOT in the policy node mask,
>> Curr Loc Pages - The NUMA node where the page is present (folio node)
>> ====================================================================
>> Thread    Policy      Curr Loc Pages   Observations
>> --------------------------------------------------------------------
>> N1        N0          N6               Pages are not Migrating
>> N1        N6          N0               Pages are not Migrating
>> --------------------------------------------------------------------
>>
>> We can conclude that even if the pages are distributed among multiple sockets,
>> if the executing node is in the policy node mask, we allow numa migration to the
>> executing nodes. If the executing node is not in the policy node mask,
>> we do not allow numa migration.
>>
> [snip]
>
> --
> Best Regards,
> Huang, Ying
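
To make the rule discussed in the scenarios above concrete, here is a minimal sketch in Python. It is a toy model, not kernel code: the function name numa_fault_target and the use of plain integer sets for node masks are assumptions made purely for illustration, and the model ignores every other check that the NUMA fault path performs. It encodes only what is stated in this thread: migration to the executing node is allowed when that node is in the MPOL_PREFERRED_MANY policy mask, and a page already on the executing node is not migrated.

```python
from typing import Optional, Set


def numa_fault_target(exec_node: int, policy_nodes: Set[int],
                      folio_node: int) -> Optional[int]:
    """Toy model of the migrate-on-fault decision described in this thread.

    Returns the node the page would be migrated to, or None when the
    page stays where it is. All names here are hypothetical.
    """
    # The executing node must be part of the policy node mask.
    if exec_node not in policy_nodes:
        return None
    # A page that already sits on the executing node has nowhere to go.
    if folio_node == exec_node:
        return None
    return exec_node


# Scenario 1 rows: process on N1, N1 is in the policy mask.
print(numa_fault_target(1, {0, 1, 6}, 0))
print(numa_fault_target(1, {0, 1, 6}, 6))
print(numa_fault_target(1, {0, 1}, 1))

# The 2-socket example: workload briefly runs on N3, policy is {N0, N1},
# page is on the slow node N2.
print(numa_fault_target(3, {0, 1}, 2))
```

The last call models the cross-socket case: because N3 is outside the policy mask, the page on N2 is left alone even though the workload is currently running on N3.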