From: Donet Tom <donettom@linux.ibm.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Aneesh Kumar <aneesh.kumar@kernel.org>,
	Michal Hocko <mhocko@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Mel Gorman <mgorman@suse.de>, Feng Tang <feng.tang@intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>, Rik van Riel <riel@surriel.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Matthew Wilcox <willy@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Dan Williams <dan.j.williams@intel.com>,
	Hugh Dickins <hughd@google.com>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Suren Baghdasaryan <surenb@google.com>
Subject: Re: [PATCH v3 2/2] mm/numa_balancing:Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy
Date: Mon, 25 Mar 2024 10:32:18 +0530
Message-ID: <61054afa-9f18-45f1-987d-e6f242012096@linux.ibm.com>
In-Reply-To: <875xxbqb51.fsf@yhuang6-desk2.ccr.corp.intel.com>


On 3/25/24 08:18, Huang, Ying wrote:
> Donet Tom <donettom@linux.ibm.com> writes:
>
>> On 3/22/24 14:02, Huang, Ying wrote:
>>> Donet Tom <donettom@linux.ibm.com> writes:
>>>
>>>> commit bda420b98505 ("numa balancing: migrate on fault among multiple bound
>>>> nodes") added support for migrate on protnone reference with MPOL_BIND
>>>> memory policy. This allowed numa fault migration when the executing node
>>>> is part of the policy mask for MPOL_BIND. This patch extends migration
>>>> support to MPOL_PREFERRED_MANY policy.
>>>>
>>>> Currently, we cannot specify MPOL_PREFERRED_MANY with the mempolicy flag
>>>> MPOL_F_NUMA_BALANCING. This causes issues when we want to use
>>>> NUMA_BALANCING_MEMORY_TIERING. To effectively use the slow memory tier,
>>>> the kernel should not allocate pages from the slower memory tier via
>>>> allocation control zonelist fallback. Instead, we should move cold pages
>>>> from the faster memory node via memory demotion. For a page allocation,
>>>> kswapd is only woken up after we try to allocate pages from all nodes in
>>>> the allocation zone list. This implies that, without using memory
>>>> policies, we will end up allocating hot pages in the slower memory tier.
>>>>
>>>> MPOL_PREFERRED_MANY was added by commit b27abaccf8e8 ("mm/mempolicy: add
>>>> MPOL_PREFERRED_MANY for multiple preferred nodes") to allow better
>>>> allocation control when we have memory tiers in the system. With
>>>> MPOL_PREFERRED_MANY, the user can use a policy node mask consisting only
>>>> of faster memory nodes. When we fail to allocate pages from the faster
>>>> memory node, kswapd would be woken up, allowing demotion of cold pages
>>>> to slower memory nodes.
>>>>
>>>> With the current kernel, such usage of memory policies implies we can't
>>>> do page promotion from a slower memory tier to a faster memory tier
>>>> using numa fault. This patch fixes this issue.
>>>>
>>>> For MPOL_PREFERRED_MANY, if the executing node is in the policy node
>>>> mask, we allow numa migration to the executing node. If the executing
>>>> node is not in the policy node mask, we do not allow numa migration.
>>> Can we provide more information about this?  I suggest using an
>>> example; for instance, pages may be distributed among multiple sockets
>>> unexpectedly.
>> Thank you for your suggestion. However, this commit message explains all the scenarios.
> Yes.  The commit message is correct and covers many cases.  What I
> suggested is to describe why we do that.  An example cannot cover all
> possibilities, but it is easy to understand.  For example, something
> like the following?
>
> For example, on a 2-socket system, there are N0, N1, N2 in socket 0 and
> N3 in socket 1.  N0, N1, N3 have fast memory and CPUs, while N2 has slow
> memory and no CPU.  For a workload, we may use MPOL_PREFERRED_MANY with
> a nodemask with N0 and N1 set because the workload runs on CPUs of
> socket 0 most of the time.  Then, even if the workload runs on CPUs of
> N3 occasionally, we will not try to migrate the workload pages from N2
> to N3 because users may want to avoid cross-socket access as much as
> possible in the long term.
>
>> For example, consider a system with 3 NUMA nodes (N0, N1 and N6).
>> N0 and N1 are tier 1 DRAM nodes and N6 is a tier 2 PMEM node.
>>
>> Scenario 1: The process is executing on N1 and the executing node
>>              is in the policy node mask.
>>              Curr Loc Pages - the NUMA node where the page is present (folio node)
>> ==================================================================================
>> Process      Policy          Curr Loc Pages                 Observations
>> -----------------------------------------------------------------------------------
>> N1           N0 N1 N6              N0                   Pages Migrated from N0 to N1
>> N1           N0 N1 N6              N6                   Pages Migrated from N6 to N1
>> N1           N0 N1                 N1                   Pages Migrated from N1 to N6
> Pages are not Migrating ?

Sorry, this is a mistake. In this case (process on N1, policy N0 N1,
pages on N1), the pages are not migrated.

Thanks
Donet.

>
>> N1           N0 N1                 N6                   Pages Migrated from N6 to N1
>> ------------------------------------------------------------------------------------
>> Scenario 2:  The process is executing on N1 and the executing node
>>               is NOT in the policy node mask.
>>               Curr Loc Pages - the NUMA node where the page is present (folio node)
>> ===================================================================================
>> Process       Policy       Curr Loc Pages       Observations
>> -----------------------------------------------------------------------------------
>> N1            N0 N6             N0              Pages are not Migrating
>> N1            N0 N6             N6              Pages are not Migrating
>> N1            N0                N0              Pages are not Migrating
>> ------------------------------------------------------------------------------------
>>
>> Scenario 3: The process is executing on N1 and neither the executing
>>              node nor the folio node is in the policy node mask.
>>              Curr Loc Pages - the NUMA node where the page is present (folio node)
>> ====================================================================================
>> Process   Policy       Curr Loc Pages           Observations
>> ------------------------------------------------------------------------------------
>> N1          N0               N6                 Pages are not Migrating
>> N1          N6               N0                 Pages are not Migrating
>> ------------------------------------------------------------------------------------
>>
>> We can conclude that even if the pages are distributed among multiple sockets,
>> if the executing node is in the policy node mask, we allow NUMA migration to the
>> executing node. If the executing node is not in the policy node mask,
>> we do not allow NUMA migration.
>>
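
The rule in the tables above can be written down as a small model. This
is only a sketch of the decision as described in this thread, not the
kernel code: the function name should_migrate() and the set-based node
mask are made up for illustration, and the other checks the kernel
applies on a NUMA hint fault are left out. The third row of Scenario 1
is taken as "not migrating", per the correction in this message.

```python
def should_migrate(exec_node, policy_nodes, folio_node):
    """Hypothetical model of the MPOL_PREFERRED_MANY rule from this thread.

    Migration toward the executing node is considered only when that node
    is in the policy node mask. A page already on the executing node has
    nowhere to go.
    """
    if exec_node not in policy_nodes:
        return False
    return folio_node != exec_node


# (executing node, policy node mask, folio node) taken from the tables.
CASES = [
    (1, {0, 1, 6}, 0),  # Scenario 1
    (1, {0, 1, 6}, 6),
    (1, {0, 1}, 1),     # corrected row: page is already local
    (1, {0, 1}, 6),
    (1, {0, 6}, 0),     # Scenario 2
    (1, {0, 6}, 6),
    (1, {0}, 0),
    (1, {0}, 6),        # Scenario 3
    (1, {6}, 0),
    (3, {0, 1}, 2),     # 2-socket example: running on N3, page on N2
]

for exec_node, policy, folio in CASES:
    verdict = "migrate" if should_migrate(exec_node, policy, folio) else "stay"
    print(f"N{exec_node} policy={sorted(policy)} page on N{folio}: {verdict}")
```

Only the Scenario 1 rows where the page is not already on N1 come out as
"migrate"; every case where the executing node is outside the mask,
including the 2-socket example, comes out as "stay".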
> [snip]
>
> --
> Best Regards,
> Huang, Ying


