Re: [PATCH v5 9/9] mm/demotion: Update node_is_toptier to work with memory tiers

From: Ying Huang <ying.huang@intel.com>
To: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>,
	linux-mm@kvack.org,  akpm@linux-foundation.org
Cc: Wei Xu <weixugc@google.com>, Greg Thelen <gthelen@google.com>,
	Yang Shi <shy828301@gmail.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Tim C Chen <tim.c.chen@intel.com>,
	Brice Goglin <brice.goglin@gmail.com>,
	Michal Hocko <mhocko@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	 Hesham Almatary <hesham.almatary@huawei.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Alistair Popple <apopple@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Feng Tang <feng.tang@intel.com>,
	Jagdish Gediya <jvgediya@linux.ibm.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	David Rientjes <rientjes@google.com>
Subject: Re: [PATCH v5 9/9] mm/demotion: Update node_is_toptier to work with memory tiers
Date: Wed, 08 Jun 2022 16:32:46 +0800
Message-ID: <cc9566421dedf10b5b7149d093992797540c31e2.camel@intel.com>
In-Reply-To: <232817e0-24fd-e022-6c92-c260f7f01f8a@linux.ibm.com>

On Wed, 2022-06-08 at 13:58 +0530, Aneesh Kumar K V wrote:
> On 6/8/22 12:56 PM, Ying Huang wrote:
> > On Mon, 2022-06-06 at 14:03 +0530, Aneesh Kumar K V wrote:
> > > On 6/6/22 12:54 PM, Ying Huang wrote:
> > > > On Mon, 2022-06-06 at 09:22 +0530, Aneesh Kumar K V wrote:
> > > > > On 6/6/22 8:41 AM, Ying Huang wrote:
> > > > > > On Fri, 2022-06-03 at 19:12 +0530, Aneesh Kumar K.V wrote:
> > > > > > > With memory tiers support we can have memory on NUMA nodes
> > > > > > > in the top tier from which we want to avoid promotion tracking NUMA
> > > > > > > faults. Update node_is_toptier to work with memory tiers. To
> > > > > > > avoid taking locks, a nodemask is maintained for all demotion
> > > > > > > targets. All NUMA nodes are by default top tier nodes and as
> > > > > > > we add new lower memory tiers NUMA nodes get added to the
> > > > > > > demotion targets thereby moving them out of the top tier.
> > > > > > 
> > > > > > Check the usage of node_is_toptier(),
> > > > > > 
> > > > > > - migrate_misplaced_page()
> > > > > >      node_is_toptier() is used to check whether migration is a promotion.
> > > > > > > We can avoid using it.  Just compare the ranks of the two nodes.
> > > > > > 
> > > > > > - change_pte_range() and change_huge_pmd()
> > > > > >      node_is_toptier() is used to avoid scanning fast memory (DRAM) pages
> > > > > > for promotion.  So I think we should change the name to node_is_fast()
> > > > > > as follows,
> > > > > > 
> > > > > > static inline bool node_is_fast(int node)
> > > > > > {
> > > > > > 	return NODE_DATA(node)->mt_rank >= MEMORY_RANK_DRAM;
> > > > > > }
> > > > > > 
> > > > > 
> > > > > > But that gives special meaning to MEMORY_RANK_DRAM. As detailed in other
> > > > > > patches, the absolute value of a rank doesn't carry any meaning. It is only
> > > > > > the value relative to other memory tiers that decides whether a tier is
> > > > > > fast or not. Agreed, by default memory tiers get built with
> > > > > > MEMORY_RANK_DRAM. But userspace can change the rank value of 'memtier1'.
> > > > > > Hence, determining whether a node consists of fast memory is essentially
> > > > > > figuring out whether the node is in the topmost tier of the memory
> > > > > > hierarchy, not just whether its memory tier rank is >= MEMORY_RANK_DRAM?
> > > > 
> > > > In a system with 3 tiers,
> > > > 
> > > > HBM	0
> > > > DRAM	1
> > > > PMEM	2
> > > > 
> > > > In your implementation, only HBM will be considered fast.  But what we
> > > > need is to consider both HBM and DRAM fast.  Because we use NUMA
> > > > balancing to promote PMEM pages to DRAM.  It's unnecessary to scan HBM
> > > > > and DRAM pages for that.  And there's no requirement to promote DRAM
> > > > > pages to HBM with NUMA balancing.
> > > > 
> > > > I can understand that the memory tiers are more dynamic now.  For
> > > > requirements of NUMA balancing, we need the lowest memory tier (rank)
> > > > where there's at least one node with CPU.  The nodes in it and the
> > > > higher tiers will be considered fast.
> > > > 
> > > 
> > > > Is this good (not tested)?
> > > /*
> > >    * build the allowed promotion mask. Promotion is allowed
> > >    * from higher memory tier to lower memory tier only if
> > >    * lower memory tier doesn't include compute. We want to
> > > >    * skip promotion from a memory tier, if any node which is
> > > >    * part of that memory tier has CPUs. Once we detect such
> > >    * a memory tier, we consider that tier as top tier from
> > >    * which promotion is not allowed.
> > >    */
> > > > list_for_each_entry_reverse(memtier, &memory_tiers, list) {
> > > > 	nodes_and(allowed, node_state[N_CPU], memtier->nodelist);
> > > > 	/* a tier without CPUs: add all of its nodes to the mask */
> > > > 	if (nodes_empty(allowed))
> > > > 		nodes_or(promotion_mask, promotion_mask, memtier->nodelist);
> > > > 	else
> > > > 		break;
> > > > }
> > > 
> > > and then
> > > 
> > > static inline bool node_is_toptier(int node)
> > > {
> > > 
> > > 	return !node_isset(node, promotion_mask);
> > > }
> > > 
> > 
> > This should work.  But it appears unnatural.  So, I think we should
> > avoid adding more and more node masks to mitigate the design
> > decision that we cannot access memory tier information directly.  All
> > of this becomes simple and natural if we can access memory tier
> > information directly.
> > 
> 
> How do you derive whether a node is toptier if we have memtier
> details in pgdat?

pgdat -> memory tier -> rank

Then we can compare this rank with the fast memory rank.  The fast
memory rank can be calculated dynamically at appropriate places.
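
For illustration only, here is a minimal userspace sketch of that lookup.
It is not taken from the patch series: the struct layouts, the has_cpu
flag, and compute_fast_rank() are hypothetical names, and it assumes a
higher rank means faster memory, as in the node_is_fast() suggestion
quoted above.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not kernel definitions. */
struct memory_tier {
	int rank;			/* assumption: higher rank == faster memory */
};

struct node {
	struct memory_tier *memtier;	/* stands in for pgdat -> memory tier */
	bool has_cpu;			/* stands in for node_state(nid, N_CPU) */
};

/*
 * Fast memory rank: the lowest rank among tiers that contain at least
 * one node with CPUs.  Returns INT_MAX when no node has a CPU, so that
 * ordinary ranks never compare as fast.
 */
static int compute_fast_rank(const struct node *nodes, size_t nr)
{
	int fast_rank = INT_MAX;

	for (size_t i = 0; i < nr; i++) {
		if (nodes[i].has_cpu && nodes[i].memtier->rank < fast_rank)
			fast_rank = nodes[i].memtier->rank;
	}
	return fast_rank;
}

/* node -> memory tier -> rank, compared with the fast memory rank. */
static bool node_is_fast(const struct node *node, int fast_rank)
{
	return node->memtier->rank >= fast_rank;
}
```

With HBM, DRAM and PMEM tiers where only the DRAM node has CPUs, this
would treat both HBM and DRAM as fast and PMEM as not fast.  In the
kernel the fast rank would presumably be recomputed whenever tiers or
CPU-bearing nodes change, rather than on every call.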

Best Regards,
Huang, Ying
