From: Jonghyeon Kim <tome01@ajou.ac.kr>
To: David Hildenbrand <david@redhat.com>
Cc: dan.j.williams@intel.com, vishal.l.verma@intel.com,
	dave.jiang@intel.com, akpm@linux-foundation.org,
	nvdimm@lists.linux.dev, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH 1/2] mm/memory_hotplug: Export shrink span functions for zone and node
Date: Thu, 27 Jan 2022 18:41:42 +0900
Message-ID: <20220127094142.GA31409@swarm08>
In-Reply-To: <5d02ea0e-aca6-a64b-23de-bc9307572d17@redhat.com>

On Wed, Jan 26, 2022 at 06:04:50PM +0100, David Hildenbrand wrote:
> On 26.01.22 18:00, Jonghyeon Kim wrote:
> > Export the shrink_zone_span() and update_pgdat_span() functions in the
> > header file. We need to update the real number of spanned pages for NUMA
> > nodes and zones when we add a memory device node such as device dax memory.
> > 
> 
> Can you elaborate a bit more what you intend to fix?
> 
> Memory onlining/offlining is responsible for updating the node/zone span,
> and that's triggered when the dax/kmem memory gets onlined/offlined.
> 
Sure, sorry for not explaining the intended fix more clearly.

Before nvdimm memory is onlined via dax (devdax or fsdax), it belongs to the
CPU NUMA nodes, where it extends the spanned pages of the node/zone as
ZONE_DEVICE. That is fine: the node/zone span simply covers these additional
memory devices, even though they are not yet visible to the system as regular
RAM.
But when we online the dax memory, the zone[ZONE_DEVICE] pages of the CPU NUMA
node are hot-plugged into a new (CPU-less) NUMA node. I think there is no need
to keep those zone[ZONE_DEVICE] pages in the span of the original node.

Additionally, spanned pages are used to calculate the end PFN of a node, so we
need to maintain accurate span statistics for each node/zone.
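
For reference, a stale spanned_pages value directly yields a wrong end PFN,
because the end PFN helpers are essentially just the span added to the start
(simplified sketch of zone_end_pfn()/pgdat_end_pfn() from
include/linux/mmzone.h):

	/* End PFN of a zone: start of the zone plus its spanned pages. */
	static inline unsigned long zone_end_pfn(const struct zone *zone)
	{
		return zone->zone_start_pfn + zone->spanned_pages;
	}

	/* End PFN of a node, derived the same way from the node span. */
	static inline unsigned long pgdat_end_pfn(pg_data_t *pgdat)
	{
		return pgdat->node_start_pfn + pgdat->node_spanned_pages;
	}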

My machine has two CPU sockets, each populated with DRAM and Intel DCPMM
(DC Persistent Memory Modules) configured in App Direct mode.

Below are my test results.

Before memory onlining:

	# ndctl create-namespace --mode=devdax
	# ndctl create-namespace --mode=devdax
	# cat /proc/zoneinfo | grep -E "Node|spanned" | paste - -
	Node 0, zone      DMA	        spanned  4095
	Node 0, zone    DMA32	        spanned  1044480
	Node 0, zone   Normal	        spanned  7864320
	Node 0, zone  Movable	        spanned  0
	Node 0, zone   Device	        spanned  66060288
	Node 1, zone      DMA	        spanned  0
	Node 1, zone    DMA32	        spanned  0
	Node 1, zone   Normal	        spanned  8388608
	Node 1, zone  Movable	        spanned  0
	Node 1, zone   Device	        spanned  66060288

After memory onlining:

	# daxctl reconfigure-device --mode=system-ram --no-online dax0.0
	# daxctl reconfigure-device --mode=system-ram --no-online dax1.0
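
With --no-online, the new memory blocks are not onlined automatically, so an
extra onlining step is needed before they show up as ZONE_NORMAL below;
assuming the usual daxctl workflow, something like:

	# daxctl online-memory dax0.0
	# daxctl online-memory dax1.0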

	# cat /proc/zoneinfo | grep -E "Node|spanned" | paste - -
	Node 0, zone      DMA	        spanned  4095
	Node 0, zone    DMA32	        spanned  1044480
	Node 0, zone   Normal	        spanned  7864320
	Node 0, zone  Movable	        spanned  0
	Node 0, zone   Device	        spanned  66060288
	Node 1, zone      DMA	        spanned  0
	Node 1, zone    DMA32	        spanned  0
	Node 1, zone   Normal	        spanned  8388608
	Node 1, zone  Movable	        spanned  0
	Node 1, zone   Device	        spanned  66060288
	Node 2, zone      DMA	        spanned  0
	Node 2, zone    DMA32	        spanned  0
	Node 2, zone   Normal	        spanned  65011712
	Node 2, zone  Movable	        spanned  0
	Node 2, zone   Device	        spanned  0
	Node 3, zone      DMA	        spanned  0
	Node 3, zone    DMA32	        spanned  0
	Node 3, zone   Normal	        spanned  65011712
	Node 3, zone  Movable	        spanned  0
	Node 3, zone   Device	        spanned  0

As we can see, Node 0 and Node 1 still have ZONE_DEVICE spanned pages after
memory onlining. As a result, Node 0 and Node 2 end up with the same end PFN,
and Node 1 and Node 3 have the same problem.
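
The overlapping end PFNs can be double-checked by also dumping each zone's
start_pfn in the same style as above (a sketch; the exact output layout may
vary by kernel version):

	# cat /proc/zoneinfo | grep -E "Node|start_pfn" | paste - -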

> > Signed-off-by: Jonghyeon Kim <tome01@ajou.ac.kr>
> > ---
> >  include/linux/memory_hotplug.h | 3 +++
> >  mm/memory_hotplug.c            | 6 ++++--
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> > index be48e003a518..25c7f60c317e 100644
> > --- a/include/linux/memory_hotplug.h
> > +++ b/include/linux/memory_hotplug.h
> > @@ -337,6 +337,9 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn,
> >  extern void remove_pfn_range_from_zone(struct zone *zone,
> >  				       unsigned long start_pfn,
> >  				       unsigned long nr_pages);
> > +extern void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
> > +			     unsigned long end_pfn);
> > +extern void update_pgdat_span(struct pglist_data *pgdat);
> >  extern bool is_memblock_offlined(struct memory_block *mem);
> >  extern int sparse_add_section(int nid, unsigned long pfn,
> >  		unsigned long nr_pages, struct vmem_altmap *altmap);
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 2a9627dc784c..38f46a9ef853 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -389,7 +389,7 @@ static unsigned long find_biggest_section_pfn(int nid, struct zone *zone,
> >  	return 0;
> >  }
> >  
> > -static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
> > +void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
> >  			     unsigned long end_pfn)
> >  {
> >  	unsigned long pfn;
> > @@ -428,8 +428,9 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
> >  		}
> >  	}
> >  }
> > +EXPORT_SYMBOL_GPL(shrink_zone_span);
> 
> Exporting both as symbols feels very wrong. This is memory
> onlining/offlining internal stuff.

I agree with your comment. I will look for another approach that avoids
using the onlining/offlining internals directly when updating the node/zone
span.


Thanks,
Jonghyeon
> 
> 
> 
> -- 
> Thanks,
> 
> David / dhildenb
> 


Thread overview: 11+ messages
2022-01-26 17:00 Jonghyeon Kim
2022-01-26 17:00 ` [PATCH 2/2] dax/kmem: Update spanned page stat of origin device node Jonghyeon Kim
2022-01-27  0:29   ` kernel test robot
2022-01-27  5:29   ` kernel test robot
2022-01-26 17:04 ` [PATCH 1/2] mm/memory_hotplug: Export shrink span functions for zone and node David Hildenbrand
2022-01-27  9:41   ` Jonghyeon Kim [this message]
2022-01-27  9:54     ` David Hildenbrand
2022-01-28  4:19       ` Jonghyeon Kim
2022-01-28  8:10         ` David Hildenbrand
2022-02-03  2:22           ` Jonghyeon Kim
2022-02-03  8:19             ` David Hildenbrand
