From: Michal Hocko <mhocko@suse.com>
To: Wei Yang <richard.weiyang@gmail.com>
Cc: akpm@linux-foundation.org, dave.hansen@intel.com, linux-mm@kvack.org
Subject: Re: [PATCH] mm, page_alloc: fix calculation of pgdat->nr_zones
Date: Mon, 19 Nov 2018 10:48:32 +0100
Message-ID: <20181119094832.GC22247@dhcp22.suse.cz>
In-Reply-To: <20181117022022.9956-1-richard.weiyang@gmail.com>
On Sat 17-11-18 10:20:22, Wei Yang wrote:
> Function init_currently_empty_zone() will adjust pgdat->nr_zones and set
> it to 'zone_idx(zone) + 1' unconditionally. This is correct in the
> normal case, but not in the hot-plug case.
>
> This function is used in two places:
>
> * free_area_init_core()
> * move_pfn_range_to_zone()
>
> In the first case, zone indices are guaranteed to increase monotonically,
> while in the second case the order is under the user's control.
>
> One way to reproduce this is:
> ----------------------------
>
> 1. create a virtual machine with an empty node1
>
> -m 4G,slots=32,maxmem=32G \
> -smp 4,maxcpus=8 \
> -numa node,nodeid=0,mem=4G,cpus=0-3 \
> -numa node,nodeid=1,mem=0G,cpus=4-7
>
> 2. hot-add cpu 3-7
>
> cpu-add [3-7]
>
> 3. hot-add memory to node1
>
> object_add memory-backend-ram,id=ram0,size=1G
> device_add pc-dimm,id=dimm0,memdev=ram0,node=1
>
> 4. online memory in the following order
>
> echo online_movable > memory47/state
> echo online > memory40/state
>
> After this, node1 will have its nr_zones equal to (ZONE_NORMAL + 1)
> instead of (ZONE_MOVABLE + 1).
Maybe it is just me, but the above was quite hard to grasp, so just to
clarify: the underlying problem is that initialization of any existing
empty zone overrides the previous node-wide setting (pgdat->nr_zones).
The fix is to update nr_zones only when a higher zone is added.
Fixes: f1dd2cd13c4b ("mm, memory_hotplug: do not associate hotadded memory to zones until online")
I haven't checked the code prior to this rework, but I suspect it was
really the above commit that changed the picture.
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Thanks!
> ---
> mm/page_alloc.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5b7cd20dbaef..2d3c54201255 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5823,8 +5823,10 @@ void __meminit init_currently_empty_zone(struct zone *zone,
>  					unsigned long size)
>  {
>  	struct pglist_data *pgdat = zone->zone_pgdat;
> +	int zone_idx = zone_idx(zone) + 1;
>
> -	pgdat->nr_zones = zone_idx(zone) + 1;
> +	if (zone_idx > pgdat->nr_zones)
> +		pgdat->nr_zones = zone_idx;
>
>  	zone->zone_start_pfn = zone_start_pfn;
>
> --
> 2.15.1
>
--
Michal Hocko
SUSE Labs