linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Greg KH <gregkh@linuxfoundation.org>
To: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, vbabka@suse.com, david@redhat.com,
	Vlastimil Babka <vbabka@suse.cz>
Subject: Re: [PATCH STABLE 4.9] mm: Avoid calling build_all_zonelists_init under hotplug context
Date: Thu, 20 Aug 2020 10:14:26 +0200	[thread overview]
Message-ID: <20200820081426.GF4049659@kroah.com> (raw)
In-Reply-To: <20200818122446.GA15067@dhcp22.suse.cz>

On Tue, Aug 18, 2020 at 02:24:46PM +0200, Michal Hocko wrote:
> On Tue 18-08-20 13:00:46, Oscar Salvador wrote:
> > Recently a customer of ours experienced a crash when booting the
> > system while enabling memory-hotplug.
> > 
> > The problem is that Normal zones on different nodes don't get their private
> > zone->pageset allocated, and keep sharing the initial boot_pageset.
> > The sharing between zones is normally safe as explained by the comment for
> > boot_pageset - it's a percpu structure, and manipulations are done with
> > disabled interrupts, and boot_pageset is set up in a way that any page placed
> > on its pcplist is immediately flushed to shared zone's freelist, because
> > pcp->high == 1.
> > However, the hotplug operation updates pcp->high to a higher value as it
> > expects to be operating on a private pageset.
> > 
> > The problem is in build_all_zonelists(), which is called when the first range
> > of pages is onlined for the Normal zone of node X or Y:
> > 
> > 	if (system_state == SYSTEM_BOOTING) {
> > 		build_all_zonelists_init();
> > 	} else {
> > 	#ifdef CONFIG_MEMORY_HOTPLUG
> > 		if (zone)
> > 			setup_zone_pageset(zone);
> > 	#endif
> > 		/* we have to stop all cpus to guarantee there is no user
> > 		of zonelist */
> > 		stop_machine(__build_all_zonelists, pgdat, NULL);
> > 		/* cpuset refresh routine should be here */
> > 	}
> > 
> > When called during hotplug, it should execute the setup_zone_pageset(zone)
> > which allocates the private pageset.
> > However, with memhp_default_state=online, this happens early while
> > system_state == SYSTEM_BOOTING is still true, hence this step is skipped.
> > (and build_all_zonelists_init() is probably unsafe anyway at this point).
> > 
> > Another hotplug operation on the same zone then leads to zone_pcp_update(zone)
> > called from online_pages(), which updates the pcp->high for the shared
> > boot_pageset to a value higher than 1.
> > At that point, pages freed from Node X and Y Normal zones can end up on the same
> > pcplist and from there they can be freed to the wrong zone's freelist,
> > leading to the corruption and crashes.
> > 
> > Please, note that upstream has fixed that differently (and unintentionally) by
> > adding another boot state (SYSTEM_SCHEDULING), which is set before smp_init().
> > That should happen before memory hotplug events even with memhp_default_state=online.
> > Backporting that would be too intrusive.
> > 
> > Signed-off-by: Oscar Salvador <osalvador@suse.de>
> > Debugged-by: Vlastimil Babka <vbabka@suse.cz>
> 
> Yes, I believe this is the easiest and the least scary way to fix the
> issue for stable kernel users. Feel free to add
> Acked-by: Michal Hocko <mhocko@suse.com> # for stable trees
> 
> for that purpose.

Now queued up, thanks!

greg k-h


  reply	other threads:[~2020-08-20  8:14 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-18 11:00 Oscar Salvador
2020-08-18 12:24 ` Michal Hocko
2020-08-20  8:14   ` Greg KH [this message]
2020-08-18 12:26 ` David Hildenbrand
2020-08-20  8:39 ` Patch "mm: Avoid calling build_all_zonelists_init under hotplug context" has been added to the 4.9-stable tree gregkh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200820081426.GF4049659@kroah.com \
    --to=gregkh@linuxfoundation.org \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=osalvador@suse.de \
    --cc=stable@vger.kernel.org \
    --cc=vbabka@suse.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox