Subject: Patch "mm: Avoid calling build_all_zonelists_init under hotplug context" has been added to the 4.9-stable tree
To: david@redhat.com,gregkh@linuxfoundation.org,linux-mm@kvack.org,mhocko@suse.com,osalvador@suse.de,vbabka@suse.com,vbabka@suse.cz
Date: Thu, 20 Aug 2020 10:39:27 +0200
In-Reply-To: <20200818110046.6664-1-osalvador@suse.de>
Message-ID: <1597912767118228@kroah.com>


This is a note to let you know that I've just added the patch titled

    mm: Avoid calling build_all_zonelists_init under hotplug context

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     mm-avoid-calling-build_all_zonelists_init-under-hotplug-context.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let know about it.


From osalvador@suse.de  Thu Aug 20 10:13:12 2020
From: Oscar Salvador
Date: Tue, 18 Aug 2020 13:00:46 +0200
Subject: mm: Avoid calling build_all_zonelists_init under hotplug context
To: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhocko@suse.com, vbabka@suse.com, david@redhat.com, Oscar Salvador, Vlastimil Babka
Message-ID: <20200818110046.6664-1-osalvador@suse.de>

From: Oscar Salvador

Recently a customer of ours experienced a crash when booting the
system while enabling memory-hotplug.

The problem is that Normal zones on different nodes don't get their private
zone->pageset allocated, and keep sharing the initial boot_pageset.
The sharing between zones is normally safe as explained by the comment for
boot_pageset - it's a percpu structure, and manipulations are done with
disabled interrupts, and boot_pageset is set up in a way that any page placed
on its pcplist is immediately flushed to shared zone's freelist, because
pcp->high == 1.
However, the hotplug operation updates pcp->high to a higher value as it
expects to be operating on a private pageset.
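
The effect of pcp->high on a shared list can be illustrated with a small
user-space toy model. This is only a sketch, not kernel code: the names
toy_pcp, toy_free_page and toy_simulate are hypothetical, and the flush is
simplified to drain the whole list into the freelist of the zone doing the
free, rather than a batch of pages.

```c
#include <assert.h>

#define TOY_MAX_PARKED 16

/* Toy stand-in for a per-cpu page list shared by two zones (hypothetical). */
struct toy_pcp {
	int high;                  /* flush threshold, modeled on pcp->high */
	int count;                 /* pages currently parked on the list */
	int owner[TOY_MAX_PARKED]; /* zone each parked page really belongs to */
};

/*
 * Free one page owned by `zone` through the shared list. Once the list
 * holds `high` pages, all of them are flushed to the freelist of the zone
 * doing the free. Returns how many flushed pages landed on a freelist that
 * is not their owner's.
 */
static int toy_free_page(struct toy_pcp *pcp, int free_pages[], int zone)
{
	int misplaced = 0;

	assert(pcp->count < TOY_MAX_PARKED);
	pcp->owner[pcp->count++] = zone;

	if (pcp->count >= pcp->high) {
		while (pcp->count > 0) {
			int owner = pcp->owner[--pcp->count];

			free_pages[zone]++;
			if (owner != zone)
				misplaced++;
		}
	}
	return misplaced;
}

/* Free four pages, alternating between zone 0 and zone 1. */
int toy_simulate(int high)
{
	struct toy_pcp pcp = { .high = high, .count = 0 };
	int free_pages[2] = { 0, 0 };
	int misplaced = 0;

	for (int i = 0; i < 4; i++)
		misplaced += toy_free_page(&pcp, free_pages, i % 2);
	return misplaced;
}
```

In this model toy_simulate(1) returns 0, because every page is flushed
straight back to the zone that freed it, while toy_simulate(4) returns 2:
two pages owned by zone 0 are flushed onto zone 1's freelist.
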

The problem is in build_all_zonelists(), which is called when the first range
of pages is onlined for the Normal zone of node X or Y:

	if (system_state == SYSTEM_BOOTING) {
		build_all_zonelists_init();
	} else {
	#ifdef CONFIG_MEMORY_HOTPLUG
		if (zone)
			setup_zone_pageset(zone);
	#endif
		/* we have to stop all cpus to guarantee there is no user of zonelist */
		stop_machine(__build_all_zonelists, pgdat, NULL);
		/* cpuset refresh routine should be here */
	}

When called during hotplug, it should execute the setup_zone_pageset(zone)
which allocates the private pageset.
However, with memhp_default_state=online, this happens early while
system_state == SYSTEM_BOOTING is still true, hence this step is skipped.
(and build_all_zonelists_init() is probably unsafe anyway at this point).

Another hotplug operation on the same zone then leads to zone_pcp_update(zone)
called from online_pages(), which updates the pcp->high for the shared
boot_pageset to a value higher than 1.
At that point, pages freed from Node X and Y Normal zones can end up on the same
pcplist and from there they can be freed to the wrong zone's freelist,
leading to the corruption and crashes.

Please, note that upstream has fixed that differently (and unintentionally) by
adding another boot state (SYSTEM_SCHEDULING), which is set before smp_init().
That should happen before memory hotplug events even with memhp_default_state=online.
Backporting that would be too intrusive.
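
The branch selection described above, and the way the extra hotplug_context
argument changes it, can be modeled in isolation. This is a user-space
sketch under stated assumptions: the toy_* enums and functions are
hypothetical stand-ins, and only the condition itself mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's system_state values. */
enum toy_system_state { TOY_SYSTEM_BOOTING, TOY_SYSTEM_RUNNING };

/* Which branch of build_all_zonelists() would be taken. */
enum toy_path {
	TOY_PATH_BOOT_INIT,	/* build_all_zonelists_init() */
	TOY_PATH_HOTPLUG_SETUP	/* setup_zone_pageset() + stop_machine() */
};

/* Before the patch: the decision depends on system_state alone. */
enum toy_path toy_path_before(enum toy_system_state state)
{
	assert(state == TOY_SYSTEM_BOOTING || state == TOY_SYSTEM_RUNNING);
	return state == TOY_SYSTEM_BOOTING ? TOY_PATH_BOOT_INIT
					   : TOY_PATH_HOTPLUG_SETUP;
}

/* After the patch: a hotplug caller never takes the boot-only branch. */
enum toy_path toy_path_after(enum toy_system_state state, bool hotplug_context)
{
	assert(state == TOY_SYSTEM_BOOTING || state == TOY_SYSTEM_RUNNING);
	if (state == TOY_SYSTEM_BOOTING && !hotplug_context)
		return TOY_PATH_BOOT_INIT;
	return TOY_PATH_HOTPLUG_SETUP;
}
```

With memhp_default_state=online the hotplug call arrives while the state is
still booting: toy_path_before(TOY_SYSTEM_BOOTING) picks the boot-only
branch, whereas toy_path_after(TOY_SYSTEM_BOOTING, true) picks the branch
that sets up the private pageset.
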

Signed-off-by: Oscar Salvador
Debugged-by: Vlastimil Babka
Acked-by: Michal Hocko # for stable trees
Signed-off-by: Greg Kroah-Hartman
---
 include/linux/mmzone.h |    3 ++-
 init/main.c            |    2 +-
 mm/memory_hotplug.c    |   10 +++++-----
 mm/page_alloc.c        |    7 ++++---
 4 files changed, 12 insertions(+), 10 deletions(-)

--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -756,7 +756,8 @@ static inline bool is_dev_zone(const str
 #include
 
 extern struct mutex zonelists_mutex;
-void build_all_zonelists(pg_data_t *pgdat, struct zone *zone);
+void build_all_zonelists(pg_data_t *pgdat, struct zone *zone,
+			 bool hotplug_context);
 void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx);
 bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark,
 		int classzone_idx, unsigned int alloc_flags,
--- a/init/main.c
+++ b/init/main.c
@@ -512,7 +512,7 @@ asmlinkage __visible void __init start_k
 	smp_prepare_boot_cpu();	/* arch-specific boot-cpu hooks */
 	boot_cpu_hotplug_init();
 
-	build_all_zonelists(NULL, NULL);
+	build_all_zonelists(NULL, NULL, false);
 	page_alloc_init();
 
 	pr_notice("Kernel command line: %s\n", boot_command_line);
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1125,7 +1125,7 @@ int __ref online_pages(unsigned long pfn
 	mutex_lock(&zonelists_mutex);
 	if (!populated_zone(zone)) {
 		need_zonelists_rebuild = 1;
-		build_all_zonelists(NULL, zone);
+		build_all_zonelists(NULL, zone, true);
 	}
 
 	ret = walk_system_ram_range(pfn, nr_pages, &onlined_pages,
@@ -1146,7 +1146,7 @@ int __ref online_pages(unsigned long pfn
 	if (onlined_pages) {
 		node_states_set_node(nid, &arg);
 		if (need_zonelists_rebuild)
-			build_all_zonelists(NULL, NULL);
+			build_all_zonelists(NULL, NULL, true);
 		else
 			zone_pcp_update(zone);
 	}
@@ -1220,7 +1220,7 @@ static pg_data_t __ref *hotadd_new_pgdat
 	 * to access not-initialized zonelist, build here.
 	 */
 	mutex_lock(&zonelists_mutex);
-	build_all_zonelists(pgdat, NULL);
+	build_all_zonelists(pgdat, NULL, true);
 	mutex_unlock(&zonelists_mutex);
 
 	/*
@@ -1276,7 +1276,7 @@ int try_online_node(int nid)
 
 	if (pgdat->node_zonelists->_zonerefs->zone == NULL) {
 		mutex_lock(&zonelists_mutex);
-		build_all_zonelists(NULL, NULL);
+		build_all_zonelists(NULL, NULL, true);
 		mutex_unlock(&zonelists_mutex);
 	}
 
@@ -2016,7 +2016,7 @@ repeat:
 	if (!populated_zone(zone)) {
 		zone_pcp_reset(zone);
 		mutex_lock(&zonelists_mutex);
-		build_all_zonelists(NULL, NULL);
+		build_all_zonelists(NULL, NULL, true);
 		mutex_unlock(&zonelists_mutex);
 	} else
 		zone_pcp_update(zone);
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4608,7 +4608,7 @@ int numa_zonelist_order_handler(struct c
 			user_zonelist_order = oldval;
 		} else if (oldval != user_zonelist_order) {
 			mutex_lock(&zonelists_mutex);
-			build_all_zonelists(NULL, NULL);
+			build_all_zonelists(NULL, NULL, false);
 			mutex_unlock(&zonelists_mutex);
 		}
 	}
@@ -4988,11 +4988,12 @@ build_all_zonelists_init(void)
  * (2) call of __init annotated helper build_all_zonelists_init
  * [protected by SYSTEM_BOOTING].
  */
-void __ref build_all_zonelists(pg_data_t *pgdat, struct zone *zone)
+void __ref build_all_zonelists(pg_data_t *pgdat, struct zone *zone,
+			       bool hotplug_context)
 {
 	set_zonelist_order();
 
-	if (system_state == SYSTEM_BOOTING) {
+	if (system_state == SYSTEM_BOOTING && !hotplug_context) {
 		build_all_zonelists_init();
 	} else {
 #ifdef CONFIG_MEMORY_HOTPLUG


Patches currently in stable-queue which might be from osalvador@suse.de are

queue-4.9/mm-avoid-calling-build_all_zonelists_init-under-hotplug-context.patch