From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 14C15C32772 for ; Tue, 23 Aug 2022 13:57:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8A53C8D0002; Tue, 23 Aug 2022 09:57:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 854A48D0001; Tue, 23 Aug 2022 09:57:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6F49B8D0002; Tue, 23 Aug 2022 09:57:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 5D58F8D0001 for ; Tue, 23 Aug 2022 09:57:40 -0400 (EDT) Received: from smtpin06.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 2C80C1C6844 for ; Tue, 23 Aug 2022 13:57:40 +0000 (UTC) X-FDA: 79831010280.06.BC3E604 Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.220.29]) by imf27.hostedemail.com (Postfix) with ESMTP id A68F24004B for ; Tue, 23 Aug 2022 13:57:39 +0000 (UTC) Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by smtp-out2.suse.de (Postfix) with ESMTPS id 878A11FA1A; Tue, 23 Aug 2022 13:57:38 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=suse.com; s=susede1; t=1661263058; h=from:from:reply-to:date:date:message-id:message-id:to:to:cc:cc: mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=A6Ntbor202f+r9wgDqWvfnxYBkjm8Sd5fWE0iaMJ85E=; b=tnow2TAiKtJdAa9JpHzrPftrY0Gapa5tCXWQYbaoLWQ45YWci5yo/INuo7Tr6Fh643O+cu exzqE1tCJjTYFrmx/k21X7gYqRvtbDvgwfqQaT7kCpyebg1r0RoyE7yqIGfpAqrpCqYNyb uoiZa/mxbjCskO+Ab7EH167MIhFkeOE= Received: from imap2.suse-dmz.suse.de (imap2.suse-dmz.suse.de [192.168.254.74]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature ECDSA (P-521) server-digest SHA512) (No client certificate requested) by imap2.suse-dmz.suse.de (Postfix) with ESMTPS id 72AE913AB7; Tue, 23 Aug 2022 13:57:38 +0000 (UTC) Received: from dovecot-director2.suse.de ([192.168.254.65]) by imap2.suse-dmz.suse.de with ESMTPSA id LuQ6G9LcBGOPcgAAMHmgww (envelope-from ); Tue, 23 Aug 2022 13:57:38 +0000 Date: Tue, 23 Aug 2022 15:57:38 +0200 From: Michal Hocko To: David Hildenbrand Cc: Mel Gorman , Patrick Daly , linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, Juergen Gross Subject: Re: Race condition in build_all_zonelists() when offlining movable zone Message-ID: References: <11f91089-1958-c7eb-126f-af32130d9f8a@redhat.com> <20220823083349.5c2aolc6xgfhp3k7@suse.de> <20220823094950.ocjyur2h3mqnqbeg@suse.de> <0fc01e47-51f3-baf7-2d46-72291422f695@redhat.com> <20220823110946.o3eawk3kghaykcim@suse.de> <20220823125850.o3nhkmikmv7vyxq4@suse.de> <37031749-ff57-2f90-5c90-f16473f31e37@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <37031749-ff57-2f90-5c90-f16473f31e37@redhat.com> ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1661263059; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=A6Ntbor202f+r9wgDqWvfnxYBkjm8Sd5fWE0iaMJ85E=; b=lJ7ZpsW89nslexubZfybD//wVdgn7oH+z2OATpqSPxosfkeW+fS3uTKiaiB3XnBqKFOBi+ KonWLUw8bdC+GNjNqccun07RfoTISsrRCEuZVQgVD0uPZ0/Y6HTvpHgQYLeqizVWGtEfFO +zxjMcZ2+N+5kzf0bo7aMqdnJ+WPqMY= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=suse.com header.s=susede1 header.b=tnow2TAi; spf=pass (imf27.hostedemail.com: domain of mhocko@suse.com designates 195.135.220.29 as permitted sender) smtp.mailfrom=mhocko@suse.com; dmarc=pass (policy=quarantine) header.from=suse.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1661263059; a=rsa-sha256; cv=none; b=yhPhsCUHUX5KM9cejgbblLTe6HqF5F0LTX8iI9+ydpznMUnyjpe2BmUafIvzop20ebwxpF GIVWSOwmpQ/ooEl3m+tvuiQs52flF/SKO7Zs5BqAOgQj5c099w/hjSAmTbaSaW0yKfvnXK Icm9yaIyHGrr/lCDgJ7QphzqOOeaAAM= X-Rspam-User: Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=suse.com header.s=susede1 header.b=tnow2TAi; spf=pass (imf27.hostedemail.com: domain of mhocko@suse.com designates 195.135.220.29 as permitted sender) smtp.mailfrom=mhocko@suse.com; dmarc=pass (policy=quarantine) header.from=suse.com X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: A68F24004B X-Stat-Signature: dhjnimgjh686k1gx9fscgszqjrfhdc1i X-HE-Tag: 1661263059-5122 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue 23-08-22 15:50:23, David Hildenbrand wrote: > On 23.08.22 15:25, Michal Hocko wrote: > > On Tue 23-08-22 13:58:50, Mel Gorman wrote: > >> On Tue, Aug 23, 2022 at 02:18:27PM +0200, Michal Hocko wrote: > >>> On Tue 23-08-22 12:09:46, Mel Gorman wrote: > >>>> On Tue, Aug 23, 2022 at 12:34:09PM +0200, David Hildenbrand wrote: > >>>>>> @@ -6553,7 +6576,7 @@ static void __build_all_zonelists(void *data) > >>>>>> #endif > >>>>>> } > >>>>>> > >>>>>> - spin_unlock(&lock); > >>>>>> + write_sequnlock(&zonelist_update_seq); > >>>>>> } > >>>>>> > >>>>>> static noinline void __init > >>>>>> > >>>>> > >>>>> LGTM. The "retry_cpuset" label might deserve a better name now. > >>>>> > >>>> > >>>> Good point ... "restart"? > >>>> > >>>>> Would > >>>>> > >>>>> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones > >>>>> with pages managed by the buddy allocator") > >>>>> > >>>>> be correct? > >>>>> > >>>> > >>>> Not specifically because the bug is due to a zone being completely removed > >>>> resulting in a rebuild. This race probably existed ever since memory > >>>> hotremove could theoritically remove a complete zone. A Cc: Stable would > >>>> be appropriate as it'll apply with fuzz back to at least 5.4.210 but beyond > >>>> that, it should be driven by a specific bug report showing that hot-remove > >>>> of a full zone was possible and triggered the race. > >>> > >>> I do not think so. 6aa303defb74 has changed the zonelist building and > >>> changed the check from pfn range (populated) to managed (with a memory). > >> > >> I'm not 100% convinced. The present_pages should have been the spanned range > >> minus any holes that exist in the zone. If the zone is completely removed, > >> the span should be zero meaning present and managed are both zero. No? > > > > IIRC, and David will correct me if I am mixing this up. The difference > > is that zonelists are rebuilt during memory offlining and that is when > > managed pages are removed from the allocator. Zone itself still has that > > physical range populated and so this patch would have made a difference. > > To recap, memory offlining adjusts managed+present pages of the zone > essentially in one go. If after the adjustments, the zone is no longer > populated (present==0), we rebuild the zone lists. > > Once that's done, we try shrinking the zone (start+spanned pages) -- > which results in zone_start_pfn == 0 if there are no more pages. That > happens *after* rebuilding the zonelists via remove_pfn_range_from_zone(). > > > Note that populated_zone() checks for present_pages. The actual zone > span (e.g., spanned_pages) is a different story and not of interest when > building zones or wanting to allocate memory. > > > > > Now, you are right that this is likely possible even without that commit > > but it is highly unlikely because physical hotremove is a very rare > > operation and the race window would be so large that it would be likely > > unfeasible. > > I think I agree that 6aa303defb74 is most likely not the origin of this. > It could only have been the origin in weird corner cases where we > actually succeed offlining one memory block (adjust present+managed) and > end up with managed=0 and present!=0 -- which barely happens in > practice: especially for ZONE_MOVABLE. (yeah, there is memory ballooning > that adjusts managed pages dynamically and might provoke such a > situation on ZONE_MOVABLE) OK, thanks for the correction David. Then I would agree that Fixes tag could be more confusing than helpful and your above summary would be a great part of the changelog. Thanks! -- Michal Hocko SUSE Labs