Message-ID: <37031749-ff57-2f90-5c90-f16473f31e37@redhat.com>
Date: Tue, 23 Aug 2022 15:50:23 +0200
To: Michal Hocko, Mel Gorman
Cc: Patrick Daly, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, Juergen Gross
References: <20220817104028.uin7cmkb4qlpgfbi@suse.de> <11f91089-1958-c7eb-126f-af32130d9f8a@redhat.com> <20220823083349.5c2aolc6xgfhp3k7@suse.de> <20220823094950.ocjyur2h3mqnqbeg@suse.de> <0fc01e47-51f3-baf7-2d46-72291422f695@redhat.com> <20220823110946.o3eawk3kghaykcim@suse.de> <20220823125850.o3nhkmikmv7vyxq4@suse.de>
From: David Hildenbrand
Organization: Red Hat
Subject: Re: Race condition in build_all_zonelists() when offlining movable zone
List-ID: linux-mm

On 23.08.22 15:25, Michal Hocko wrote:
> On Tue 23-08-22 13:58:50, Mel Gorman wrote:
>> On Tue, Aug 23, 2022 at 02:18:27PM +0200, Michal Hocko wrote:
>>> On Tue 23-08-22 12:09:46, Mel Gorman wrote:
>>>> On Tue, Aug 23, 2022 at 12:34:09PM +0200, David Hildenbrand wrote:
>>>>>> @@ -6553,7 +6576,7 @@ static void __build_all_zonelists(void *data)
>>>>>>  #endif
>>>>>>  }
>>>>>>
>>>>>> -	spin_unlock(&lock);
>>>>>> +	write_sequnlock(&zonelist_update_seq);
>>>>>>  }
>>>>>>
>>>>>>  static noinline void __init
>>>>>>
>>>>>
>>>>> LGTM. The "retry_cpuset" label might deserve a better name now.
>>>>>
>>>>
>>>> Good point ... "restart"?
>>>>
>>>>> Would
>>>>>
>>>>> Fixes: 6aa303defb74 ("mm, vmscan: only allocate and reclaim from zones
>>>>> with pages managed by the buddy allocator")
>>>>>
>>>>> be correct?
>>>>>
>>>>
>>>> Not specifically because the bug is due to a zone being completely removed
>>>> resulting in a rebuild.
>>>> This race probably existed ever since memory
>>>> hotremove could theoretically remove a complete zone. A Cc: Stable would
>>>> be appropriate as it'll apply with fuzz back to at least 5.4.210 but beyond
>>>> that, it should be driven by a specific bug report showing that hot-remove
>>>> of a full zone was possible and triggered the race.
>>>
>>> I do not think so. 6aa303defb74 has changed the zonelist building and
>>> changed the check from pfn range (populated) to managed (with a memory).
>>
>> I'm not 100% convinced. The present_pages should have been the spanned range
>> minus any holes that exist in the zone. If the zone is completely removed,
>> the span should be zero meaning present and managed are both zero. No?
>
> IIRC, and David will correct me if I am mixing this up. The difference
> is that zonelists are rebuilt during memory offlining and that is when
> managed pages are removed from the allocator. Zone itself still has that
> physical range populated and so this patch would have made a difference.

To recap, memory offlining adjusts managed+present pages of the zone
essentially in one go. If after the adjustments, the zone is no longer
populated (present==0), we rebuild the zone lists.

Once that's done, we try shrinking the zone (start+spanned pages) -- which
results in zone_start_pfn == 0 if there are no more pages. That happens
*after* rebuilding the zonelists via remove_pfn_range_from_zone().

Note that populated_zone() checks for present_pages. The actual zone
span (e.g., spanned_pages) is a different story and not of interest when
building zonelists or wanting to allocate memory.

>
> Now, you are right that this is likely possible even without that commit
> but it is highly unlikely because physical hotremove is a very rare
> operation and the race window would be so large that it would be likely
> unfeasible.

I think I agree that 6aa303defb74 is most likely not the origin of this.
It could only have been the origin in weird corner cases where we
actually succeed offlining one memory block (adjust present+managed) and
end up with managed=0 and present!=0 -- which barely happens in
practice: especially for ZONE_MOVABLE. (yeah, there is memory ballooning
that adjusts managed pages dynamically and might provoke such a
situation on ZONE_MOVABLE)

-- 
Thanks,

David / dhildenb
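
For illustration only, the ordering recapped above can be sketched as a small userspace toy model. Everything named toy_* is made up for this sketch and is not kernel code. In particular, the real span shrinking works on PFN ranges rather than all-or-nothing, and the real counters are updated under the appropriate locks. The sketch only assumes what is stated above: the populated check looks at present pages, the managed check looks at managed pages, and the span is shrunk after the rebuild decision.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-zone counters discussed in this thread. */
struct toy_zone {
    unsigned long zone_start_pfn;
    unsigned long spanned_pages;  /* start..end, including holes */
    unsigned long present_pages;  /* spanned minus holes */
    unsigned long managed_pages;  /* pages owned by the buddy allocator */
};

/* Counts how often the (imaginary) zonelists were rebuilt. */
static int zonelist_rebuilds;

/* Same idea as populated_zone(): only present_pages matters. */
static bool toy_populated(const struct toy_zone *z)
{
    return z->present_pages != 0;
}

/* Same idea as managed_zone(): only managed_pages matters. */
static bool toy_managed(const struct toy_zone *z)
{
    return z->managed_pages != 0;
}

/* Offline nr present pages of the zone, in the order recapped above. */
static void toy_offline(struct toy_zone *z, unsigned long nr)
{
    assert(nr <= z->present_pages);

    /* 1) Adjust managed+present essentially in one go. Managed can be
     *    smaller than present (e.g. ballooning), so clamp it at zero. */
    z->managed_pages -= (nr < z->managed_pages) ? nr : z->managed_pages;
    z->present_pages -= nr;

    /* 2) Rebuild the zonelists only if the zone is no longer populated. */
    if (!toy_populated(z))
        zonelist_rebuilds++;

    /* 3) Only afterwards shrink the span (simplified: all or nothing). */
    if (z->present_pages == 0) {
        z->zone_start_pfn = 0;
        z->spanned_pages = 0;
    }
}
```

Offlining the only block of a fully managed toy zone triggers one rebuild and then zeroes the span. Offlining one of two blocks of a toy zone whose other block is fully ballooned leaves present != 0 with managed == 0 and triggers no rebuild, which is the corner case mentioned above.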