Subject: Re: [PATCH v2 resend] mm/memory_hotplug: Make unpopulated zones PCP structures unreachable during hot remove
From: David Hildenbrand
Organization: Red Hat
To: Mel Gorman, Vlastimil Babka
Cc: Andrew Morton, Oscar Salvador, "Michael S. Tsirkin", Alexander Duyck, Minchan Kim, Michal Hocko, Linux-MM, LKML
Date: Mon, 12 Apr 2021 16:12:11 +0200
In-Reply-To: <20210412140852.GZ3697@techsingularity.net>
References: <20210412120842.GY3697@techsingularity.net> <20210412140852.GZ3697@techsingularity.net>

On 12.04.21 16:08, Mel Gorman wrote:
> On Mon, Apr 12, 2021 at 02:40:18PM +0200, Vlastimil Babka wrote:
>> On 4/12/21 2:08 PM, Mel Gorman wrote:
>>> zone_pcp_reset allegedly protects against a race with drain_pages
>>> using local_irq_save but this is bogus. local_irq_save only operates
>>> on the local CPU. If memory hotplug is running on CPU A and drain_pages
>>> is running on CPU B, disabling IRQs on CPU A does not affect CPU B and
>>> offers no protection.
>>>
>>> This patch deletes IRQ disable/enable on the grounds that IRQs protect
>>> nothing and assumes the existing hotplug paths guarantee the PCP cannot be
>>> used after zone_pcp_enable(). That should be the case already because all
>>> the pages have been freed and there is no page to put on the PCP lists.
>>>
>>> Signed-off-by: Mel Gorman
>>
>> Yeah the irq disabling here is clearly bogus, so:
>>
>> Acked-by: Vlastimil Babka
>>
>
> Thanks!
>
>> But I think Michal has a point that we might best leave the pagesets around, by
>> a future change. I have some doubts that even with your reordering of the
>> reset/destroy after zonelist rebuild in v1 they can't be reachable. We have no
>> protection between zonelist rebuild and zonelist traversal, and that's why we
>> just leave pgdats around.
>>
>> So I can imagine a task racing with memory hotremove might see watermarks as ok
>> in get_page_from_freelist() for the zone and proceed to try_this_zone:, then
>> get stalled/scheduled out while hotremove rebuilds the zonelist and destroys
>> the pcplists, then the first task is resumed and proceeds with rmqueue_pcplist().
>>
>> So that's very rare thus not urgent, and this patch doesn't make it less rare so
>> not a reason to block it.
>>
>
> After v1 of the patch, the race was reduced to the point between the
> zone watermark check and the rmqueue_pcplist but yes, it still existed.
> Closing it completely was either complex or expensive. Setting
> zone->pageset = &boot_pageset before the free would shrink the race
> further but that still leaves a potential memory ordering issue.
>
> While fixable, it's either complex, expensive or both so yes, just leaving
> the pageset structures in place would be much more straightforward
> assuming the structures were not allocated in the zone that is being
> hot-removed. As things stand, I had trouble even testing zone hot-remove
> as there were always a few pages left behind and I did not chase down
> why.

Can you elaborate? I can reliably trigger zone present pages going to 0
by just hotplugging a DIMM, onlining the memory block devices to the
MOVABLE zone, followed by offlining the memory block again.

-- 
Thanks,

David / dhildenb
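
For reference, the reproduction steps above (online a hotplugged memory block to the MOVABLE zone, offline it again, then check the zone's present pages) could be scripted roughly as below. This is only a sketch under assumptions: the helper names and the block number are hypothetical, the `root` parameter exists only so the helpers can be pointed at a scratch directory, and on a real system the writes need root privileges and a memory block that can actually be offlined. The sysfs `state` file and the `present` field of /proc/zoneinfo are the standard interfaces.

```python
import re
from pathlib import Path

# Standard sysfs location of memory block devices.
SYSFS_MEMORY = Path("/sys/devices/system/memory")

ZONE_RE = re.compile(r"^Node (\d+), zone\s+(\S+)")
PRESENT_RE = re.compile(r"^\s+present\s+(\d+)")


def set_block_state(block, state, root=SYSFS_MEMORY):
    """Write a state request ("online_movable", "offline", ...) to memory<block>/state."""
    (root / f"memory{block}" / "state").write_text(state)


def online_movable_then_offline(block, root=SYSFS_MEMORY):
    """Online one memory block to the MOVABLE zone, then offline it again."""
    set_block_state(block, "online_movable", root)
    set_block_state(block, "offline", root)


def present_pages(zoneinfo_text):
    """Map (node, zone name) to the 'present' page count found in /proc/zoneinfo text."""
    result = {}
    current = None
    for line in zoneinfo_text.splitlines():
        zone = ZONE_RE.match(line)
        if zone:
            current = (int(zone.group(1)), zone.group(2))
            continue
        present = PRESENT_RE.match(line)
        if present and current is not None:
            result[current] = int(present.group(1))
    return result
```

Usage would be something like `online_movable_then_offline(42)` (block number hypothetical) followed by `present_pages(Path("/proc/zoneinfo").read_text())`, looking for the Movable zone entry dropping to 0.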