Subject: Re: [PATCH] mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
To: Oscar Salvador
References: <20240407085456.2798193-1-linmiaohe@huawei.com>
From: Miaohe Lin
Message-ID: <13aa38af-46a1-3894-32bd-c3eb6ef67359@huawei.com>
Date: Wed, 10 Apr 2024 15:52:14 +0800

On 2024/4/10 0:10, Oscar Salvador wrote:
> On Tue, Apr 09, 2024 at 04:10:22PM +0200, Oscar Salvador wrote:
>> On Sun, Apr 07, 2024 at 04:54:56PM +0800, Miaohe Lin wrote:
>>> In short, the scenario below breaks the lock dependency chain:
>>>
>>>  memory_failure
>>>   __page_handle_poison
>>>    zone_pcp_disable -- lock(pcp_batch_high_lock)
>>>    dissolve_free_huge_page
>>>     __hugetlb_vmemmap_restore_folio
>>>      static_key_slow_dec
>>>       cpus_read_lock -- rlock(cpu_hotplug_lock)
>>>
>>> Fix this by calling
>>> drain_all_pages() instead.
>>>
>>> Signed-off-by: Miaohe Lin
>>
>> Acked-by: Oscar Salvador

Thanks.

>
> On second thought,
>
> disabling pcp via zone_pcp_disable() was a deterministic approach.
> Now, with drain_all_pages() we drain PCP queues to buddy, but nothing
> guarantees that those pages do not end up in a PCP queue again before
> the call to take_page_off_buddy() if we need refilling, right?

AFAICS, iff the check_pages_enabled static key is enabled and we are in
hard offline mode, check_new_pages() will prevent those pages from ending
up in a PCP queue again when the PCP list is refilled, because
PageHWPoison pages are treated as 'bad' pages and skipped during the
refill.

>
> I guess we can live with that because we will let the system know that we
> failed to isolate that page.

We're trying our best to isolate that page anyway. :)

Thanks for your thoughts.