Date: Fri, 17 May 2024 12:32:06 +0530
From: Anshuman Khandual
To: Miaohe Lin, akpm@linux-foundation.org
Cc: shy828301@gmail.com, nao.horiguchi@gmail.com, xuyu@linux.alibaba.com, david@redhat.com, osalvador@suse.de, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/huge_memory: don't unpoison huge_zero_folio
In-Reply-To: <20240516122608.22610-1-linmiaohe@huawei.com>
References: <20240516122608.22610-1-linmiaohe@huawei.com>

On 5/16/24 17:56, Miaohe Lin wrote:
> When I did memory failure tests recently, below panic occurs:
>
> kernel BUG at include/linux/mm.h:1135!
> invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
> RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
> RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
> RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
> RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
> RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
> R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
> FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
> Call Trace:
>
>  do_shrink_slab+0x14f/0x6a0
>  shrink_slab+0xca/0x8c0
>  shrink_node+0x2d0/0x7d0
>  balance_pgdat+0x33a/0x720
>  kswapd+0x1f3/0x410
>  kthread+0xd5/0x100
>  ret_from_fork+0x2f/0x50
>  ret_from_fork_asm+0x1a/0x30
>
> Modules linked in: mce_inject hwpoison_inject
> ---[ end trace 0000000000000000 ]---
> RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
> RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
> RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
> RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
> RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
> R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
> FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
>
> The root cause is that HWPoison flag will be set for huge_zero_folio
> without increasing the folio refcnt. But then unpoison_memory() will
> decrease the folio refcnt unexpectly as it appears like a successfully

Small nit, there is a typo in the line above: s/unexpectly/unexpectedly/.

> hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0)
> when releasing huge_zero_folio.
>
> Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue.
> We're not prepared to unpoison huge_zero_folio yet.
>
> Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page")

The target commit looks right.

> Signed-off-by: Miaohe Lin
> Acked-by: David Hildenbrand
> Reviewed-by: Yang Shi
> Reviewed-by: Oscar Salvador
> Cc:
> ---
> v3:
>  Move up is_huge_zero_folio() check and change return value to
>  -EOPNOTSUPP per Oscar.
>  Collect Reviewed-by and Acked-by tag. Thanks.
> v2:
>  Change to simply check for the huge zero page per David. Thanks.
> ---
>  mm/memory-failure.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index 16ada4fb02b7..a9fe9eda593f 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -2546,6 +2546,13 @@ int unpoison_memory(unsigned long pfn)
>  		goto unlock_mutex;
>  	}
>
> +	if (is_huge_zero_folio(folio)) {
> +		unpoison_pr_info("Unpoison: huge zero page is not supported %#lx\n",
> +				 pfn, &unpoison_rs);
> +		ret = -EOPNOTSUPP;
> +		goto unlock_mutex;
> +	}
> +
>  	if (!PageHWPoison(p)) {
>  		unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
>  				 pfn, &unpoison_rs);

This patch applies to the latest linux-next but not to the latest mainline,
because is_huge_zero_folio() is absent there.

Reviewed-by: Anshuman Khandual
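
For readers following the root cause described in the commit message, here is
a rough user-space sketch of the refcount imbalance and of the effect of the
early check. It is only a toy model under stated assumptions: struct toy_folio,
toy_poison(), toy_unpoison() and toy_scenario() are hypothetical names made up
for illustration, and none of this is the code in mm/memory-failure.c. The
model assumes, as the commit message says, that the huge zero folio gets the
HWPoison flag without an extra reference, while unpoisoning drops one.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio; NOT the kernel's struct folio. */
struct toy_folio {
	int refcount;
	bool hwpoison;
	bool is_huge_zero;
};

/*
 * Toy model of poisoning: the flag is always set, but (per the commit
 * message) no extra reference is taken for the huge zero folio.
 */
void toy_poison(struct toy_folio *f)
{
	f->hwpoison = true;
	if (!f->is_huge_zero)
		f->refcount++;
}

/*
 * Toy model of unpoisoning. With 'guarded' set, the huge zero folio is
 * rejected before the refcount is touched, mirroring the idea of the patch.
 */
int toy_unpoison(struct toy_folio *f, bool guarded)
{
	if (guarded && f->is_huge_zero)
		return -EOPNOTSUPP;
	if (!f->hwpoison)
		return 0;
	f->hwpoison = false;
	f->refcount--;	/* drops a reference that was never taken for huge zero */
	return 0;
}

/*
 * Poison then unpoison a huge zero folio that starts with one reference
 * (its owner's) and return the resulting refcount.
 */
int toy_scenario(bool guarded)
{
	struct toy_folio zero = {
		.refcount = 1,
		.hwpoison = false,
		.is_huge_zero = true,
	};

	toy_poison(&zero);
	toy_unpoison(&zero, guarded);
	return zero.refcount;
}
```

In this model toy_scenario(false) returns 0, meaning the owner's reference has
been dropped by mistake and a later release would hit the zero-refcount
condition, while toy_scenario(true) returns 1 because the early return leaves
the count alone.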