Date: Sat, 19 Jul 2025 19:23:00 -0700
From: Andrew Morton
To: David Hildenbrand
Cc: Zi Yan, "Pankaj Raghav (Samsung)", Matthew Wilcox, Luis Chamberlain, Jinjiang Tu, Oscar Salvador, linmiaohe@huawei.com, mhocko@kernel.org, linux-mm@kvack.org, wangkefeng.wang@huawei.com
Subject: Re: [PATCH v2 2/2] mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range
Message-Id: <20250719192300.9e32c35ddc49f11c7954306b@linux-foundation.org>
In-Reply-To: <66bc7274-ec2a-423a-8094-b8d4cc9438fe@redhat.com>

I continue to retain the original patch in mm-hotfixes as part of akpm's lame bug-tracking system. It has been in -next for three weeks, and I just added a cc:stable to it because the bug it fixes dates back to December 2018.

I don't expect many real-world users to be putting artificial delays into memory_failure(), but the race is there all the same.

So what do we do here? Add a TODO, merge it under the better-than-it-was-before theory, and move on?


From: Jinjiang Tu
Subject: mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range
Date: Fri, 27 Jun 2025 20:57:47 +0800

In do_migrate_range(), the hwpoisoned folio may be a large folio, which unmap_poisoned_folio() cannot handle.

I can reproduce this issue in QEMU after adding a delay in memory_failure():

BUG: kernel NULL pointer dereference, address: 0000000000000000
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
RIP: 0010:try_to_unmap_one+0x16a/0xfc0
 rmap_walk_anon+0xda/0x1f0
 try_to_unmap+0x78/0x80
 ? __pfx_try_to_unmap_one+0x10/0x10
 ? __pfx_folio_not_mapped+0x10/0x10
 ? __pfx_folio_lock_anon_vma_read+0x10/0x10
 unmap_poisoned_folio+0x60/0x140
 do_migrate_range+0x4d1/0x600
 ? slab_memory_callback+0x6a/0x190
 ? notifier_call_chain+0x56/0xb0
 offline_pages+0x3e6/0x460
 memory_subsys_offline+0x130/0x1f0
 device_offline+0xba/0x110
 acpi_bus_offline+0xb7/0x130
 acpi_scan_hot_remove+0x77/0x290
 acpi_device_hotplug+0x1e0/0x240
 acpi_hotplug_work_fn+0x1a/0x30
 process_one_work+0x186/0x340

In this case, just make offline_pages() fail.

Also, do_migrate_range() may be called between memory_failure() setting the hwpoison flag and the folio being isolated from the LRU, so remove the WARN_ON().

Also, everywhere else unmap_poisoned_folio() is only called once the folio has been isolated, so do the same in do_migrate_range().

Link: https://lkml.kernel.org/r/20250627125747.3094074-3-tujinjiang@huawei.com
Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
Signed-off-by: Jinjiang Tu
Cc: David Hildenbrand
Cc: Kefeng Wang
Cc: Miaohe Lin
Cc: Michal Hocko
Cc: Oscar Salvador
Cc:
Signed-off-by: Andrew Morton
---

 mm/memory_hotplug.c |   21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

--- a/mm/memory_hotplug.c~mm-memory_hotplug-fix-hwpoisoned-large-folio-handling-in-do_migrate_range
+++ a/mm/memory_hotplug.c
@@ -1795,7 +1795,7 @@ found:
 	return 0;
 }
 
-static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+static int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	struct folio *folio;
 	unsigned long pfn;
@@ -1819,8 +1819,10 @@ static void do_migrate_range(unsigned lo
 			pfn = folio_pfn(folio) + folio_nr_pages(folio) - 1;
 
 		if (folio_contain_hwpoisoned_page(folio)) {
-			if (WARN_ON(folio_test_lru(folio)))
-				folio_isolate_lru(folio);
+			if (folio_test_large(folio) && !folio_test_hugetlb(folio))
+				goto err_out;
+			if (folio_test_lru(folio) && !folio_isolate_lru(folio))
+				goto err_out;
 			if (folio_mapped(folio)) {
 				folio_lock(folio);
 				unmap_poisoned_folio(folio, pfn, false);
@@ -1877,6 +1879,11 @@ put_folio:
 			putback_movable_pages(&source);
 		}
 	}
+	return 0;
+err_out:
+	folio_put(folio);
+	putback_movable_pages(&source);
+	return -EBUSY;
 }
 
 static int __init cmdline_parse_movable_node(char *p)
@@ -2041,11 +2048,9 @@ int offline_pages(unsigned long start_pf
 
 			ret = scan_movable_pages(pfn, end_pfn, &pfn);
 			if (!ret) {
-				/*
-				 * TODO: fatal migration failures should bail
-				 * out
-				 */
-				do_migrate_range(pfn, end_pfn);
+				ret = do_migrate_range(pfn, end_pfn);
+				if (ret)
+					break;
 			}
 		} while (!ret);
 
_