Message-ID: <924d9d25-e53c-f159-6ec0-e1fd4e96d6e2@huawei.com>
Date: Mon, 7 Jul 2025 19:51:34 +0800
Subject: Re: [PATCH v2 2/2] mm/memory_hotplug: fix hwpoisoned large folio handling in do_migrate_range
To: David Hildenbrand, Oscar Salvador
References: <20250627125747.3094074-1-tujinjiang@huawei.com>
 <20250627125747.3094074-3-tujinjiang@huawei.com>
 <373d02c5-2b62-8543-b786-8fd591ad56eb@huawei.com>
 <61325284-d1d6-a973-8aa7-c0f226db95fa@huawei.com>
 <7b2c054b-fc33-4127-aaa9-9edf6a63e142@redhat.com>
From: Jinjiang Tu
In-Reply-To: <7b2c054b-fc33-4127-aaa9-9edf6a63e142@redhat.com>

On 2025/7/3 17:06, David Hildenbrand wrote:
> On 03.07.25 10:24, Jinjiang Tu wrote:
>>
>> On 2025/7/3 15:57, David Hildenbrand wrote:
>>> On 03.07.25 09:46, Jinjiang Tu wrote:
>>>>
>>>> On 2025/7/1 22:21, Oscar Salvador wrote:
>>>>> On Fri, Jun 27, 2025 at 08:57:47PM +0800, Jinjiang Tu wrote:
>>>>>> In do_migrate_range(), the hwpoisoned folio may be large folio, which
>>>>>> can't be handled by unmap_poisoned_folio().
>>>>>>
>>>>>> I can reproduce this issue in qemu after adding delay in
>>>>>> memory_failure()
>>>>>>
>>>>>> BUG: kernel NULL pointer dereference, address: 0000000000000000
>>>>>> Workqueue: kacpi_hotplug acpi_hotplug_work_fn
>>>>>> RIP: 0010:try_to_unmap_one+0x16a/0xfc0
>>>>>>
>>>>>>     rmap_walk_anon+0xda/0x1f0
>>>>>>     try_to_unmap+0x78/0x80
>>>>>>     ? __pfx_try_to_unmap_one+0x10/0x10
>>>>>>     ? __pfx_folio_not_mapped+0x10/0x10
>>>>>>     ? __pfx_folio_lock_anon_vma_read+0x10/0x10
>>>>>>     unmap_poisoned_folio+0x60/0x140
>>>>>>     do_migrate_range+0x4d1/0x600
>>>>>>     ? slab_memory_callback+0x6a/0x190
>>>>>>     ? notifier_call_chain+0x56/0xb0
>>>>>>     offline_pages+0x3e6/0x460
>>>>>>     memory_subsys_offline+0x130/0x1f0
>>>>>>     device_offline+0xba/0x110
>>>>>>     acpi_bus_offline+0xb7/0x130
>>>>>>     acpi_scan_hot_remove+0x77/0x290
>>>>>>     acpi_device_hotplug+0x1e0/0x240
>>>>>>     acpi_hotplug_work_fn+0x1a/0x30
>>>>>>     process_one_work+0x186/0x340
>>>>>>
>>>>>> In this case, just make offline_pages() fail.
>>>>>>
>>>>>> Besides, do_migrate_range() may be called between memory_failure set
>>>>>> hwpoison flag and isolate the folio from lru, so remove WARN_ON().
>>>>>> In other places, unmap_poisoned_folio() is called when the folio is
>>>>>> isolated, obey it in do_migrate_range() too.
>>>>>>
>>>>>> Fixes: b15c87263a69 ("hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined")
>>>>>> Signed-off-by: Jinjiang Tu
>>>>> ...
>>>>>> @@ -2041,11 +2048,9 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>>>>>>                 ret = scan_movable_pages(pfn, end_pfn, &pfn);
>>>>>>                 if (!ret) {
>>>>>> -                /*
>>>>>> -                 * TODO: fatal migration failures should bail
>>>>>> -                 * out
>>>>>> -                 */
>>>>>> -                do_migrate_range(pfn, end_pfn);
>>>>>> +                ret = do_migrate_range(pfn, end_pfn);
>>>>>> +                if (ret)
>>>>>> +                    break;
>>>>> I am not really sure about this one.
>>>>> I get the reason you're adding it, but note that migrate_pages() can
>>>>> also return "fatal" errors and we don't propagate that.
>>>>>
>>>>> The motto has always been to migrate as much as possible, and this
>>>>> changes this behaviour.
>>>> If we just skip to next pfn, offline_pages() will deadloop meaningless
>>>> until received signal.
>>>
>>> Yeah, that's also not good,
>>>
>>>> It seems there is no document to guarantee memory offline have to
>>>> migrate as much as possible.
>>>
>>> We should try offlining as good as possible. But if there is something
>>> we just cannot possibly migrate, there is no sense in retrying.
>>>
>>> Now, could we run into this case here because we are racing with other
>>> code, and actually retrying again could make it work?
>>>
>>> Remind me again: how exactly do we arrive at this point of having a
>>> large folio that is hwpoisoned but still mapped?
>>>
>>> In memory_failure(), we do on a large folio
>>>
>>> 1) folio_set_has_hwpoisoned
>>> 2) try_to_split_thp_page
>>> 3) if splitting fails, kill_procs_now
>> If 2) is executed when do_migrate_range() increment the refcount of the
>> folio, the split fails, and retry is meaningless.
>
> kill_procs_now will kill all processes, effectively unmapping the
> folio in that case?
>
> So retrying would later just ... get us an unmapped folio and we can
> make progress?
>

kill_procs_now()->collect_procs() collects the tasks to kill. But not all
tasks that map the folio will be collected:
collect_procs_anon()->task_early_kill()->find_early_kill_thread() will not
select a task (other than current) if PF_MCE_PROCESS isn't set and
sysctl_memory_failure_early_kill isn't enabled (this is the default
behaviour).
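
To make the selection rule concrete, here is a minimal user-space sketch of it. It is a model under stated assumptions, not the code in mm/memory-failure.c: struct model_thread, the model_* helpers, and the flag values are made up for illustration, and the handling of current is reduced to two booleans.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the values are placeholders, not the kernel's. */
#define PF_MCE_PROCESS 0x1u	/* task chose its own policy via prctl(PR_MCE_KILL) */
#define PF_MCE_EARLY   0x2u	/* that per-task policy asks for early kill */

/* Hypothetical stand-in for a thread; only the flags matter here. */
struct model_thread {
	unsigned int flags;
};

/* Models the vm.memory_failure_early_kill sysctl, which defaults to 0. */
static int sysctl_memory_failure_early_kill;

/*
 * Return the first thread that qualifies for an early kill, or NULL.
 * A thread with a per-task policy qualifies only if that policy is
 * "early"; a thread without one follows the global sysctl.
 */
static const struct model_thread *
model_find_early_kill_thread(const struct model_thread *threads, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		const struct model_thread *t = &threads[i];

		if (t->flags & PF_MCE_PROCESS) {
			if (t->flags & PF_MCE_EARLY)
				return t;
		} else if (sysctl_memory_failure_early_kill) {
			return t;
		}
	}
	return NULL;
}

/*
 * Would this task be collected for killing? The task sharing current's
 * mm is taken unconditionally when force_early is set; every other task
 * depends on the early-kill policy above.
 */
static bool model_task_collected(const struct model_thread *threads, size_t n,
				 bool force_early, bool is_current_mm)
{
	if (force_early && is_current_mm)
		return true;
	return model_find_early_kill_thread(threads, n) != NULL;
}
```

With the defaults (sysctl 0, no prctl(PR_MCE_KILL) call), a task that maps the folio but is not current gets NULL from the lookup, so it is not collected and keeps its mapping.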