From: Miaohe Lin
To: Jinjiang Tu
Subject: Re: [PATCH 2/2] mm/vmscan: don't try to reclaim hwpoison folio
Date: Thu, 20 Mar 2025 10:50:56 +0800
Message-ID: <8a5ecc18-55f2-4c58-e52e-7fda91e27088@huawei.com>
In-Reply-To: <20250318083939.987651-3-tujinjiang@huawei.com>
References: <20250318083939.987651-1-tujinjiang@huawei.com> <20250318083939.987651-3-tujinjiang@huawei.com>

On 2025/3/18 16:39, Jinjiang Tu wrote:
> Syzkaller reports a bug as follows:

Thanks for your fix.

>
> Injecting memory failure for pfn 0x18b00e at process virtual address 0x20ffd000
> Memory failure: 0x18b00e: dirty swapcache page still referenced by 2 users
> Memory failure: 0x18b00e: recovery action for dirty swapcache page: Failed
> page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x20ffd pfn:0x18b00e
> memcg:ffff0000dd6d9000
> anon flags: 0x5ffffe00482011(locked|dirty|arch_1|swapbacked|hwpoison|node=0|zone=2|lastcpupid=0xfffff)
> raw: 005ffffe00482011 dead000000000100 dead000000000122 ffff0000e232a7c9
> raw: 0000000000020ffd 0000000000000000 00000002ffffffff ffff0000dd6d9000
> page dumped because: VM_BUG_ON_FOLIO(!folio_test_uptodate(folio))
> ------------[ cut here ]------------
> kernel BUG at mm/swap_state.c:184!
> Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
> Modules linked in:
> CPU: 0 PID: 60 Comm: kswapd0 Not tainted 6.6.0-gcb097e7de84e #3
> Hardware name: linux,dummy-virt (DT)
> pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> pc : add_to_swap+0xbc/0x158
> lr : add_to_swap+0xbc/0x158
> sp : ffff800087f37340
> x29: ffff800087f37340 x28: fffffc00052c0380 x27: ffff800087f37780
> x26: ffff800087f37490 x25: ffff800087f37c78 x24: ffff800087f377a0
> x23: ffff800087f37c50 x22: 0000000000000000 x21: fffffc00052c03b4
> x20: 0000000000000000 x19: fffffc00052c0380 x18: 0000000000000000
> x17: 296f696c6f662865 x16: 7461646f7470755f x15: 747365745f6f696c
> x14: 6f6621284f494c4f x13: 0000000000000001 x12: ffff600036d8b97b
> x11: 1fffe00036d8b97a x10: ffff600036d8b97a x9 : dfff800000000000
> x8 : 00009fffc9274686 x7 : ffff0001b6c5cbd3 x6 : 0000000000000001
> x5 : ffff0000c25896c0 x4 : 0000000000000000 x3 : 0000000000000000
> x2 : 0000000000000000 x1 : ffff0000c25896c0 x0 : 0000000000000000
> Call trace:
>  add_to_swap+0xbc/0x158
>  shrink_folio_list+0x12ac/0x2648
>  shrink_inactive_list+0x318/0x948
>  shrink_lruvec+0x450/0x720
>  shrink_node_memcgs+0x280/0x4a8
>  shrink_node+0x128/0x978
>  balance_pgdat+0x4f0/0xb20
>  kswapd+0x228/0x438
>  kthread+0x214/0x230
>  ret_from_fork+0x10/0x20
>

There are too many races in memory_failure to handle...

> I can reproduce this issue with the following steps:
> 1) When a dirty swapcache page is isolated by reclaim process and the page
> isn't locked, inject memory failure for the page. me_swapcache_dirty()
> clears uptodate flag and tries to delete from lru, but fails. Reclaim
> process will put the hwpoisoned page back to lru.

Is the hwpoisoned page put back to the lru list because memory_failure holds the extra page refcnt?

> 2) The process that maps the hwpoisoned page exits, the page is deleted
> the page will never be freed and will be in the lru forever.

Again, memory_failure holds the extra page refcnt so...

> 3) If we trigger a reclaim again and try to reclaim the page,
> add_to_swap() will trigger VM_BUG_ON_FOLIO because the uptodate flag is
> cleared.
>
> To fix it, skip the hwpoisoned page in shrink_folio_list(). Besides, the
> hwpoison folio may not be unmapped by hwpoison_user_mappings() yet, unmap
> it in shrink_folio_list(), otherwise the folio will fail to be unmapped
> by hwpoison_user_mappings() since the folio isn't in lru list.
>
> Signed-off-by: Jinjiang Tu

Acked-by: Miaohe Lin

Thanks.
.
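
For anyone following the thread, steps 1) to 3) and the proposed check can be replayed in a small userspace model. This is only a sketch under stated assumptions: struct toy_folio, toy_shrink_one() and toy_replay() are hypothetical names made up for illustration, not kernel APIs, and the four booleans stand in for the real folio flags, mapcount and LRU state. It is not the patch itself, just the decision it describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model. None of these names are kernel APIs. */
struct toy_folio {
	bool uptodate;	/* cleared by the dirty swapcache failure handler */
	bool hwpoison;	/* set when memory failure is injected */
	bool mapped;	/* still mapped by some process */
	bool on_lru;	/* currently on an LRU list */
};

enum toy_result { TOY_RECLAIMED, TOY_SKIPPED, TOY_WOULD_BUG };

/* One reclaim attempt on a folio that reclaim has just isolated. */
enum toy_result toy_shrink_one(struct toy_folio *f, bool skip_hwpoison)
{
	assert(f != NULL);
	f->on_lru = false;		/* isolated from the LRU by reclaim */

	if (skip_hwpoison && f->hwpoison) {
		/* Unmap here: the folio is off the LRU at this point. */
		f->mapped = false;
		return TOY_SKIPPED;
	}

	/* Stands in for the VM_BUG_ON_FOLIO(!uptodate) check in the report. */
	if (!f->uptodate)
		return TOY_WOULD_BUG;

	f->mapped = false;
	return TOY_RECLAIMED;
}

/* Replays steps 1) to 3); returns the outcome of the second reclaim pass. */
enum toy_result toy_replay(bool skip_hwpoison)
{
	struct toy_folio f = { .uptodate = true, .mapped = true, .on_lru = true };

	/* Step 1: reclaim isolates the folio, then memory failure is injected. */
	f.on_lru = false;
	f.hwpoison = true;
	f.uptodate = false;	/* handler clears uptodate, cannot take it off the LRU */
	f.on_lru = true;	/* reclaim puts the hwpoisoned folio back */

	/* Step 2: the process that mapped the folio exits. */
	f.mapped = false;

	/* Step 3: reclaim runs again and picks up the same folio. */
	return toy_shrink_one(&f, skip_hwpoison);
}
```

In this model toy_replay(false) returns TOY_WOULD_BUG, which corresponds to the add_to_swap() oops above, while toy_replay(true) returns TOY_SKIPPED. The model does not track refcounts, so it says nothing about the refcnt questions raised inline.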