Message-ID: <84c9a11c-9ae6-dd27-9fa9-6ab580c649a5@huawei.com>
Date: Thu, 20 Mar 2025 11:37:50 +0800
Subject: Re: [PATCH 2/2] mm/vmscan: don't try to reclaim hwpoison folio
To: Miaohe Lin
From: Jinjiang Tu
References: <20250318083939.987651-1-tujinjiang@huawei.com> <20250318083939.987651-3-tujinjiang@huawei.com> <8a5ecc18-55f2-4c58-e52e-7fda91e27088@huawei.com>
In-Reply-To: <8a5ecc18-55f2-4c58-e52e-7fda91e27088@huawei.com>

On 2025/3/20 10:50, Miaohe Lin wrote:
> On 2025/3/18 16:39, Jinjiang Tu wrote:
>> Syzkaller reports a bug as follows:
> Thanks for your fix.
>
>> Injecting memory failure for pfn 0x18b00e at process virtual address 0x20ffd000
>> Memory failure: 0x18b00e: dirty swapcache page still referenced by 2 users
>> Memory failure: 0x18b00e: recovery action for dirty swapcache page: Failed
>> page: refcount:2 mapcount:0 mapping:0000000000000000 index:0x20ffd pfn:0x18b00e
>> memcg:ffff0000dd6d9000
>> anon flags: 0x5ffffe00482011(locked|dirty|arch_1|swapbacked|hwpoison|node=0|zone=2|lastcpupid=0xfffff)
>> raw: 005ffffe00482011 dead000000000100 dead000000000122 ffff0000e232a7c9
>> raw: 0000000000020ffd 0000000000000000 00000002ffffffff ffff0000dd6d9000
>> page dumped because: VM_BUG_ON_FOLIO(!folio_test_uptodate(folio))
>> ------------[ cut here ]------------
>> kernel BUG at mm/swap_state.c:184!
>> Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
>> Modules linked in:
>> CPU: 0 PID: 60 Comm: kswapd0 Not tainted 6.6.0-gcb097e7de84e #3
>> Hardware name: linux,dummy-virt (DT)
>> pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> pc : add_to_swap+0xbc/0x158
>> lr : add_to_swap+0xbc/0x158
>> sp : ffff800087f37340
>> x29: ffff800087f37340 x28: fffffc00052c0380 x27: ffff800087f37780
>> x26: ffff800087f37490 x25: ffff800087f37c78 x24: ffff800087f377a0
>> x23: ffff800087f37c50 x22: 0000000000000000 x21: fffffc00052c03b4
>> x20: 0000000000000000 x19: fffffc00052c0380 x18: 0000000000000000
>> x17: 296f696c6f662865 x16: 7461646f7470755f x15: 747365745f6f696c
>> x14: 6f6621284f494c4f x13: 0000000000000001 x12: ffff600036d8b97b
>> x11: 1fffe00036d8b97a x10: ffff600036d8b97a x9 : dfff800000000000
>> x8 : 00009fffc9274686 x7 : ffff0001b6c5cbd3 x6 : 0000000000000001
>> x5 : ffff0000c25896c0 x4 : 0000000000000000 x3 : 0000000000000000
>> x2 : 0000000000000000 x1 : ffff0000c25896c0 x0 : 0000000000000000
>> Call trace:
>> add_to_swap+0xbc/0x158
>> shrink_folio_list+0x12ac/0x2648
>> shrink_inactive_list+0x318/0x948
>> shrink_lruvec+0x450/0x720
>> shrink_node_memcgs+0x280/0x4a8
>> shrink_node+0x128/0x978
>> balance_pgdat+0x4f0/0xb20
>> kswapd+0x228/0x438
>> kthread+0x214/0x230
>> ret_from_fork+0x10/0x20
>>
> There are too many races in memory_failure to handle...
>
>> I can reproduce this issue with the following steps:
>> 1) When a dirty swapcache page is isolated by reclaim process and the page
>> isn't locked, inject memory failure for the page. me_swapcache_dirty()
>> clears uptodate flag and tries to delete from lru, but fails. Reclaim
>> process will put the hwpoisoned page back to lru.
> The hwpoisoned page is put back to lru list due to memory_failure holding the extra page refcnt?

Yes

>
>> 2) The process that maps the hwpoisoned page exits, the page is deleted
>> the page will never be freed and will be in the lru forever.
> Again, memory_failure holds the extra page refcnt so...
>
>> 3) If we trigger a reclaim again and tries to reclaim the page,
>> add_to_swap() will trigger VM_BUG_ON_FOLIO due to the uptodate flag is
>> cleared.
>>
>> To fix it, skip the hwpoisoned page in shrink_folio_list(). Besides, the
>> hwpoison folio may not be unmapped by hwpoison_user_mappings() yet, unmap
>> it in shrink_folio_list(), otherwise the folio will fail to be unmaped
>> by hwpoison_user_mappings() since the folio isn't in lru list.
>>
>> Signed-off-by: Jinjiang Tu
> Acked-by: Miaohe Lin

Thanks for your review.

>
> Thanks.
> .
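
For readers following the thread, the reproduction steps and the proposed fix can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct toy_folio` and every `toy_*` function are hypothetical names invented for illustration, not kernel code and not the patch itself. The model keeps just the flags the discussion relies on (uptodate, hwpoison, mapped), and `toy_add_to_swap()` returns false at the point where the kernel's add_to_swap() would trip VM_BUG_ON_FOLIO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a folio; NOT the kernel's struct folio. */
struct toy_folio {
    bool uptodate;  /* cleared when memory failure is handled (step 1) */
    bool hwpoison;  /* set when memory failure is injected */
    bool mapped;    /* still mapped by some process */
    bool swapped;   /* set once the toy reclaim path "adds it to swap" */
};

/* Models step 1: the failure handler poisons the page and clears uptodate. */
static void toy_inject_memory_failure(struct toy_folio *f)
{
    f->hwpoison = true;
    f->uptodate = false;
}

/*
 * Models the uptodate check in add_to_swap(). Returns false where the
 * real kernel would hit VM_BUG_ON_FOLIO(!folio_test_uptodate(folio)).
 */
static bool toy_add_to_swap(struct toy_folio *f)
{
    if (!f->uptodate)
        return false;
    f->swapped = true;
    return true;
}

/*
 * Models the reclaim loop. With skip_hwpoison set, a poisoned folio is
 * unmapped and then left alone, as the patch description proposes.
 * Returns how many folios would have triggered the assertion.
 */
static size_t toy_shrink_list(struct toy_folio *folios, size_t n,
                              bool skip_hwpoison)
{
    size_t bugs = 0;

    for (size_t i = 0; i < n; i++) {
        struct toy_folio *f = &folios[i];

        if (skip_hwpoison && f->hwpoison) {
            f->mapped = false;  /* unmap, then do not try to reclaim */
            continue;
        }
        if (!toy_add_to_swap(f))
            bugs++;
    }
    return bugs;
}
```

Calling `toy_shrink_list()` with `skip_hwpoison` false on a list containing one poisoned folio counts one assertion hit, which corresponds to step 3 above. With `skip_hwpoison` true, the poisoned folio is unmapped and never reaches `toy_add_to_swap()`.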