From: Liu Shixin
To: David Hildenbrand, Andrew Morton, Muchun Song, Kenneth W Chen, Kefeng Wang, Nanyong Sun
Subject: Re: [PATCH] mm: hugetlb: independent PMD page table shared count
Date: Tue, 17 Dec 2024 10:02:25 +0800
Message-ID: <00edd087-8df6-343a-95bf-ca23381085a8@huawei.com>
In-Reply-To: <8e59d2bd-77d3-41bc-83b7-532b018db4e2@redhat.com>
References: <20241214104401.1052550-1-liushixin2@huawei.com> <8e59d2bd-77d3-41bc-83b7-532b018db4e2@redhat.com>

On 2024/12/16 23:34, David Hildenbrand wrote:
> On 14.12.24 11:44, Liu Shixin wrote:
>> The folio refcount may be increased unexpectedly through try_get_folio() by
>> caller such as split_huge_pages. In huge_pmd_unshare(), we use refcount to
>> check whether a pmd page table is shared.
>> The check is incorrect if the
>> refcount is increased by the above caller, and this can cause the page
>> table leaked:
>
> Are you sure it is "leaked" ?
>
> I assume what happens is that we end up freeing a page table without calling its constructor. That's why page freeing code complains about "nonzero mapcount" (overlayed by something else).

1. The page table itself will be discarded after the "nonzero mapcount" report.
2. The HugeTLB page mapped by the page table is never freed, because we treat the
   page table as shared and a shared page table will not be unmapped.

>
>
>> BUG: Bad page state in process sh pfn:109324
>> page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x66 pfn:0x109324
>> flags: 0x17ffff800000000(node=0|zone=2|lastcpupid=0xfffff)
>> page_type: f2(table)
>> raw: 017ffff800000000 0000000000000000 0000000000000000 0000000000000000
>> raw: 0000000000000066 0000000000000000 00000000f2000000 0000000000000000
>> page dumped because: nonzero mapcount
>> ...
>> CPU: 31 UID: 0 PID: 7515 Comm: sh Kdump: loaded Tainted: G B 6.13.0-rc2master+ #7
>> Tainted: [B]=BAD_PAGE
>> Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
>> Call trace:
>> show_stack+0x20/0x38 (C)
>> dump_stack_lvl+0x80/0xf8
>> dump_stack+0x18/0x28
>> bad_page+0x8c/0x130
>> free_page_is_bad_report+0xa4/0xb0
>> free_unref_page+0x3cc/0x620
>> __folio_put+0xf4/0x158
>> split_huge_pages_all+0x1e0/0x3e8
>> split_huge_pages_write+0x25c/0x2d8
>> full_proxy_write+0x64/0xd8
>> vfs_write+0xcc/0x280
>> ksys_write+0x70/0x110
>> __arm64_sys_write+0x24/0x38
>> invoke_syscall+0x50/0x120
>> el0_svc_common.constprop.0+0xc8/0xf0
>> do_el0_svc+0x24/0x38
>> el0_svc+0x34/0x128
>> el0t_64_sync_handler+0xc8/0xd0
>> el0t_64_sync+0x190/0x198
>>
>> The issue may be triggered by damon, offline_page, page_idle etc. which
>> will increase the refcount of page table.
>
> Right, many do have a racy folio_test_lru() check in there, that prevents "most of the harm", but not all of them.

Yes, this makes the problem nearly impossible to trigger for some functions, but it is still not really safe.

Thanks,
>
>
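
To make the failure mode above concrete, here is a minimal user-space sketch. It is not kernel code: struct pt_model and every pt_model_*() and shared_by_*() name are hypothetical, and the dedicated share_count field is only an assumption based on the patch subject, not the actual implementation. The model is single-threaded, so plain ints stand in for atomic counters.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a PMD page table page (illustration only). */
struct pt_model {
    int refcount;    /* generic reference count, any caller may raise it */
    int share_count; /* assumed dedicated count of additional sharers */
};

/* A single mm owns the table: one reference, nobody else shares it. */
void pt_model_init(struct pt_model *pt)
{
    pt->refcount = 1;
    pt->share_count = 0;
}

/* A second mm starts sharing the page table. */
void pt_model_share(struct pt_model *pt)
{
    pt->refcount++;
    pt->share_count++;
}

/* Stand-ins for an unrelated path taking and dropping a temporary reference. */
void pt_model_transient_get(struct pt_model *pt) { pt->refcount++; }
void pt_model_transient_put(struct pt_model *pt) { pt->refcount--; }

/* Refcount-based check: cannot tell a sharer from a transient reference. */
bool shared_by_refcount(const struct pt_model *pt)
{
    return pt->refcount > 1;
}

/* Check on the dedicated counter: transient references do not affect it. */
bool shared_by_share_count(const struct pt_model *pt)
{
    return pt->share_count > 0;
}

/* Replays the scenario: an unshared table plus one transient reference. */
int pt_model_demo(void)
{
    struct pt_model pt;

    pt_model_init(&pt);
    pt_model_transient_get(&pt);
    assert(shared_by_refcount(&pt));     /* false positive: looks shared */
    assert(!shared_by_share_count(&pt)); /* still correctly unshared */
    pt_model_transient_put(&pt);

    pt_model_share(&pt);
    assert(shared_by_refcount(&pt));     /* both checks agree on real sharing */
    assert(shared_by_share_count(&pt));
    return 0;
}
```

In this model the refcount-based check reports "shared" while only a temporary reference is held, which corresponds to the case where the unshare path skips the unmap and the page table is later freed in a bad state.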