From: Junxiao Chang
To:
	akpm@linux-foundation.org, kirill.shutemov@linux.intel.com,
	mhocko@suse.com, jmarchan@redhat.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, mike.kravetz@oracle.com,
	muchun.song@linux.dev
Cc: junxiao.chang@intel.com
Subject: [PATCH] mm: fix hugetlb page unmap count balance issue
Date: Fri, 12 May 2023 15:20:36 +0800
Message-Id: <20230512072036.1027784-1-junxiao.chang@intel.com>

A hugetlb page is usually mapped with a pmd, but occasionally it
might be mapped with pte. QEMU can use udmabuf to create host dmabufs
for guest framebuffers. When QEMU is launched with the parameter
"hugetlb=on", the udmabuf driver maps the hugetlb page with ptes in its
page fault handler. The call chain looks like:

  page_add_file_rmap
  do_set_pte
  finish_fault
  __do_fault -> udmabuf_vm_fault, it maps hugetlb page here.
  do_read_fault

In page_add_file_rmap(), compound is false since this is a pte mapping.
When QEMU exits and the page is unmapped in page_remove_rmap(), the
hugetlb page must not be handled the pmd way. Otherwise
_entire_mapcount is decremented for mappings that were never counted
there, which matches the entire_mapcount:-4 and nr_pages_mapped:4
values in the dump below.

This change checks the compound parameter as well as the hugetlb flag.
It fixes the kernel bug below, which is reproduced with a 6.3 kernel:

[ 114.027754] BUG: Bad page cache in process qemu-system-x86 pfn:37aa00
[ 114.034288] page:000000000dd2153b refcount:514 mapcount:-4 mapping:000000004b01ca30 index:0x13800 pfn:0x37aa00
[ 114.044277] head:000000000dd2153b order:9 entire_mapcount:-4 nr_pages_mapped:4 pincount:512
[ 114.052623] aops:hugetlbfs_aops ino:6f93
[ 114.056552] flags: 0x17ffffc0010001(locked|head|node=0|zone=2|lastcpupid=0x1fffff)
[ 114.064115] raw: 0017ffffc0010001 fffff7338deb0008 fffff7338dea0008 ffff98dc855ea870
[ 114.071847] raw: 000000000000009c 0000000000000002 00000202ffffffff 0000000000000000
[ 114.079572] page dumped because: still mapped when deleted
[ 114.085048] CPU: 0 PID: 3122 Comm: qemu-system-x86 Tainted: G BU W E 6.3.0-v3+ #62
[ 114.093566] Hardware name: Intel Corporation Alder Lake Client Platform DDR5 SODIMM SBS RVP, BIOS ADLPFWI1.R00.3084.D89.2303211034 03/21/2023
[ 114.106839] Call Trace:
[ 114.109291]
[ 114.111405] dump_stack_lvl+0x4c/0x70
[ 114.115073] dump_stack+0x14/0x20
[ 114.118395] filemap_unaccount_folio+0x159/0x220
[ 114.123021] filemap_remove_folio+0x54/0x110
[ 114.127295] remove_inode_hugepages+0x111/0x5b0
[ 114.131834] hugetlbfs_evict_inode+0x23/0x50
[ 114.136111] evict+0xcd/0x1e0
[ 114.139083] iput.part.0+0x183/0x1e0
[ 114.142663] iput+0x20/0x30
[ 114.145466] dentry_unlink_inode+0xcc/0x130
[ 114.149655] __dentry_kill+0xec/0x1a0
[ 114.153325] dput+0x1ca/0x3c0
[ 114.156293] __fput+0xf4/0x280
[ 114.159357] ____fput+0x12/0x20
[ 114.162502] task_work_run+0x62/0xa0
[ 114.166088] do_exit+0x352/0xae0
[ 114.169321] do_group_exit+0x39/0x90
[ 114.172892] get_signal+0xa09/0xa30
[ 114.176391] arch_do_signal_or_restart+0x33/0x280
[ 114.181098] exit_to_user_mode_prepare+0x11f/0x190
[ 114.185893] syscall_exit_to_user_mode+0x2a/0x50
[ 114.190509] do_syscall_64+0x4c/0x90
[ 114.194095] entry_SYSCALL_64_after_hwframe+0x72/0xdc

Fixes: 53f9263baba6 ("mm: rework mapcount accounting to enable 4k mapping of THPs")
Signed-off-by: Junxiao Chang
---
 mm/rmap.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 19392e090bec6..b42fc0389c243 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1377,9 +1377,9 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 
 	VM_BUG_ON_PAGE(compound && !PageHead(page), page);
 
-	/* Hugetlb pages are not counted in NR_*MAPPED */
-	if (unlikely(folio_test_hugetlb(folio))) {
-		/* hugetlb pages are always mapped with pmds */
+	/* Hugetlb pages usually are not counted in NR_*MAPPED */
+	if (unlikely(folio_test_hugetlb(folio) && compound)) {
+		/* hugetlb pages are mapped with pmds */
 		atomic_dec(&folio->_entire_mapcount);
 		return;
 	}
-- 
2.34.1
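
For illustration only, the imbalance described above can be modelled in
userspace. This is a rough sketch, not kernel code: struct toy_folio,
toy_add_rmap(), toy_remove_rmap() and toy_map_unmap() are hypothetical
names, and the counters start at 0 here rather than using the kernel's
internal encoding. It assumes the add side counts a pte mapping in
nr_pages_mapped, while the unpatched remove side takes the pmd path for
any hugetlb folio.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the folio counters; NOT the kernel's struct folio. */
struct toy_folio {
	bool hugetlb;
	int entire_mapcount;	/* whole-folio (pmd) mappings */
	int nr_pages_mapped;	/* pte-mapped subpages */
};

/* Add side: pmd mappings and pte mappings are counted separately. */
static void toy_add_rmap(struct toy_folio *f, bool compound)
{
	if (compound)
		f->entire_mapcount++;
	else
		f->nr_pages_mapped++;
}

/*
 * Remove side. With fixed == false the early return is taken for every
 * hugetlb folio (old check); with fixed == true it is taken only when
 * compound is also set (check proposed by the patch).
 */
static void toy_remove_rmap(struct toy_folio *f, bool compound, bool fixed)
{
	bool pmd_path = fixed ? (f->hugetlb && compound) : f->hugetlb;

	if (pmd_path) {
		f->entire_mapcount--;
		return;
	}
	if (compound)
		f->entire_mapcount--;
	else
		f->nr_pages_mapped--;
}

/* Map a hugetlb folio n times by pte, then unmap it n times by pte. */
static struct toy_folio toy_map_unmap(int n, bool fixed)
{
	struct toy_folio f = { .hugetlb = true };
	int i;

	for (i = 0; i < n; i++)
		toy_add_rmap(&f, false);
	for (i = 0; i < n; i++)
		toy_remove_rmap(&f, false, fixed);
	return f;
}
```

In this model, toy_map_unmap(4, false) leaves entire_mapcount at -4 and
nr_pages_mapped at 4, the same values as in the page dump, while
toy_map_unmap(4, true) returns both counters to 0.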