From: "Huang, Ying"
To: Peng Zhang
Subject: Re: [RFC PATCH] mm: filemap: avoid unnecessary major faults in filemap_fault()
In-Reply-To: <20231122140052.4092083-1-zhangpeng362@huawei.com> (Peng Zhang's message of "Wed, 22 Nov 2023 22:00:52 +0800")
References: <20231122140052.4092083-1-zhangpeng362@huawei.com>
Date: Thu, 23 Nov 2023 16:36:20 +0800
Message-ID: <87a5r4988r.fsf@yhuang6-desk2.ccr.corp.intel.com>

Peng Zhang writes:

> From: ZhangPeng
>
> A major fault occurred when using mlockall(MCL_CURRENT | MCL_FUTURE)
> in an application, leading to an unexpected performance issue[1].
>
> This is caused by the pte being temporarily cleared during a
> read/modify/write update of the pte, e.g., in
> do_numa_page()/change_pte_range().
>
> For the data segment of a user-mode program, the global variable area
> is a private mapping. After the page cache is loaded, a private anonymous
> page is generated once COW is triggered. mlockall() can lock COW pages
> (anonymous pages), but the original file pages cannot be locked and may
> be reclaimed. If the global variable (private anon page) is accessed while
> vmf->pte is zeroed in a NUMA fault, a file page fault will be triggered.
>
> At this time, the original private file page may have been reclaimed.
> If the page cache is not available at this time, a major fault will be
> triggered and the file will be read, causing additional overhead.
>
> Fix this by rechecking the pte while holding the ptl in filemap_fault()
> before triggering a major fault.
>
> [1] https://lore.kernel.org/linux-mm/9e62fd9a-bee0-52bf-50a7-498fa17434ee@huawei.com/
>
> Signed-off-by: ZhangPeng
> Signed-off-by: Kefeng Wang

Suggested-by: "Huang, Ying"

:-)

> ---
>  mm/filemap.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 71f00539ac00..bb5e6a2790dc 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3226,6 +3226,20 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
>  			mapping_locked = true;
>  		}
>  	} else {
> +		pte_t *ptep = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
> +						  vmf->address, &vmf->ptl);
> +		if (ptep) {
> +			/*
> +			 * Recheck pte with ptl locked as the pte can be cleared
> +			 * temporarily during a read/modify/write update.
> +			 */
> +			if (unlikely(!pte_none(ptep_get(ptep))))
> +				ret = VM_FAULT_NOPAGE;
> +			pte_unmap_unlock(ptep, vmf->ptl);
> +			if (unlikely(ret))
> +				return ret;
> +		}
> +

Need to deal with ptep == NULL, although that is highly unlikely.

--
Best Regards,
Huang, Ying

>  		/* No page in the page cache at all */
>  		count_vm_event(PGMAJFAULT);
>  		count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
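
To make the ptep == NULL remark concrete, here is a minimal user-space sketch of the "recheck under the lock before taking the slow path" pattern with an explicit NULL branch. It is not kernel code: struct fake_pmd, fake_map_lock(), fake_unmap_unlock(), recheck_before_major_fault() and the FAULT_* values are hypothetical stand-ins invented for illustration, and the choice to return a "retry" result when the map helper fails is an assumption, not something this thread settles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; none of these are kernel APIs. */
enum fault_result { FAULT_MAJOR = 0, FAULT_NOPAGE = 1 };

struct fake_pmd {
	bool locked;        /* models the page table lock (single-threaded model) */
	uint64_t *table;    /* NULL models "the page table went away under us" */
	size_t nr_entries;
};

/*
 * Models a map-and-lock helper that can fail: on failure it returns NULL
 * and leaves the lock released, so the caller must not unlock.
 */
static uint64_t *fake_map_lock(struct fake_pmd *pmd, size_t idx)
{
	assert(!pmd->locked);
	if (pmd->table == NULL || idx >= pmd->nr_entries)
		return NULL;
	pmd->locked = true;
	return &pmd->table[idx];
}

static void fake_unmap_unlock(struct fake_pmd *pmd)
{
	assert(pmd->locked);
	pmd->locked = false;
}

/*
 * Recheck the entry with the lock held before declaring a major fault.
 * A non-zero entry means another path has already restored the mapping.
 */
static enum fault_result recheck_before_major_fault(struct fake_pmd *pmd,
						    size_t idx)
{
	enum fault_result ret = FAULT_MAJOR;
	uint64_t *ptep = fake_map_lock(pmd, idx);

	/* Assumption: a failed map means "let the access retry". */
	if (ptep == NULL)
		return FAULT_NOPAGE;

	if (*ptep != 0)
		ret = FAULT_NOPAGE;
	fake_unmap_unlock(pmd);
	return ret;
}
```

The point of the sketch is only that the NULL case gets its own early return instead of silently falling through to the major-fault path, and that the lock is released on every path that took it.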