From: "Huang, Ying"
To: "zhangpeng (AS)"
Subject: Re: [RFC PATCH] mm: filemap: avoid unnecessary major faults in filemap_fault()
In-Reply-To: <513144c1-c0e0-fe63-e133-c2611a440a94@huawei.com> (zhangpeng's message of "Fri, 24 Nov 2023 15:26:38 +0800")
References: <20231122140052.4092083-1-zhangpeng362@huawei.com> <87a5r4988r.fsf@yhuang6-desk2.ccr.corp.intel.com> <513144c1-c0e0-fe63-e133-c2611a440a94@huawei.com>
Date: Fri, 24 Nov 2023 15:59:05 +0800
Message-ID: <87plzz7fau.fsf@yhuang6-desk2.ccr.corp.intel.com>

"zhangpeng (AS)" writes:

> On 2023/11/23 16:36, Huang, Ying wrote:
>
>> Peng Zhang writes:
>>
>>> From: ZhangPeng
>>>
>>> A major fault occurred when using mlockall(MCL_CURRENT | MCL_FUTURE)
>>> in an application, which led to an unexpected performance issue[1].
>>>
>>> This is caused by the pte being temporarily cleared during a
>>> read/modify/write update of the pte, e.g. in
>>> do_numa_page()/change_pte_range().
>>>
>>> For the data segment of a user-mode program, the global variable area
>>> is a private mapping. After the page cache is loaded, a private anonymous
>>> page is generated once COW is triggered. mlockall() can lock COW pages
>>> (anonymous pages), but the original file pages cannot be locked and may
>>> be reclaimed. If the global variable (private anon page) is accessed while
>>> vmf->pte is zeroed in the NUMA fault path, a file page fault will be
>>> triggered.
>>>
>>> At this time, the original private file page may have been reclaimed.
>>> If the page cache is not available at this time, a major fault will be
>>> triggered and the file will be read, causing additional overhead.
>>>
>>> Fix this by rechecking the pte while holding the ptl in filemap_fault()
>>> before triggering a major fault.
>>>
>>> [1] https://lore.kernel.org/linux-mm/9e62fd9a-bee0-52bf-50a7-498fa17434ee@huawei.com/
>>>
>>> Signed-off-by: ZhangPeng
>>> Signed-off-by: Kefeng Wang
>> Suggested-by: "Huang, Ying"
>>
>> :-)
>>
>>> ---
>>>  mm/filemap.c | 14 ++++++++++++++
>>>  1 file changed, 14 insertions(+)
>>>
>>> diff --git a/mm/filemap.c b/mm/filemap.c
>>> index 71f00539ac00..bb5e6a2790dc 100644
>>> --- a/mm/filemap.c
>>> +++ b/mm/filemap.c
>>> @@ -3226,6 +3226,20 @@ vm_fault_t filemap_fault(struct vm_fault *vmf)
>>>  			mapping_locked = true;
>>>  		}
>>>  	} else {
>>> +		pte_t *ptep = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd,
>>> +						  vmf->address, &vmf->ptl);
>>> +		if (ptep) {
>>> +			/*
>>> +			 * Recheck pte with ptl locked as the pte can be cleared
>>> +			 * temporarily during a read/modify/write update.
>>> +			 */
>>> +			if (unlikely(!pte_none(ptep_get(ptep))))
>>> +				ret = VM_FAULT_NOPAGE;
>>> +			pte_unmap_unlock(ptep, vmf->ptl);
>>> +			if (unlikely(ret))
>>> +				return ret;
>>> +		}
>>> +
>> Need to deal with ptep == NULL. Although that is highly unlikely.
>
> Maybe we don't need to deal with ptep == NULL, because it has been
> considered later in filemap_fault()?
> ptep == NULL means that the ptep has been replaced with a PMD entry.
> In this case, a major fault is also required.

I still think that we need to deal with that. That is common error
processing logic.

--
Best Regards,
Huang, Ying

>>
>>>  		/* No page in the page cache at all */
>>>  		count_vm_event(PGMAJFAULT);
>>>  		count_memcg_event_mm(vmf->vma->vm_mm, PGMAJFAULT);
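
For illustration only, here is a minimal userspace sketch of the pattern under discussion: take the lock, recheck the entry, and handle a failed lookup explicitly before falling into the expensive path. It is not kernel code. The names slot_table, slot_map_lock(), slot_unmap_unlock() and recheck_before_major() are hypothetical stand-ins for the page table, pte_offset_map_lock(), pte_unmap_unlock() and the recheck in filemap_fault(). Returning "retry" when the lookup fails is an assumption made for the sketch, not something this thread has settled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VM_FAULT_NOPAGE and the major-fault path. */
enum fault_result { FAULT_NOPAGE, FAULT_MAJOR };

struct slot_table {
	atomic_flag lock;       /* spinlock, plays the role of the ptl */
	unsigned long *slots;   /* NULL models "no PTE table to map" */
	size_t nr_slots;
};

static void slot_table_init(struct slot_table *t, unsigned long *slots,
			    size_t nr_slots)
{
	atomic_flag_clear(&t->lock);
	t->slots = slots;
	t->nr_slots = nr_slots;
}

/* Returns the slot with the lock held, or NULL with the lock released. */
static unsigned long *slot_map_lock(struct slot_table *t, size_t idx)
{
	while (atomic_flag_test_and_set(&t->lock))
		;	/* spin until the lock is free */
	if (!t->slots) {
		atomic_flag_clear(&t->lock);
		return NULL;
	}
	assert(idx < t->nr_slots);
	return &t->slots[idx];
}

static void slot_unmap_unlock(struct slot_table *t)
{
	atomic_flag_clear(&t->lock);
}

/* A slot value of 0 models pte_none(). */
static enum fault_result recheck_before_major(struct slot_table *t, size_t idx)
{
	unsigned long *slot = slot_map_lock(t, idx);
	bool populated;

	/*
	 * Explicit handling of the failed lookup. ASSUMPTION: the sketch
	 * asks the caller to retry instead of taking the expensive path.
	 */
	if (!slot)
		return FAULT_NOPAGE;

	populated = (*slot != 0);
	slot_unmap_unlock(t);

	/* Entry reappeared under the lock: no need for the major path. */
	return populated ? FAULT_NOPAGE : FAULT_MAJOR;
}
```

A caller would set up a table with slot_table_init() and invoke recheck_before_major() just before the expensive read. Whatever policy is chosen for the failed lookup, the point is that the NULL return is handled at the call site and the lock is never left held on that path.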