From: "Huang, Ying"
To: "Yin, Fengwei"
Cc: Matthew Wilcox, "zhangpeng (AS)", Nanyong Sun, Kefeng Wang
Subject: Re: [Question]: major faults are still triggered after mlockall when numa balancing
In-Reply-To: (Fengwei Yin's message of "Tue, 14 Nov 2023 19:23:25 +0800")
References: <9e62fd9a-bee0-52bf-50a7-498fa17434ee@huawei.com> <874jhugom8.fsf@yhuang6-desk2.ccr.corp.intel.com> <2c95d0d0-a708-436f-a9d9-4b3d90eafb16@intel.com> <87r0kufm15.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 15 Nov 2023 09:46:03 +0800
Message-ID: <87wmujeqlg.fsf@yhuang6-desk2.ccr.corp.intel.com>

"Yin, Fengwei" writes:

> On 11/13/2023 10:02 AM, Huang, Ying wrote:
>>>> There are other places in the kernel where the PTE is cleared, for
>>>> example, move_ptes() in mremap.c.  IIUC, we need to audit all of them.
>>>>
>>>> Another possible solution is to check the PTE again with the PTL held
>>>> before reading in file data.  This will increase the overhead of the
>>>> major fault path.  Is it acceptable?
>>> What if we check the PTE without the page table lock acquired?
>> The PTE is zeroed temporarily only with the PTL held.  So, if we acquire
>> the PTL in filemap_fault() and check the PTE, the PTE which was zeroed in
>> do_numa_page() will be non-zero by then.  So we can avoid the major fault.
> Yes.
>
>>
>> But, if we don't acquire the PTL, the PTE may still be zero.
> do_numa_page()/change_pte_range() does very little while the PTE is
> cleared.  Considering the code path of do_read_fault(), it's likely
> that the PTE is non-zero.

That is possible as far as I understand, although it doesn't feel good
to depend on a "race" condition.

> My concern with acquiring the lock is that it adds an extra PTL
> acquire/release for the other, more common cases.

Yes.  Acquiring the PTL will add some overhead.  Anyway, a performance
test is needed to compare the solutions.

--
Best Regards,
Huang, Ying
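
To make the difference between the two checks concrete, here is a minimal single-threaded userspace sketch of the interleaving discussed above. It is only a model, not kernel code: `struct pte_slot`, `updater_begin()`, `updater_end()`, `check_lockless()`, `check_locked()` and `run_model()` are hypothetical names invented for illustration, a plain flag stands in for the PTL, and the cost of taking the lock is not modelled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one page table entry and the lock protecting it. */
struct pte_slot {
    unsigned long pte;  /* 0 models an empty entry, non-zero a mapped page */
    bool ptl_held;      /* true while the modelled updater owns the PTL */
};

enum pte_check {
    PTE_CHECK_NONE,        /* entry looked empty: the fault path would read the file */
    PTE_CHECK_PRESENT,     /* entry is populated: no major fault is needed */
    PTE_CHECK_WOULD_BLOCK  /* PTL is held: a real locked check would wait here */
};

/* Updater, step 1: take the PTL and clear the entry temporarily. */
static unsigned long updater_begin(struct pte_slot *s)
{
    assert(!s->ptl_held);
    s->ptl_held = true;
    unsigned long old = s->pte;
    s->pte = 0;
    return old;
}

/* Updater, step 2: write the entry back and drop the PTL. */
static void updater_end(struct pte_slot *s, unsigned long new_pte)
{
    assert(s->ptl_held);
    s->pte = new_pte;
    s->ptl_held = false;
}

/* Lockless check: reads whatever value is currently in the entry. */
static enum pte_check check_lockless(const struct pte_slot *s)
{
    return s->pte ? PTE_CHECK_PRESENT : PTE_CHECK_NONE;
}

/* Locked check: can only look at the entry once the PTL is free. */
static enum pte_check check_locked(const struct pte_slot *s)
{
    if (s->ptl_held)
        return PTE_CHECK_WOULD_BLOCK;
    return s->pte ? PTE_CHECK_PRESENT : PTE_CHECK_NONE;
}

struct model_result {
    enum pte_check lockless_in_window;
    enum pte_check locked_in_window;
    enum pte_check locked_after;
};

/* Replay one fixed interleaving: both checks run inside the cleared window. */
struct model_result run_model(void)
{
    struct pte_slot slot = { .pte = 0x1000, .ptl_held = false };
    struct model_result r;

    unsigned long old = updater_begin(&slot);      /* entry is now zero */
    r.lockless_in_window = check_lockless(&slot);  /* sees the transient zero */
    r.locked_in_window = check_locked(&slot);      /* has to wait instead */
    updater_end(&slot, old);                       /* entry restored, lock dropped */
    r.locked_after = check_locked(&slot);          /* sees the restored entry */
    return r;
}
```

In this model the lockless check is the only one that can observe the transient zero, which is why it depends on the window being short, while the locked check pays for an extra lock round trip on every call.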