From: "Huang, Ying"
To: Ryan Roberts
Cc: Andrew Morton, Matthew Wilcox, "Kirill A. Shutemov", Yin Fengwei, David Hildenbrand, Yu Zhao, Catalin Marinas, Will Deacon, Anshuman Khandual, Yang Shi
Subject: Re: [PATCH v2 2/5] mm: Allow deferred splitting of arbitrary large anon folios
References: <20230703135330.1865927-1-ryan.roberts@arm.com> <20230703135330.1865927-3-ryan.roberts@arm.com> <877crcgmj1.fsf@yhuang6-desk2.ccr.corp.intel.com> <6379dd13-551e-3c73-422a-56ce40b27deb@arm.com> <87ttucfht7.fsf@yhuang6-desk2.ccr.corp.intel.com> <878rbof8cs.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Tue, 11 Jul 2023 09:56:56 +0800
In-Reply-To: (Ryan Roberts's message of "Mon, 10 Jul 2023 10:39:58 +0100")
Message-ID: <87r0pfdxcn.fsf@yhuang6-desk2.ccr.corp.intel.com>
Ryan Roberts writes:

> On 10/07/2023 10:01, Huang, Ying wrote:
>> Ryan Roberts writes:
>>
>>> On 10/07/2023 06:37, Huang, Ying wrote:
>>>> Ryan Roberts writes:
>>>>
>>>>> Somehow I managed to reply only to the linux-arm-kernel list on first
>>>>> attempt so resending:
>>>>>
>>>>> On 07/07/2023 09:21, Huang, Ying wrote:
>>>>>> Ryan Roberts writes:
>>>>>>
>>>>>>> With the introduction of large folios for anonymous memory, we would
>>>>>>> like to be able to split them when they have unmapped subpages, in
>>>>>>> order to free those unused pages under memory pressure. So remove
>>>>>>> the artificial requirement that the large folio needed to be at
>>>>>>> least PMD-sized.
>>>>>>>
>>>>>>> Signed-off-by: Ryan Roberts
>>>>>>> Reviewed-by: Yu Zhao
>>>>>>> Reviewed-by: Yin Fengwei
>>>>>>> ---
>>>>>>>  mm/rmap.c | 2 +-
>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>>>> index 82ef5ba363d1..bbcb2308a1c5 100644
>>>>>>> --- a/mm/rmap.c
>>>>>>> +++ b/mm/rmap.c
>>>>>>> @@ -1474,7 +1474,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>>>>>>>  	 * page of the folio is unmapped and at least one page
>>>>>>>  	 * is still mapped.
>>>>>>>  	 */
>>>>>>> -	if (folio_test_pmd_mappable(folio) && folio_test_anon(folio))
>>>>>>> +	if (folio_test_large(folio) && folio_test_anon(folio))
>>>>>>>  		if (!compound || nr < nr_pmdmapped)
>>>>>>>  			deferred_split_folio(folio);
>>>>>>>  }
>>>>>>
>>>>>> One possible issue is that even for large folios mapped only in one
>>>>>> process, in zap_pte_range(), we will always call deferred_split_folio()
>>>>>> unnecessarily before freeing a large folio.
>>>>>
>>>>> Hi Huang, thanks for reviewing!
>>>>>
>>>>> I have a patch that solves this problem by determining a range of ptes
>>>>> covered by a single folio and doing a "batch zap". This prevents the
>>>>> need to add the folio to the deferred split queue, only to remove it
>>>>> again shortly afterwards. This reduces lock contention and I can
>>>>> measure a performance improvement for the kernel compilation
>>>>> benchmark. See [1].
>>>>>
>>>>> However, I decided to remove it from this patch set on Yu Zhao's
>>>>> advice. We are aiming for the minimal patch set to start with and
>>>>> wanted to focus people on that. I intend to submit it separately
>>>>> later on.
>>>>>
>>>>> [1] https://lore.kernel.org/linux-mm/20230626171430.3167004-8-ryan.roberts@arm.com/
>>>>
>>>> Thanks for your information! "batch zap" can solve the problem.
>>>>
>>>> And, I agree with Matthew's comments to fix the large folios interaction
>>>> issues before merging the patches to allocate large folios as in the
>>>> following email.
>>>>
>>>> https://lore.kernel.org/linux-mm/ZKVdUDuwNWDUCWc5@casper.infradead.org/
>>>>
>>>> If so, we don't need to introduce the above problem or a large patchset.
>>>
>>> I appreciate Matthew's and others' position about not wanting to merge a
>>> minimal implementation while there are some fundamental features (e.g.
>>> compaction) it doesn't play well with - I'm working to create a
>>> definitive list so these items can be tracked and tackled.
>>
>> Good to know this, Thanks!
>>
>>> That said, I don't see this "batch zap" patch as an example of this.
>>> It's just a performance enhancement that improves things even further
>>> than large anon folios on their own. I'd rather concentrate on the core
>>> changes first then deal with this type of thing later. Does that work
>>> for you?
>>
>> IIUC, allocating large folios upon page fault depends on splitting large
>> folios in page_remove_rmap() to avoid memory wastage. Splitting large
>> folios in page_remove_rmap() depends on "batch zap" to avoid performance
>> regression in zap_pte_range(). So we need them to be done earlier. Or
>> do I miss something?
>
> My point was just that large anon folios improve performance significantly
> overall, despite a small perf regression in zap_pte_range(). That
> regression is reduced further by a patch from Yin Fengwei to reduce the
> lock contention [1]. So it doesn't seem urgent to me to get the "batch
> zap" change in.

I don't think Fengwei's patch will help much here, because that patch
optimizes the case where the folio isn't on the deferred split queue,
while with this change the folio will be put on the deferred split queue.
And I don't think allocating large folios upon page fault is the more
urgent change. We should avoid regressions where possible.

> I'll add it to my list, then prioritize it against the other stuff.
>
> [1] https://lore.kernel.org/linux-mm/20230429082759.1600796-1-fengwei.yin@intel.com/

Best Regards,
Huang, Ying