From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <9c2844c9-5c1a-4250-a89a-0c4d01d47d5e@arm.com>
Date: Tue, 6 May 2025 19:40:49 +0530
Subject: Re: [PATCH 3/3] mm: Optimize mremap() by PTE batching
To: Lorenzo Stoakes
Cc: akpm@linux-foundation.org, Liam.Howlett@oracle.com, vbabka@suse.cz,
    jannh@google.com, pfalcato@suse.de, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, david@redhat.com, peterx@redhat.com,
    ryan.roberts@arm.com, mingo@kernel.org, libang.li@antgroup.com,
    maobibo@loongson.cn, zhengqi.arch@bytedance.com, baohua@kernel.org,
    anshuman.khandual@arm.com,
    willy@infradead.org, ioworker0@gmail.com, yang@os.amperecomputing.com
From: Dev Jain
References: <20250506050056.59250-1-dev.jain@arm.com>
 <20250506050056.59250-4-dev.jain@arm.com>
 <7c6e61fa-2437-4c99-b1d3-97af5e2b37a3@lucifer.local>
In-Reply-To: <7c6e61fa-2437-4c99-b1d3-97af5e2b37a3@lucifer.local>

On 06/05/25 7:19 pm, Lorenzo Stoakes wrote:
> On Tue, May 06, 2025 at 10:30:56AM +0530, Dev Jain wrote:
>> Use folio_pte_batch() to optimize move_ptes(). Use get_and_clear_full_ptes()
>> so as to elide TLBIs on each contig block, which was previously done by
>> ptep_get_and_clear().
>
> No mention of large folios
>
>>
>> Signed-off-by: Dev Jain
>> ---
>>  mm/mremap.c | 24 +++++++++++++++++++-----
>>  1 file changed, 19 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/mremap.c b/mm/mremap.c
>> index 1a08a7c3b92f..3621c07d8eea 100644
>> --- a/mm/mremap.c
>> +++ b/mm/mremap.c
>> @@ -176,7 +176,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
>>  	struct vm_area_struct *vma = pmc->old;
>>  	bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
>>  	struct mm_struct *mm = vma->vm_mm;
>> -	pte_t *old_ptep, *new_ptep, pte;
>> +	pte_t *old_ptep, *new_ptep, old_pte, pte;
>
> Obviously given previous comment you know what I'm going to say here :) let's
> put old_pte, pte in a new decl.
>
>>  	pmd_t dummy_pmdval;
>>  	spinlock_t *old_ptl, *new_ptl;
>>  	bool force_flush = false;
>> @@ -185,6 +185,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
>>  	unsigned long old_end = old_addr + extent;
>>  	unsigned long len = old_end - old_addr;
>>  	int err = 0;
>> +	int nr;
>>
>>  	/*
>>  	 * When need_rmap_locks is true, we take the i_mmap_rwsem and anon_vma
>> @@ -237,10 +238,14 @@ static int move_ptes(struct pagetable_move_control *pmc,
>>
>>  	for (; old_addr < old_end; old_ptep++, old_addr += PAGE_SIZE,
>>  				   new_ptep++, new_addr += PAGE_SIZE) {
>
> Hm this just seems wrong, even if we're dealing with a large folio we're still
> offsetting by PAGE_SIZE each time and iterating through further sub-pages?
>
> Shouldn't we be doing something like += nr and += PAGE_SIZE * nr?

This is embarrassing *facepalm*. The crazy part is that I didn't even notice,
because I still saw an optimization: get_and_clear_full_ptes() reduced the
number of TLB flushes, and the loop kept walking past the batch since the
already-cleared PTEs were skipped by the pte_none() check. Will fix the loop
to advance by nr -- rough sketch at the bottom of this mail.

>
> Then it'd make sense to initialise nr to 1.
>
> Honestly I'd prefer us though to refactor move_ptes() to something like:
>
> 	for (; old_addr < old_end; old_ptep++, old_addr += PAGE_SIZE,
> 		   new_ptep++, new_addr += PAGE_SIZE) {
> 		pte_t old_pte = ptep_get(old_ptep);
>
> 		if (pte_none(old_pte))
> 			continue;
>
> 		move_pte(pmc, vma, old_ptep, old_pte);
> 	}
>
> Declaring this new move_pte() where you can put the rest of the stuff.
>
> I'd much rather we do this than add to the mess as-is.
>
>
>
>> -		if (pte_none(ptep_get(old_ptep)))
>> +		const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
>> +		int max_nr = (old_end - old_addr) >> PAGE_SHIFT;
>> +
>> +		nr = 1;
>> +		old_pte = ptep_get(old_ptep);
>
> You can declare this in the for loop, no need for us to contaminate whole
> function scope with it.
>
> Same with 'nr' in this implementation (though I'd rather you changed it up, see
> above).
>
>> +		if (pte_none(old_pte))
>>  			continue;
>>
>> -		pte = ptep_get_and_clear(mm, old_addr, old_ptep);
>>  		/*
>>  		 * If we are remapping a valid PTE, make sure
>>  		 * to flush TLB before we drop the PTL for the
>> @@ -252,8 +257,17 @@ static int move_ptes(struct pagetable_move_control *pmc,
>>  		 * the TLB entry for the old mapping has been
>>  		 * flushed.
>>  		 */
>> -		if (pte_present(pte))
>> +		if (pte_present(old_pte)) {
>> +			if ((max_nr != 1) && maybe_contiguous_pte_pfns(old_ptep, old_pte)) {
>> +				struct folio *folio = vm_normal_folio(vma, old_addr, old_pte);
>> +
>> +				if (folio && folio_test_large(folio))
>> +					nr = folio_pte_batch(folio, old_addr, old_ptep,
>> +						old_pte, max_nr, fpb_flags, NULL, NULL, NULL);
>
> Indentation seems completely broken here? I also hate that we're nesting to this
> degree? Can we please find a way not to?
>
> This function is already a bit of a clogged mess, I'd rather we clean up then
> add to it.
>
> (See above again :)
>
>
>> +			}
>>  			force_flush = true;
>> +		}
>> +		pte = get_and_clear_full_ptes(mm, old_addr, old_ptep, nr, 0);
>>  		pte = move_pte(pte, old_addr, new_addr);
>>  		pte = move_soft_dirty_pte(pte);
>>
>> @@ -266,7 +280,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
>>  			else if (is_swap_pte(pte))
>>  				pte = pte_swp_clear_uffd_wp(pte);
>>  		}
>> -		set_pte_at(mm, new_addr, new_ptep, pte);
>> +		set_ptes(mm, new_addr, new_ptep, pte, nr);
>>  	}
>>  }
>>
>> --
>> 2.30.2
>>
>
> Cheers, Lorenzo
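
For the loop advance, roughly the below is what I have in mind for the next
version. This is only an untested sketch of the advance-by-nr change on top
of this patch (pte, old_ptep etc. are the existing declarations in
move_ptes(), and nr is initialised to 1 before the loop as you point out);
the move_pte()-style split you suggest above would reshape it anyway:

	int nr = 1;

	for (; old_addr < old_end; old_ptep += nr, old_addr += nr * PAGE_SIZE,
				   new_ptep += nr, new_addr += nr * PAGE_SIZE) {
		const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
		int max_nr = (old_end - old_addr) >> PAGE_SHIFT;
		pte_t old_pte = ptep_get(old_ptep);

		nr = 1;
		if (pte_none(old_pte))
			continue;

		if (pte_present(old_pte)) {
			/* Only try to batch when a large folio may span these PTEs. */
			if (max_nr != 1 && maybe_contiguous_pte_pfns(old_ptep, old_pte)) {
				struct folio *folio = vm_normal_folio(vma, old_addr, old_pte);

				if (folio && folio_test_large(folio))
					nr = folio_pte_batch(folio, old_addr, old_ptep,
							     old_pte, max_nr, fpb_flags,
							     NULL, NULL, NULL);
			}
			force_flush = true;
		}
		/* Clear nr PTEs in one go instead of one ptep_get_and_clear() per PTE. */
		pte = get_and_clear_full_ptes(mm, old_addr, old_ptep, nr, 0);
		pte = move_pte(pte, old_addr, new_addr);
		pte = move_soft_dirty_pte(pte);

		if (need_clear_uffd_wp) {
			if (pte_present(pte))
				pte = pte_clear_uffd_wp(pte);
			else if (is_swap_pte(pte))
				pte = pte_swp_clear_uffd_wp(pte);
		}
		/* Install the whole batch at the destination. */
		set_ptes(mm, new_addr, new_ptep, pte, nr);
	}

That way the batch of nr PTEs is cleared and installed in one go, and all
four cursors then step over the whole batch instead of revisiting the
already-cleared sub-PTEs.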