References: <20250610035043.75448-1-dev.jain@arm.com> <20250610035043.75448-3-dev.jain@arm.com>
In-Reply-To: <20250610035043.75448-3-dev.jain@arm.com>
From: Barry Song <21cnbao@gmail.com>
Date: Tue, 10 Jun 2025 19:03:38 +1200
Subject: Re: [PATCH v4 2/2] mm: Optimize mremap() by PTE batching
To: Dev Jain
Cc: akpm@linux-foundation.org, Liam.Howlett@oracle.com,
 lorenzo.stoakes@oracle.com, vbabka@suse.cz, jannh@google.com,
 pfalcato@suse.de, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 david@redhat.com, peterx@redhat.com, ryan.roberts@arm.com, mingo@kernel.org,
 libang.li@antgroup.com, maobibo@loongson.cn, zhengqi.arch@bytedance.com,
 anshuman.khandual@arm.com, willy@infradead.org, ioworker0@gmail.com,
 yang@os.amperecomputing.com, baolin.wang@linux.alibaba.com, ziy@nvidia.com,
 hughd@google.com

Hi Dev,

On Tue, Jun 10, 2025 at 3:51 PM Dev Jain wrote:
>
> Use
> folio_pte_batch() to optimize move_ptes(). On arm64, if the ptes
> are painted with the contig bit, then ptep_get() will iterate through all 16
> entries to collect a/d bits. Hence this optimization will result in a 16x
> reduction in the number of ptep_get() calls. Next, ptep_get_and_clear()
> will eventually call contpte_try_unfold() on every contig block, thus
> flushing the TLB for the complete large folio range. Instead, use
> get_and_clear_full_ptes() so as to elide TLBIs on each contig block, and only
> do them on the starting and ending contig block.
>
> For split folios, there will be no pte batching; nr_ptes will be 1. For
> pagetable splitting, the ptes will still point to the same large folio;
> for arm64, this results in the optimization described above, and for other
> arches (including the general case), a minor improvement is expected due to
> a reduction in the number of function calls.
>
> Signed-off-by: Dev Jain
> ---
>  mm/mremap.c | 39 ++++++++++++++++++++++++++++++++-------
>  1 file changed, 32 insertions(+), 7 deletions(-)
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 180b12225368..18b215521ada 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -170,6 +170,23 @@ static pte_t move_soft_dirty_pte(pte_t pte)
>  	return pte;
>  }
>
> +static int mremap_folio_pte_batch(struct vm_area_struct *vma, unsigned long addr,
> +		pte_t *ptep, pte_t pte, int max_nr)
> +{
> +	const fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> +	struct folio *folio;
> +
> +	if (max_nr == 1)
> +		return 1;
> +
> +	folio = vm_normal_folio(vma, addr, pte);
> +	if (!folio || !folio_test_large(folio))

I'm curious about the following case:
If the addr/ptep is not the first subpage of the folio (for example, the
14th subpage), will mremap_folio_pte_batch() return 3?
If so, get_and_clear_full_ptes() would operate on 3 subpages of the folio.
In that case, can unfold still work correctly?
Similarly, if the addr/ptep points to the first subpage, but max_nr is
less than CONT_PTES, what will happen in that case?

> +		return 1;
> +
> +	return folio_pte_batch(folio, addr, ptep, pte, max_nr, flags, NULL,
> +			       NULL, NULL);
> +}
> +
>  static int move_ptes(struct pagetable_move_control *pmc,
>  		unsigned long extent, pmd_t *old_pmd, pmd_t *new_pmd)
>  {
> @@ -177,7 +194,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  	bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
>  	struct mm_struct *mm = vma->vm_mm;
>  	pte_t *old_ptep, *new_ptep;
> -	pte_t pte;
> +	pte_t old_pte, pte;
>  	pmd_t dummy_pmdval;
>  	spinlock_t *old_ptl, *new_ptl;
>  	bool force_flush = false;
> @@ -185,6 +202,8 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  	unsigned long new_addr = pmc->new_addr;
>  	unsigned long old_end = old_addr + extent;
>  	unsigned long len = old_end - old_addr;
> +	int max_nr_ptes;
> +	int nr_ptes;
>  	int err = 0;
>
>  	/*
> @@ -236,14 +255,16 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  	flush_tlb_batched_pending(vma->vm_mm);
>  	arch_enter_lazy_mmu_mode();
>
> -	for (; old_addr < old_end; old_ptep++, old_addr += PAGE_SIZE,
> -				   new_ptep++, new_addr += PAGE_SIZE) {
> +	for (; old_addr < old_end; old_ptep += nr_ptes, old_addr += nr_ptes * PAGE_SIZE,
> +		new_ptep += nr_ptes, new_addr += nr_ptes * PAGE_SIZE) {
>  		VM_WARN_ON_ONCE(!pte_none(*new_ptep));
>
> -		if (pte_none(ptep_get(old_ptep)))
> +		nr_ptes = 1;
> +		max_nr_ptes = (old_end - old_addr) >> PAGE_SHIFT;
> +		old_pte = ptep_get(old_ptep);
> +		if (pte_none(old_pte))
>  			continue;
>
> -		pte = ptep_get_and_clear(mm, old_addr, old_ptep);
>  		/*
>  		 * If we are remapping a valid PTE, make sure
>  		 * to flush TLB before we drop the PTL for the
> @@ -255,8 +276,12 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  		 * the TLB entry for the old mapping has been
>  		 * flushed.
>  		 */
> -		if (pte_present(pte))
> +		if (pte_present(old_pte)) {
> +			nr_ptes = mremap_folio_pte_batch(vma, old_addr, old_ptep,
> +							 old_pte, max_nr_ptes);
>  			force_flush = true;
> +		}
> +		pte = get_and_clear_full_ptes(mm, old_addr, old_ptep, nr_ptes, 0);
>  		pte = move_pte(pte, old_addr, new_addr);
>  		pte = move_soft_dirty_pte(pte);
>
> @@ -269,7 +294,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
>  				else if (is_swap_pte(pte))
>  					pte = pte_swp_clear_uffd_wp(pte);
>  			}
> -			set_pte_at(mm, new_addr, new_ptep, pte);
> +			set_ptes(mm, new_addr, new_ptep, pte, nr_ptes);
>  		}
>  	}
>
> --
> 2.30.2
>

Thanks
Barry