From: Barry Song <21cnbao@gmail.com>
Date: Tue, 10 Jun 2025 20:11:31 +1200
Subject: Re: [PATCH v4 2/2] mm: Optimize mremap() by PTE batching
To: Dev Jain
Cc: akpm@linux-foundation.org, Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com,
    vbabka@suse.cz, jannh@google.com, pfalcato@suse.de, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, david@redhat.com, peterx@redhat.com,
    ryan.roberts@arm.com, mingo@kernel.org, libang.li@antgroup.com,
    maobibo@loongson.cn, zhengqi.arch@bytedance.com, anshuman.khandual@arm.com,
    willy@infradead.org, ioworker0@gmail.com, yang@os.amperecomputing.com,
    baolin.wang@linux.alibaba.com, ziy@nvidia.com, hughd@google.com
References: <20250610035043.75448-1-dev.jain@arm.com> <20250610035043.75448-3-dev.jain@arm.com>
On Tue, Jun 10, 2025 at 7:45 PM Dev Jain wrote:
>
>
> On 10/06/25 12:33 pm, Barry Song wrote:
> > Hi Dev,
> >
> > On Tue, Jun 10, 2025 at 3:51 PM Dev Jain wrote:
> >> Use folio_pte_batch() to optimize move_ptes(). On arm64, if the ptes
> >> are painted with the contig bit, then ptep_get() will iterate through all 16
> >> entries to collect a/d bits. Hence this optimization will result in a 16x
> >> reduction in the number of ptep_get() calls. Next, ptep_get_and_clear()
> >> will eventually call contpte_try_unfold() on every contig block, thus
> >> flushing the TLB for the complete large folio range. Instead, use
> >> get_and_clear_full_ptes() so as to elide TLBIs on each contig block, and only
> >> do them on the starting and ending contig block.
> >>
> >> For split folios, there will be no pte batching; nr_ptes will be 1. For
> >> pagetable splitting, the ptes will still point to the same large folio;
> >> for arm64, this results in the optimization described above, and for other
> >> arches (including the general case), a minor improvement is expected due to
> >> a reduction in the number of function calls.
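
By the way, a minimal userspace sketch to exercise this path (my own test,
not part of the patch -- it assumes a 4K page size, so one contpte block is
16 PTEs, and that 64K mTHP is enabled via sysfs; whether a single large
folio actually backs the range depends on that configuration and on the
allocation succeeding):

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 16 * 4096;	/* 16 PTEs = one contpte block on arm64/4K */

	char *src = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	/* reserve a destination so the remap is forced to actually move */
	void *dst_hint = mmap(NULL, len, PROT_NONE,
			      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (src == MAP_FAILED || dst_hint == MAP_FAILED)
		return 1;

	madvise(src, len, MADV_HUGEPAGE);	/* allow mTHP for this range */
	memset(src, 1, len);	/* fault in; ideally backed by one 64K folio */

	/*
	 * Same size + MREMAP_FIXED: the kernel has to move the PTEs via
	 * move_page_tables()/move_ptes(), i.e. the batched path above.
	 */
	char *dst = mremap(src, len, len, MREMAP_MAYMOVE | MREMAP_FIXED,
			   dst_hint);
	if (dst == MAP_FAILED)
		return 1;

	printf("mapping moved to %p\n", (void *)dst);
	munmap(dst, len);
	return 0;
}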
> >>
> >> Signed-off-by: Dev Jain
> >> ---
> >>  mm/mremap.c | 39 ++++++++++++++++++++++++++++++++-------
> >>  1 file changed, 32 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/mm/mremap.c b/mm/mremap.c
> >> index 180b12225368..18b215521ada 100644
> >> --- a/mm/mremap.c
> >> +++ b/mm/mremap.c
> >> @@ -170,6 +170,23 @@ static pte_t move_soft_dirty_pte(pte_t pte)
> >>         return pte;
> >>  }
> >>
> >> +static int mremap_folio_pte_batch(struct vm_area_struct *vma, unsigned long addr,
> >> +                                 pte_t *ptep, pte_t pte, int max_nr)
> >> +{
> >> +       const fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> >> +       struct folio *folio;
> >> +
> >> +       if (max_nr == 1)
> >> +               return 1;
> >> +
> >> +       folio = vm_normal_folio(vma, addr, pte);
> >> +       if (!folio || !folio_test_large(folio))
> > I'm curious about the following case:
> > If the addr/ptep is not the first subpage of the folio -- for example, the
> > 14th subpage -- will mremap_folio_pte_batch() return 3?
>
> It will return the number of PTEs, starting from the PTE pointing to the 14th
> subpage, that point to consecutive pages of the same large folio, up to max_nr.
> For example, if we are operating on a single large folio of order 4, then max_nr
> will be 16 - 14 + 1 = 3. So in this case we will return 3, since the 14th, 15th and
> 16th PTEs point to consecutive pages of the same large folio.
>
> > If so, get_and_clear_full_ptes() would operate on 3 subpages of the folio.
> > In that case, can unfold still work correctly?
>
> Yes, first we unfold -- that is, we do a BBM (break-before-make) sequence:
> cont -> clear -> non-cont. Then, on this non-contig block, we clear only
> the PTEs we were asked to.

While going through the code:

static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
				unsigned long addr, pte_t *ptep,
				unsigned int nr, int full)
{
	pte_t pte;

	if (likely(nr == 1)) {
		contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
		pte = __get_and_clear_full_ptes(mm, addr, ptep, nr, full);
	} else {
		pte = contpte_get_and_clear_full_ptes(mm, addr, ptep, nr, full);
	}

	return pte;
}

Initially, I thought it only unfolded when nr == 1, but after reading
contpte_get_and_clear_full_ptes() more closely, I realized we do support
partial unfolding -- that is what I had missed:

pte_t contpte_get_and_clear_full_ptes(struct mm_struct *mm,
				unsigned long addr, pte_t *ptep,
				unsigned int nr, int full)
{
	contpte_try_unfold_partial(mm, addr, ptep, nr);
	return __get_and_clear_full_ptes(mm, addr, ptep, nr, full);
}

I think you are right.

> >
> > Similarly, if the addr/ptep points to the first subpage, but max_nr is
> > less than CONT_PTES, what will happen in that case?
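
To answer my own question above: I believe this case is covered by the
partial unfold as well. A rough sketch of the logic in
contpte_try_unfold_partial() (arch/arm64/mm/contpte.c) -- from memory, so
not verbatim:

static void contpte_try_unfold_partial(struct mm_struct *mm, unsigned long addr,
					pte_t *ptep, unsigned int nr)
{
	/*
	 * Unfold any partially covered contpte block at the beginning
	 * and end of the range; a block stays folded only if all of its
	 * CONT_PTES entries are covered by [ptep, ptep + nr).
	 */
	if (ptep != contpte_align_down(ptep) || nr < CONT_PTES)
		contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));

	if (ptep + nr != contpte_align_down(ptep + nr)) {
		unsigned long last_addr = addr + PAGE_SIZE * (nr - 1);
		pte_t *last_ptep = ptep + nr - 1;

		contpte_try_unfold(mm, last_addr, last_ptep,
				   __ptep_get(last_ptep));
	}
}

So even when the batch starts at the first subpage but covers fewer than
CONT_PTES entries, the leading block is unfolded before the clear, and
correctness is preserved.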
> > > > > >> + return 1; > >> + > >> + return folio_pte_batch(folio, addr, ptep, pte, max_nr, flags, = NULL, > >> + NULL, NULL); > >> +} > >> + > >> static int move_ptes(struct pagetable_move_control *pmc, > >> unsigned long extent, pmd_t *old_pmd, pmd_t *new_pmd) > >> { > >> @@ -177,7 +194,7 @@ static int move_ptes(struct pagetable_move_control= *pmc, > >> bool need_clear_uffd_wp =3D vma_has_uffd_without_event_remap(= vma); > >> struct mm_struct *mm =3D vma->vm_mm; > >> pte_t *old_ptep, *new_ptep; > >> - pte_t pte; > >> + pte_t old_pte, pte; > >> pmd_t dummy_pmdval; > >> spinlock_t *old_ptl, *new_ptl; > >> bool force_flush =3D false; > >> @@ -185,6 +202,8 @@ static int move_ptes(struct pagetable_move_control= *pmc, > >> unsigned long new_addr =3D pmc->new_addr; > >> unsigned long old_end =3D old_addr + extent; > >> unsigned long len =3D old_end - old_addr; > >> + int max_nr_ptes; > >> + int nr_ptes; > >> int err =3D 0; > >> > >> /* > >> @@ -236,14 +255,16 @@ static int move_ptes(struct pagetable_move_contr= ol *pmc, > >> flush_tlb_batched_pending(vma->vm_mm); > >> arch_enter_lazy_mmu_mode(); > >> > >> - for (; old_addr < old_end; old_ptep++, old_addr +=3D PAGE_SIZE= , > >> - new_ptep++, new_addr +=3D PAGE_SIZE= ) { > >> + for (; old_addr < old_end; old_ptep +=3D nr_ptes, old_addr += =3D nr_ptes * PAGE_SIZE, > >> + new_ptep +=3D nr_ptes, new_addr +=3D nr_ptes * PAGE_SI= ZE) { > >> VM_WARN_ON_ONCE(!pte_none(*new_ptep)); > >> > >> - if (pte_none(ptep_get(old_ptep))) > >> + nr_ptes =3D 1; > >> + max_nr_ptes =3D (old_end - old_addr) >> PAGE_SHIFT; > >> + old_pte =3D ptep_get(old_ptep); > >> + if (pte_none(old_pte)) > >> continue; > >> > >> - pte =3D ptep_get_and_clear(mm, old_addr, old_ptep); > >> /* > >> * If we are remapping a valid PTE, make sure > >> * to flush TLB before we drop the PTL for the > >> @@ -255,8 +276,12 @@ static int move_ptes(struct pagetable_move_contro= l *pmc, > >> * the TLB entry for the old mapping has been > >> * flushed. > >> */ > >> - if (pte_present(pte)) > >> + if (pte_present(old_pte)) { > >> + nr_ptes =3D mremap_folio_pte_batch(vma, old_ad= dr, old_ptep, > >> + old_pte, max_= nr_ptes); > >> force_flush =3D true; > >> + } > >> + pte =3D get_and_clear_full_ptes(mm, old_addr, old_ptep= , nr_ptes, 0); > >> pte =3D move_pte(pte, old_addr, new_addr); > >> pte =3D move_soft_dirty_pte(pte); > >> > >> @@ -269,7 +294,7 @@ static int move_ptes(struct pagetable_move_control= *pmc, > >> else if (is_swap_pte(pte)) > >> pte =3D pte_swp_clear_uffd_wp= (pte); > >> } > >> - set_pte_at(mm, new_addr, new_ptep, pte); > >> + set_ptes(mm, new_addr, new_ptep, pte, nr_ptes)= ; > >> } > >> } > >> > >> -- > >> 2.30.2 Thanks Barry