From: Lance Yang <ioworker0@gmail.com>
Date: Fri, 12 Apr 2024 10:09:36 +0800
Subject: Re: [PATCH v5 2/2] mm/arm64: override mkold_clean_ptes() batch helper
To: Ryan Roberts
Cc: akpm@linux-foundation.org, david@redhat.com, 21cnbao@gmail.com,
    mhocko@suse.com, fengwei.yin@intel.com, zokeefe@google.com,
    shy828301@gmail.com, xiehuan09@gmail.com, wangkefeng.wang@huawei.com,
    songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <3cd1036d-3814-4a10-b6d2-099937ceabc8@arm.com>
References: <20240408042437.10951-1-ioworker0@gmail.com>
    <20240408042437.10951-3-ioworker0@gmail.com>
    <3cd1036d-3814-4a10-b6d2-099937ceabc8@arm.com>

On Thu, Apr 11, 2024 at 9:17 PM Ryan Roberts wrote:
>
> On 08/04/2024 05:24, Lance Yang wrote:
> > The per-pte get_and_clear/modify/set approach would result in
> > unfolding/refolding for contpte mappings on arm64. So we need
> > to override mkold_clean_ptes() for arm64 to avoid it.
>
> IIRC, in the last version, I suggested copying the wrprotect_ptes()
> pattern to correctly iterate over contpte blocks. I meant for you to take
> it as inspiration but looks like you have done a carbon copy, including
> lots of things that are unneeded here. That's my fault for not being
> clear - sorry!

My bad. I must have misunderstood your intention.

> >
> > Suggested-by: David Hildenbrand
> > Suggested-by: Barry Song <21cnbao@gmail.com>
> > Suggested-by: Ryan Roberts
> > Signed-off-by: Lance Yang
> > ---
> >  arch/arm64/include/asm/pgtable.h | 55 ++++++++++++++++++++++++++++++++
> >  arch/arm64/mm/contpte.c          | 15 +++++++++
> >  2 files changed, 70 insertions(+)
> >
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index 9fd8613b2db2..395754638a9a 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -1223,6 +1223,34 @@ static inline void __wrprotect_ptes(struct mm_struct *mm, unsigned long address,
> >  		__ptep_set_wrprotect(mm, address, ptep);
> >  }
> >
> > +static inline void ___ptep_mkold_clean(struct mm_struct *mm, unsigned long addr,
> > +				       pte_t *ptep, pte_t pte)
> > +{
> > +	pte_t old_pte;
> > +
> > +	do {
> > +		old_pte = pte;
> > +		pte = pte_mkclean(pte_mkold(pte));
> > +		pte_val(pte) = cmpxchg_relaxed(&pte_val(*ptep),
> > +					       pte_val(old_pte), pte_val(pte));
> > +	} while (pte_val(pte) != pte_val(old_pte));
> > +}
>
> Given you are clearing old and dirty, you have nothing to race against,
> so you shouldn't need the cmpxchg loop here; just a get/modify/set should
> do? Of course if you are setting one or the other, then you need the loop.

Got it.

> > +
> > +static inline void __ptep_mkold_clean(struct mm_struct *mm, unsigned long addr,
> > +				      pte_t *ptep)
> > +{
> > +	___ptep_mkold_clean(mm, addr, ptep, __ptep_get(ptep));
> > +}
>
> I don't see a need for this intermediate function.
>
> > +
> > +static inline void __mkold_clean_ptes(struct mm_struct *mm, unsigned long addr,
> > +				      pte_t *ptep, unsigned int nr)
> > +{
> > +	unsigned int i;
> > +
> > +	for (i = 0; i < nr; i++, addr += PAGE_SIZE, ptep++)
>
> It would probably be good to use the for() loop pattern used by the
> generic impls here too.

Got it.
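
IIUC, combining both suggestions, the helpers would end up something like
this (untested, and assuming the existing arm64 __ptep_get()/__set_pte()
helpers):

static inline void ___ptep_mkold_clean(struct mm_struct *mm, unsigned long addr,
				       pte_t *ptep, pte_t pte)
{
	/*
	 * Young and dirty are both being cleared, so a racing HW update of
	 * either bit would be discarded anyway: a plain get/modify/set is
	 * enough, no cmpxchg loop needed.
	 */
	pte = pte_mkclean(pte_mkold(pte));
	__set_pte(ptep, pte);
}

static inline void __mkold_clean_ptes(struct mm_struct *mm, unsigned long addr,
				      pte_t *ptep, unsigned int nr)
{
	/* Same loop shape as the generic wrprotect_ptes() implementation. */
	for (;;) {
		___ptep_mkold_clean(mm, addr, ptep, __ptep_get(ptep));
		if (--nr == 0)
			break;
		ptep++;
		addr += PAGE_SIZE;
	}
}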
> > +		__ptep_mkold_clean(mm, addr, ptep);
> > +}
> > +
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >  #define __HAVE_ARCH_PMDP_SET_WRPROTECT
> >  static inline void pmdp_set_wrprotect(struct mm_struct *mm,
> > @@ -1379,6 +1407,8 @@ extern void contpte_wrprotect_ptes(struct mm_struct *mm, unsigned long addr,
> >  extern int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
> >  				unsigned long addr, pte_t *ptep,
> >  				pte_t entry, int dirty);
> > +extern void contpte_mkold_clean_ptes(struct mm_struct *mm, unsigned long addr,
> > +				pte_t *ptep, unsigned int nr);
> >
> >  static __always_inline void contpte_try_fold(struct mm_struct *mm,
> >  				unsigned long addr, pte_t *ptep, pte_t pte)
> > @@ -1603,6 +1633,30 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
> >  	return contpte_ptep_set_access_flags(vma, addr, ptep, entry, dirty);
> >  }
> >
> > +#define mkold_clean_ptes mkold_clean_ptes
> > +static inline void mkold_clean_ptes(struct mm_struct *mm, unsigned long addr,
> > +				pte_t *ptep, unsigned int nr)
> > +{
> > +	if (likely(nr == 1)) {
> > +		/*
> > +		 * Optimization: mkold_clean_ptes() can only be called for present
> > +		 * ptes so we only need to check contig bit as condition for unfold,
> > +		 * and we can remove the contig bit from the pte we read to avoid
> > +		 * re-reading. This speeds up madvise(MADV_FREE) which is sensitive
> > +		 * for order-0 folios. Equivalent to contpte_try_unfold().
> > +		 */
>
> Is this true? Do you have data that shows the cost? If not, I'd prefer to
> avoid the optimization and do it the more standard way:
>
> contpte_try_unfold(mm, addr, ptep, __ptep_get(ptep));
>
> > +		pte_t orig_pte = __ptep_get(ptep);
> > +
> > +		if (unlikely(pte_cont(orig_pte))) {
> > +			__contpte_try_unfold(mm, addr, ptep, orig_pte);
> > +			orig_pte = pte_mknoncont(orig_pte);
> > +		}
> > +		___ptep_mkold_clean(mm, addr, ptep, orig_pte);
> > +	} else {
> > +		contpte_mkold_clean_ptes(mm, addr, ptep, nr);
> > +	}
>
> ...but I don't think you should ever need to unfold in the first place.
> Even if it's folded and you are trying to clear access/dirty for a single
> pte, you can just clear the whole block. See existing comment in
> contpte_ptep_test_and_clear_young().

Thanks for pointing that out.

>
> So this ends up as something like:
>
> static inline void clear_young_dirty_ptes(struct mm_struct *mm,
>		unsigned long addr, pte_t *ptep, unsigned int nr,
>		bool clear_young, bool clear_dirty)
> {
>	if (likely(nr == 1 && !pte_cont(__ptep_get(ptep))))
>		__clear_young_dirty_ptes(mm, addr, ptep, nr,
>					 clear_young, clear_dirty);
>	else
>		contpte_clear_young_dirty_ptes(mm, addr, ptep, nr,
>					       clear_young, clear_dirty);
> }

Nice. I'll make sure to follow this approach.
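
To double-check the "clear the whole block" expansion, a standalone
illustration (not kernel code; assumes 4K pages and 16-pte contpte blocks,
i.e. CONT_PTE_SIZE = 64K):

#include <stdio.h>

#define PAGE_SIZE	0x1000UL
#define CONT_PTE_SIZE	(16 * PAGE_SIZE)
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))
#define ALIGN(x, a)		(((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
	/* Say core-mm asks to clear 4 ptes starting 2 pages into a block. */
	unsigned long start = 0x400000UL + 2 * PAGE_SIZE;
	unsigned long end = start + 4 * PAGE_SIZE;

	/* Both ends expand to cover the enclosing contpte block(s). */
	start = ALIGN_DOWN(start, CONT_PTE_SIZE);
	end = ALIGN(end, CONT_PTE_SIZE);

	/* Prints: clear young/dirty on 16 ptes from 0x400000 */
	printf("clear young/dirty on %lu ptes from %#lx\n",
	       (end - start) / PAGE_SIZE, start);
	return 0;
}

Over-clearing within the block is fine because access/dirty is tracked per
folio, and a contpte block is only ever created when it is covered by a
single folio.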
> >
> > +}
> > +
> >  #else /* CONFIG_ARM64_CONTPTE */
> >
> >  #define ptep_get				__ptep_get
> > @@ -1622,6 +1676,7 @@ static inline int ptep_set_access_flags(struct vm_area_struct *vma,
> >  #define wrprotect_ptes				__wrprotect_ptes
> >  #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
> >  #define ptep_set_access_flags			__ptep_set_access_flags
> > +#define mkold_clean_ptes			__mkold_clean_ptes
> >
> >  #endif /* CONFIG_ARM64_CONTPTE */
> >
> > diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> > index 1b64b4c3f8bf..dbff9c5e9eff 100644
> > --- a/arch/arm64/mm/contpte.c
> > +++ b/arch/arm64/mm/contpte.c
> > @@ -361,6 +361,21 @@ void contpte_wrprotect_ptes(struct mm_struct *mm, unsigned long addr,
> >  }
> >  EXPORT_SYMBOL_GPL(contpte_wrprotect_ptes);
> >
> > +void contpte_mkold_clean_ptes(struct mm_struct *mm, unsigned long addr,
> > +			pte_t *ptep, unsigned int nr)
> > +{
> > +	/*
> > +	 * If clearing the young and dirty bits for an entire contig range, we can
> > +	 * avoid unfolding. Just set old/clean and wait for the later mmu_gather
> > +	 * flush to invalidate the tlb. If it's a partial range though, we need to
> > +	 * unfold.
> > +	 */
>
> nit: Please reflow comments like this to 80 cols.
>
> We can avoid unfolding in all cases. See existing comment in
> contpte_ptep_test_and_clear_young(). Suggest something like this (untested):
>
> void clear_young_dirty_ptes(struct mm_struct *mm, unsigned long addr,
>			     pte_t *ptep, unsigned int nr,
>			     bool clear_young, bool clear_dirty)
> {
>	/*
>	 * We can safely clear access/dirty without needing to unfold from the
>	 * architecture's perspective, even when contpte is set. If the range
>	 * starts or ends midway through a contpte block, we can just expand to
>	 * include the full contpte block. While this is not exactly what the
>	 * core-mm asked for, it tracks access/dirty per folio, not per page.
>	 * And since we only create a contpte block when it is covered by a
>	 * single folio, we can get away with clearing access/dirty for the
>	 * whole block.
>	 */
>	unsigned long start = addr;
>	unsigned long end = start + nr * PAGE_SIZE;
>
>	if (pte_cont(__ptep_get(ptep + nr - 1)))
>		end = ALIGN(end, CONT_PTE_SIZE);
>
>	if (pte_cont(__ptep_get(ptep))) {
>		start = ALIGN_DOWN(start, CONT_PTE_SIZE);
>		ptep = contpte_align_down(ptep);
>	}
>
>	__clear_young_dirty_ptes(mm, start, ptep, (end - start) / PAGE_SIZE,
>				 clear_young, clear_dirty);
> }

Nice. Thanks a lot for your help!

Thanks,
Lance

>
> Thanks,
> Ryan
>
> > +
> > +	contpte_try_unfold_partial(mm, addr, ptep, nr);
> > +	__mkold_clean_ptes(mm, addr, ptep, nr);
> > +}
> > +EXPORT_SYMBOL_GPL(contpte_mkold_clean_ptes);
> > +
> >  int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
> >  				unsigned long addr, pte_t *ptep,
> >  				pte_t entry, int dirty)
>