From: Lance Yang
Date: Tue, 27 Feb 2024 18:55:53 +0800
Subject: Re: [PATCH] mm: export folio_pte_batch as a couple of modules might need it
To: Ryan Roberts
Cc: David Hildenbrand, Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song, Yin Fengwei
References: <20240227024050.244567-1-21cnbao@gmail.com> <61b9dfc9-5522-44fd-89a4-140833ede8af@arm.com>

Thanks, Ryan, Barry, David!

Best,
Lance

On Tue, Feb 27, 2024 at 6:53 PM Ryan Roberts wrote:
>
> On 27/02/2024 10:30, David Hildenbrand wrote:
> > On 27.02.24 11:21, Lance Yang wrote:
> >> On Tue, Feb 27, 2024 at 5:14 PM David Hildenbrand wrote:
> >>>
> >>> On 27.02.24 10:07, Ryan Roberts wrote:
> >>>> On 27/02/2024 02:40, Barry Song wrote:
> >>>>> From: Barry Song
> >>>>>
> >>>>> madvise and some others might need folio_pte_batch to check if a range
> >>>>> of PTEs are completely mapped to a large folio with contiguous physical
> >>>>> addresses. Let's export it for others to use.
> >>>>>
> >>>>> Cc: Lance Yang
> >>>>> Cc: Ryan Roberts
> >>>>> Cc: David Hildenbrand
> >>>>> Cc: Yin Fengwei
> >>>>> Signed-off-by: Barry Song
> >>>>> ---
> >>>>>  -v1:
> >>>>>  at least two jobs madv_free and madv_pageout depend on it. To avoid
> >>>>>  conflicts and dependencies, after discussing with Lance, we prefer
> >>>>>  this one can land earlier.
> >>>>
> >>>> I think this will also ultimately be useful for mprotect too, though I haven't
> >>>> looked at it properly yet.
> >>>>
> >>>
> >>> Yes, I think we briefly discussed that.
> >>>
> >>>>>
> >>>>>  mm/internal.h | 13 +++++++++++++
> >>>>>  mm/memory.c   | 11 +----------
> >>>>>  2 files changed, 14 insertions(+), 10 deletions(-)
> >>>>>
> >>>>> diff --git a/mm/internal.h b/mm/internal.h
> >>>>> index 13b59d384845..8e2bc304f671 100644
> >>>>> --- a/mm/internal.h
> >>>>> +++ b/mm/internal.h
> >>>>> @@ -83,6 +83,19 @@ static inline void *folio_raw_mapping(struct folio *folio)
> >>>>>          return (void *)(mapping & ~PAGE_MAPPING_FLAGS);
> >>>>>  }
> >>>>>
> >>>>> +/* Flags for folio_pte_batch(). */
> >>>>> +typedef int __bitwise fpb_t;
> >>>>> +
> >>>>> +/* Compare PTEs after pte_mkclean(), ignoring the dirty bit. */
> >>>>> +#define FPB_IGNORE_DIRTY                ((__force fpb_t)BIT(0))
> >>>>> +
> >>>>> +/* Compare PTEs after pte_clear_soft_dirty(), ignoring the soft-dirty bit. */
> >>>>> +#define FPB_IGNORE_SOFT_DIRTY           ((__force fpb_t)BIT(1))
> >>>>> +
> >>>>> +extern int folio_pte_batch(struct folio *folio, unsigned long addr,
> >>>>> +                pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
> >>>>> +                bool *any_writable);
> >>>>> +
> >>>>>  void __acct_reclaim_writeback(pg_data_t *pgdat, struct folio *folio,
> >>>>>                                                  int nr_throttled);
> >>>>>  static inline void acct_reclaim_writeback(struct folio *folio)
> >>>>> diff --git a/mm/memory.c b/mm/memory.c
> >>>>> index 1c45b6a42a1b..319b3be05e75 100644
> >>>>> --- a/mm/memory.c
> >>>>> +++ b/mm/memory.c
> >>>>> @@ -953,15 +953,6 @@ static __always_inline void __copy_present_ptes(struct vm_area_struct *dst_vma,
> >>>>>          set_ptes(dst_vma->vm_mm, addr, dst_pte, pte, nr);
> >>>>>  }
> >>>>>
> >>>>> -/* Flags for folio_pte_batch(). */
> >>>>> -typedef int __bitwise fpb_t;
> >>>>> -
> >>>>> -/* Compare PTEs after pte_mkclean(), ignoring the dirty bit. */
> >>>>> -#define FPB_IGNORE_DIRTY                ((__force fpb_t)BIT(0))
> >>>>> -
> >>>>> -/* Compare PTEs after pte_clear_soft_dirty(), ignoring the soft-dirty bit. */
> >>>>> -#define FPB_IGNORE_SOFT_DIRTY           ((__force fpb_t)BIT(1))
> >>>>> -
> >>>>>  static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
> >>>>>  {
> >>>>>          if (flags & FPB_IGNORE_DIRTY)
> >>>>> @@ -982,7 +973,7 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
> >>>>>   * If "any_writable" is set, it will indicate if any other PTE besides the
> >>>>>   * first (given) PTE is writable.
> >>>>>   */
> >>>>
> >>>> David was talking in Lance's patch thread, about improving the docs for this
> >>>> function now that it's exported. Might be worth syncing on that.
> >>>
> >>> Here is my take:
> >>>
> >>> Signed-off-by: David Hildenbrand
> >>> ---
> >>>  mm/memory.c | 22 ++++++++++++++++++----
> >>>  1 file changed, 18 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/mm/memory.c b/mm/memory.c
> >>> index d0b855a1837a8..098356b8805ae 100644
> >>> --- a/mm/memory.c
> >>> +++ b/mm/memory.c
> >>> @@ -971,16 +971,28 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
> >>>          return pte_wrprotect(pte_mkold(pte));
> >>>  }
> >>>
> >>> -/*
> >>> +/**
> >>> + * folio_pte_batch - detect a PTE batch for a large folio
> >>> + * @folio: The large folio to detect a PTE batch for.
> >>> + * @addr: The user virtual address the first page is mapped at.
> >>> + * @start_ptep: Page table pointer for the first entry.
> >>> + * @pte: Page table entry for the first page.
> >>
> >> Nit:
> >>
> >> - * @pte: Page table entry for the first page.
> >> + * @pte: Page table entry for the first page that must be the first subpage of
> >> + *       the folio excluding arm64 for now.
> >>
> >> IIUC, pte_batch_hint is always 1 excluding arm64 for now.
> >> I'm not sure if this modification will be helpful?
> >
> > IIRC, Ryan made sure that this also works when passing another subpage, after
> > when cont-pte is set. Otherwise this would already be broken for fork/zap.
> >
> > So I don't think this comment would actually be correct.
>
> Indeed, the spec for the function is exactly the same for arm64 as for other
> arches. It's just that arm64 can accelerate the implementation by skipping
> forward to the next contpte boundary when the current pte is part of a contpte
> block.
>
> There is no requirement for pte (or addr or start_ptep) to point to the first
> subpage of a folio - they can point to any subpage.
>
> pte, addr and start_ptep must all refer to the same entry, but I think that's
> clear from the existing text.
>
>
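
For anyone following the thread, here is a minimal userspace sketch of the
batching semantics discussed above. It is not the kernel implementation:
struct toy_pte, the TOY_* bits and toy_pte_batch() are hypothetical names made
up for illustration, whereas the real folio_pte_batch() operates on pte_t and
struct folio. The sketch assumes a batch is a run of entries that map
consecutive PFNs inside one folio and whose bits match once the ignored bits
are cleared. It also shows that the starting entry may be any subpage of the
folio, not only the first one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy page-table entry; a hypothetical stand-in, NOT the kernel's pte_t. */
struct toy_pte {
    uint64_t pfn;       /* page frame number this entry maps */
    unsigned int bits;  /* TOY_* status and permission bits */
};

/* Hypothetical PTE bits for this model only. */
#define TOY_PRESENT     (1u << 0)
#define TOY_WRITE       (1u << 1)
#define TOY_DIRTY       (1u << 2)
#define TOY_SOFT_DIRTY  (1u << 3)
#define TOY_YOUNG       (1u << 4)

/* Model of the FPB_IGNORE_DIRTY / FPB_IGNORE_SOFT_DIRTY flags. */
#define TOY_FPB_IGNORE_DIRTY       (1u << 0)
#define TOY_FPB_IGNORE_SOFT_DIRTY  (1u << 1)

/*
 * Clear the bits that must not influence the comparison. Like the quoted
 * __pte_batch_clear_ignored(), the write and young bits are always ignored;
 * dirty and soft-dirty are ignored only when the matching flag is passed.
 */
static unsigned int toy_clear_ignored(unsigned int bits, unsigned int flags)
{
    if (flags & TOY_FPB_IGNORE_DIRTY)
        bits &= ~TOY_DIRTY;
    if (flags & TOY_FPB_IGNORE_SOFT_DIRTY)
        bits &= ~TOY_SOFT_DIRTY;
    return bits & ~(TOY_WRITE | TOY_YOUNG);
}

/*
 * Count how many entries, starting at "start", form one batch. "start" may
 * map any subpage of the folio. If "any_writable" is non-NULL, it reports
 * whether any entry other than the first one is writable.
 */
int toy_pte_batch(uint64_t folio_start_pfn, uint64_t folio_nr_pages,
                  const struct toy_pte *start, int max_nr,
                  unsigned int flags, bool *any_writable)
{
    const uint64_t folio_end_pfn = folio_start_pfn + folio_nr_pages;
    const unsigned int expected = toy_clear_ignored(start->bits, flags);
    int nr = 1;

    assert(max_nr >= 1);
    assert(start->pfn >= folio_start_pfn && start->pfn < folio_end_pfn);

    if (any_writable)
        *any_writable = false;

    while (nr < max_nr) {
        const struct toy_pte *cur = &start[nr];

        if (cur->pfn != start->pfn + (uint64_t)nr)
            break;  /* PFNs are not consecutive */
        if (cur->pfn >= folio_end_pfn)
            break;  /* ran past the end of the folio */
        if (toy_clear_ignored(cur->bits, flags) != expected)
            break;  /* differs in a bit that is not ignored */
        if (any_writable && (cur->bits & TOY_WRITE))
            *any_writable = true;
        nr++;
    }
    return nr;
}
```

With a 4-page folio at PFN 100 and a start entry that maps PFN 101, this sketch
reports a batch of 3 as long as the following entries differ only in ignored
bits. How the arm64 contpte hint speeds up the real walk is not modelled here.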