From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 48D00C48BF6
	for <linux-mm@archiver.kernel.org>; Thu,  7 Mar 2024 14:41:55 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id CE2256B018E; Thu,  7 Mar 2024 09:41:54 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id C6AE66B0194; Thu,  7 Mar 2024 09:41:54 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id ABE376B0195; Thu,  7 Mar 2024 09:41:54 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14])
	by kanga.kvack.org (Postfix) with ESMTP id 94F436B018E
	for <linux-mm@kvack.org>; Thu,  7 Mar 2024 09:41:54 -0500 (EST)
Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay09.hostedemail.com (Postfix) with ESMTP id 35A5380523
	for <linux-mm@kvack.org>; Thu,  7 Mar 2024 14:41:54 +0000 (UTC)
X-FDA: 81870507348.16.B7AEB50
Received: from mail-yb1-f172.google.com (mail-yb1-f172.google.com [209.85.219.172])
	by imf24.hostedemail.com (Postfix) with ESMTP id 3A2B2180011
	for <linux-mm@kvack.org>; Thu,  7 Mar 2024 14:41:52 +0000 (UTC)
Authentication-Results: imf24.hostedemail.com;
	dkim=pass header.d=gmail.com header.s=20230601 header.b=GhgFc4xI;
	spf=pass (imf24.hostedemail.com: domain of ioworker0@gmail.com designates 209.85.219.172 as permitted sender) smtp.mailfrom=ioworker0@gmail.com;
	dmarc=pass (policy=none) header.from=gmail.com
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1709822512;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references:dkim-signature;
	bh=4j+RBywVVOs9aYXXDFYhmAM4YOjedOfLhk9nCVzIHhQ=;
	b=WgYa64D6o8nMYjL2fe0VR5kXvYpikC1qEDotu14eBpfT+LRRXYZNWUjWLJLNSGfydxkGk9
	n0jHcqKpHZeDaBR2Sini6W7jVPyWym4EAF+TFgSAXh6P8oAYRZyjuEex45eBYHA8ZkxxLx
	haHBthFTYCuAR4uaAadRl9dlzlZ/faM=
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1709822512; a=rsa-sha256;
	cv=none;
	b=yuYL06YphQNQmF7cI5KGXMI2zg3PAUF3cK/YU0HthNWXm14Bjktik90GyiNHMWeD1/mSUL
	obaamDcFAkaCLVNvPIy1SEoFtCb5CvzONhDbYvWREScgTFSFjaQ5BuifBRjhmM2HPVyoAd
	W+6J5WsjX+/oG+d4JnPkjhZyCVh4uLM=
ARC-Authentication-Results: i=1;
	imf24.hostedemail.com;
	dkim=pass header.d=gmail.com header.s=20230601 header.b=GhgFc4xI;
	spf=pass (imf24.hostedemail.com: domain of ioworker0@gmail.com designates 209.85.219.172 as permitted sender) smtp.mailfrom=ioworker0@gmail.com;
	dmarc=pass (policy=none) header.from=gmail.com
Received: by mail-yb1-f172.google.com with SMTP id 3f1490d57ef6-dd02fb9a31cso919540276.3
        for <linux-mm@kvack.org>; Thu, 07 Mar 2024 06:41:51 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20230601; t=1709822511; x=1710427311; darn=kvack.org;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:from:to:cc:subject:date
         :message-id:reply-to;
        bh=4j+RBywVVOs9aYXXDFYhmAM4YOjedOfLhk9nCVzIHhQ=;
        b=GhgFc4xIpygNgYf88FynrifJpHGKAyYnbWeuDueX3ZDj80PTdK3M+nn5LCgc+Eq5Bl
         J9ppjC8mA+XFRFlTtH3BH0usccVq0w5z+6EU1euHChADf4vPTEeBCWOG1XYfR0pdS+K2
         7w8RRkYAWRzff8odzpasAIXw+70SHvnogMOQplALfSoI1up1UO7G7o04ACQUjwDZIh6w
         7B/NqJGGgJQHQIRHLBUzl0TgbGr/p2OaVPlom/M8Rr23kjEEhq49lQxASMtGKI91lIaa
         4ggenNz4Nlu5YQ8vhk7S3+ZrV36hc2pPBS4Yh3Ke4apIOyKVa3L4NwtvMN+Aoy4M1lS/
         ZQjA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20230601; t=1709822511; x=1710427311;
        h=content-transfer-encoding:cc:to:subject:message-id:date:from
         :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc
         :subject:date:message-id:reply-to;
        bh=4j+RBywVVOs9aYXXDFYhmAM4YOjedOfLhk9nCVzIHhQ=;
        b=GeLNdr/hvXXRRiFnaOF6EDM8Lilzl8DG5oS29NQLyXH/aCazqDeKivsSSo8O5DKk9R
         uhtUbCQqM9Mlxgfl5vdlnDCB6uKiaHPysOATuTqrtNk8W2lSV5WthBEZrvbPDZQloQfn
         yLnypo3YPx4a1mDIXjEy8QT8ohXek1P/Wd5Li/5p1A+VwXv8iOe/Q03izcWpNrrr3OuS
         yW/Q/43bkyvDzat4hgC2pSTA2itNU7rUnJm22eQY/+V4omd07SHuP7PDwx80rvt0Ed8g
         ISz5C8UhQGOCGqqlHSL2JOnbaFinW8Ofr1sQ077dBLyIq0RGZ7ezbXbLyQB1PdyzhTh2
         JdYg==
X-Forwarded-Encrypted: i=1; AJvYcCU/Len2v0ru7ry+ktN7Vbm64VBmKrC16f4fhjjMVg/YYel47bfulPG+igf0/AzB7vqL8O30Mr6/HbJ2cbnerzk4+eE=
X-Gm-Message-State: AOJu0YwN3Or4GI0HIaK5Z7HNo+Y0BAJRrZHbWEQYWFKLCyThS8+yKdf6
	pP930kHWkhldGiHqi7+5S7ZXwVpwgwMr+EOeCRN3w3wJ5Z8Zz7cxopDwtJaErQiQ2oqIOSMXPhE
	uHzExpU8YKM6VeqxRb5UhiJPiuII=
X-Google-Smtp-Source: AGHT+IEDxvCrrfvxOcu/4yY6zIqt5/u2anHJxDWo8f6Vd6sNy7uSR6/79RHkYg4iN0i4S+Qlxt7fQbIsnqmF0Go75VA=
X-Received: by 2002:a5b:810:0:b0:dcc:eb38:199c with SMTP id
 x16-20020a5b0810000000b00dcceb38199cmr16213392ybp.56.1709822510918; Thu, 07
 Mar 2024 06:41:50 -0800 (PST)
MIME-Version: 1.0
References: <20240307061425.21013-1-ioworker0@gmail.com> <CAGsJ_4xcRvZGdpPh1qcFTnTnDUbwz6WreQ=L_UO+oU2iFm9EPg@mail.gmail.com>
 <CAK1f24k2G_DSEjuqqqPyY0f7+btpYbjfoyMH7btLfP8nkasCTQ@mail.gmail.com>
 <CAGsJ_4xREM-P1mFqeM-s3-cJ9czb6PXwizb-3hOhwaF6+QM5QA@mail.gmail.com>
 <03458c20-5544-411b-9b8d-b4600a9b802f@arm.com> <CAGsJ_4zp1MXTjG=4gBO+J3owg7sHDgDJ8Ut51i1RBSnKnK0BfQ@mail.gmail.com>
 <501c9f77-1459-467a-8619-78e86b46d300@arm.com> <8f84c7d6-982a-4933-a7a7-3f640df64991@redhat.com>
 <e6bc142e-113d-4034-b92c-746b951a27ed@redhat.com> <d24f8553-33f2-4ae7-a06d-badaf9462d84@arm.com>
 <db46212b-000d-4e8e-87d2-90dbf0a6236a@redhat.com>
In-Reply-To: <db46212b-000d-4e8e-87d2-90dbf0a6236a@redhat.com>
From: Lance Yang <ioworker0@gmail.com>
Date: Thu, 7 Mar 2024 22:41:39 +0800
Message-ID: <CAK1f24k8MgR-3sqpqZmg=aTF5Sh4if2o7qeW9zfGpGCSbHR2PA@mail.gmail.com>
Subject: Re: [PATCH v2 1/1] mm/madvise: enhance lazyfreeing with mTHP in madvise_free
To: David Hildenbrand <david@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>, Barry Song <21cnbao@gmail.com>, 
	Vishal Moola <vishal.moola@gmail.com>, akpm@linux-foundation.org, zokeefe@google.com, 
	shy828301@gmail.com, mhocko@suse.com, fengwei.yin@intel.com, 
	xiehuan09@gmail.com, wangkefeng.wang@huawei.com, songmuchun@bytedance.com, 
	peterx@redhat.com, minchan@kernel.org, linux-mm@kvack.org, 
	linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Rspamd-Queue-Id: 3A2B2180011
X-Rspam-User: 
X-Rspamd-Server: rspam11
X-Stat-Signature: 64foha4keorandgngx66d3nafj5s4q4g
X-HE-Tag: 1709822512-34846
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

Hey Barry, Ryan, David,

Thanks a lot for taking the time to explain and provide suggestions!
I really appreciate it.

IIUC, here's what we need to do for v3:

If folio_likely_mapped_shared() is true, or if we cannot acquire
the folio lock, we simply skip the batched PTEs. Otherwise, we compare
the number of batched PTEs against folio_mapcount() and, finally,
batch-update only the access and dirty bits.
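
To make sure we mean the same thing, here is a rough user-space C model
of that flow as I currently read it. It is only a sketch, not kernel
code: the toy_* types and helpers are made up for illustration and
merely stand in for folio_pte_batch(), folio_likely_mapped_shared(),
folio_trylock() and folio_mapcount(). What to do when the batch size
and the mapcount differ isn't settled here, so the model just reports
that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for kernel state (hypothetical, for illustration only). */
struct toy_folio {
	int id;
	int mapcount;		/* models folio_mapcount() */
	bool likely_shared;	/* models folio_likely_mapped_shared() */
	bool locked;		/* models the folio lock */
};

struct toy_pte {
	bool present;
	bool young;		/* access bit */
	bool dirty;
	int folio_id;		/* which folio this PTE maps */
};

enum toy_result { TOY_SKIPPED, TOY_MISMATCH, TOY_LAZYFREED };

/*
 * Relaxed batch detection: count consecutive PTEs that map this folio,
 * ignoring any young/dirty differences between them.
 */
static int toy_pte_batch(const struct toy_folio *folio,
			 const struct toy_pte *pte, int max_nr)
{
	int nr = 0;

	while (nr < max_nr && pte[nr].present && pte[nr].folio_id == folio->id)
		nr++;
	return nr;
}

static bool toy_trylock(struct toy_folio *folio)
{
	if (folio->locked)
		return false;
	folio->locked = true;
	return true;
}

static enum toy_result toy_lazyfree_batch(struct toy_folio *folio,
					  struct toy_pte *pte, int max_nr,
					  int *nr_out)
{
	int nr;

	assert(folio && pte && nr_out);

	/* 1. Detect the batch the "usual" (relaxed) way. */
	nr = toy_pte_batch(folio, pte, max_nr);
	*nr_out = nr;

	/* 2. Likely shared, or lock contended: skip the batched PTEs. */
	if (folio->likely_shared || !toy_trylock(folio))
		return TOY_SKIPPED;

	/* 3. Under the lock, #pte must equal the mapcount. */
	if (nr != folio->mapcount) {
		folio->locked = false;
		return TOY_MISMATCH;	/* handling left open in this model */
	}

	/* 4. Batch-update: clear only the access and dirty bits. */
	for (int i = 0; i < nr; i++) {
		pte[i].young = false;
		pte[i].dirty = false;
	}
	folio->locked = false;
	return TOY_LAZYFREED;
}
```

For TOY_SKIPPED the caller would step over *nr_out PTEs in one go
rather than revisiting them one by one.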

I'm not sure I've understood this correctly; could you please confirm?

Thanks,
Lance

On Thu, Mar 7, 2024 at 7:17 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.03.24 12:13, Ryan Roberts wrote:
> > On 07/03/2024 10:54, David Hildenbrand wrote:
> >> On 07.03.24 11:54, David Hildenbrand wrote:
> >>> On 07.03.24 11:50, Ryan Roberts wrote:
> >>>> On 07/03/2024 09:33, Barry Song wrote:
> >>>>> On Thu, Mar 7, 2024 at 10:07 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
> >>>>>>
> >>>>>> On 07/03/2024 08:10, Barry Song wrote:
> >>>>>>> On Thu, Mar 7, 2024 at 9:00 PM Lance Yang <ioworker0@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>> Hey Barry,
> >>>>>>>>
> >>>>>>>> Thanks for taking time to review!
> >>>>>>>>
> >>>>>>>> On Thu, Mar 7, 2024 at 3:00 PM Barry Song <21cnbao@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> On Thu, Mar 7, 2024 at 7:15 PM Lance Yang <ioworker0@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>> [...]
> >>>>>>>>>> +static inline bool can_mark_large_folio_lazyfree(unsigned long addr,
> >>>>>>>>>> +                                                struct folio *folio, pte_t *start_pte)
> >>>>>>>>>> +{
> >>>>>>>>>> +       int nr_pages = folio_nr_pages(folio);
> >>>>>>>>>> +       fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> >>>>>>>>>> +
> >>>>>>>>>> +       for (int i = 0; i < nr_pages; i++)
> >>>>>>>>>> +               if (page_mapcount(folio_page(folio, i)) != 1)
> >>>>>>>>>> +                       return false;
> >>>>>>>>>
> >>>>>>>>> we have moved to folio_estimated_sharers though it is not precise, so
> >>>>>>>>> we don't do this check with lots of loops and depending on the
> >>>>>>>>> subpage's mapcount.
> >>>>>>>> If we don't check the subpage’s mapcount, and there is a cow folio
> >>>>>>>> associated with this folio and the cow folio has smaller size than this
> >>>>>>>> folio, should we still mark this folio as lazyfree?
> >>>>>>>
> >>>>>>> I agree, this is true. However, we've somehow accepted the fact that
> >>>>>>> folio_likely_mapped_shared can result in false negatives or false
> >>>>>>> positives to balance the overhead.  So I really don't know :-)
> >>>>>>>
> >>>>>>> Maybe David and Vishal can give some comments here.
> >>>>>>>
> >>>>>>>>
> >>>>>>>>> BTW, do we need to rebase our work against David's changes[1]?
> >>>>>>>>> [1] https://lore.kernel.org/linux-mm/20240227201548.857831-1-david@redhat.com/
> >>>>>>>>
> >>>>>>>> Yes, we should rebase our work against David’s changes.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> +
> >>>>>>>>>> +       return nr_pages == folio_pte_batch(folio, addr, start_pte,
> >>>>>>>>>> +                                        ptep_get(start_pte), nr_pages, flags, NULL);
> >>>>>>>>>> +}
> >>>>>>>>>> +
> >>>>>>>>>>     static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> >>>>>>>>>>                                    unsigned long end, struct mm_walk *walk)
> >>>>>>>>>>
> >>>>>>>>>> @@ -676,11 +690,45 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> >>>>>>>>>>                     */
> >>>>>>>>>>                    if (folio_test_large(folio)) {
> >>>>>>>>>>                            int err;
> >>>>>>>>>> +                       unsigned long next_addr, align;
> >>>>>>>>>>
> >>>>>>>>>> -                       if (folio_estimated_sharers(folio) != 1)
> >>>>>>>>>> -                               break;
> >>>>>>>>>> -                       if (!folio_trylock(folio))
> >>>>>>>>>> -                               break;
> >>>>>>>>>> +                       if (folio_estimated_sharers(folio) != 1 ||
> >>>>>>>>>> +                           !folio_trylock(folio))
> >>>>>>>>>> +                               goto skip_large_folio;
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> I don't think we can skip all the PTEs for nr_pages, as some of them
> >>>>>>>>> might be pointing to other folios.
> >>>>>>>>>
> >>>>>>>>> for example, for a large folio with 16 PTEs, you do MADV_DONTNEED(15-16),
> >>>>>>>>> and write the memory of PTE15 and PTE16, you get page faults, thus PTE15
> >>>>>>>>> and PTE16 will point to two different small folios. We can only skip
> >>>>>>>>> when we are sure nr_pages == folio_pte_batch().
> >>>>>>>>
> >>>>>>>> Agreed. Thanks for pointing that out.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> +
> >>>>>>>>>> +                       align = folio_nr_pages(folio) * PAGE_SIZE;
> >>>>>>>>>> +                       next_addr = ALIGN_DOWN(addr + align, align);
> >>>>>>>>>> +
> >>>>>>>>>> +                       /*
> >>>>>>>>>> +                        * If we mark only the subpages as lazyfree, or
> >>>>>>>>>> +                        * cannot mark the entire large folio as lazyfree,
> >>>>>>>>>> +                        * then just split it.
> >>>>>>>>>> +                        */
> >>>>>>>>>> +                       if (next_addr > end || next_addr - addr != align ||
> >>>>>>>>>> +                           !can_mark_large_folio_lazyfree(addr, folio, pte))
> >>>>>>>>>> +                               goto split_large_folio;
> >>>>>>>>>> +
> >>>>>>>>>> +                       /*
> >>>>>>>>>> +                        * Avoid unnecessary folio splitting i=
f the large
> >>>>>>>>>> +                        * folio is entirely within the given =
range.
> >>>>>>>>>> +                        */
> >>>>>>>>>> +                       folio_clear_dirty(folio);
> >>>>>>>>>> +                       folio_unlock(folio);
> >>>>>>>>>> +                       for (; addr != next_addr; pte++, addr += PAGE_SIZE) {
> >>>>>>>>>> +                               ptent = ptep_get(pte);
> >>>>>>>>>> +                               if (pte_young(ptent) || pte_dirty(ptent)) {
> >>>>>>>>>> +                                       ptent = ptep_get_and_clear_full(
> >>>>>>>>>> +                                               mm, addr, pte, tlb->fullmm);
> >>>>>>>>>> +                                       ptent = pte_mkold(ptent);
> >>>>>>>>>> +                                       ptent = pte_mkclean(ptent);
> >>>>>>>>>> +                                       set_pte_at(mm, addr, pte, ptent);
> >>>>>>>>>> +                                       tlb_remove_tlb_entry(tlb, pte, addr);
> >>>>>>>>>> +                               }
> >>>>>>>>>
> >>>>>>>>> Can we do this in batches? for a CONT-PTE mapped large folio, you are
> >>>>>>>>> unfolding and folding again. It seems quite expensive.
> >>>>>>
> >>>>>> I'm not convinced we should be doing this in batches. We want the initial
> >>>>>> folio_pte_batch() to be as loose as possible regarding permissions so that we
> >>>>>> reduce our chances of splitting folios to the min. (e.g. ignore SW bits like
> >>>>>> soft dirty, etc). I think it might be possible that some PTEs are RO and other
> >>>>>> RW too (e.g. due to cow - although with the current cow impl, probably not.
> >>>>>> But it's fragile to assume that). Anyway, if we do an initial batch that
> >>>>>> ignores all
> >>>>>
> >>>>> You are correct. I believe this scenario could indeed occur. For instance,
> >>>>> if process A forks process B and then unmaps itself, leaving B as the
> >>>>> sole process owning the large folio.  The current wp_page_reuse() function
> >>>>> will reuse PTE one by one while the specific subpage is written.
> >>>>
> >>>> Hmm - I thought it would only reuse if the total mapcount for the folio was 1.
> >>>> And since it is a large folio with each page mapped once in proc B, I thought
> >>>> every subpage write would cause a copy except the last one? I haven't looked at
> >>>> the code for a while. But I had it in my head that this is an area we need to
> >>>> improve for mTHP.
> >>>
> >>> wp_page_reuse() will currently reuse a PTE part of a large folio only if
> >>> a single PTE remains mapped (refcount == 0).
> >>
> >> ^ == 1
> >
> > Ahh yes. That's what I meant. I got the behaviour vaguely right though.
> >
> > Anyway, regardless, I'm not sure we want to batch here. Or if we do, we want
> > a batch function that will only clear access and dirty.
>
> We likely want to detect a folio batch the "usual" way (as relaxed as
> possible), then do all the checks (#pte == folio_mapcount() under folio
> lock), and finally batch-update the access and dirty only.
>
> --
> Cheers,
>
> David / dhildenb
>