From: Barry Song <21cnbao@gmail.com>
Date: Mon, 21 Oct 2024 22:42:13 +1300
Subject: Re: [PATCH v3] mm/vmscan: stop the loop if enough pages have been page_out
To: Chen Ridong
Cc: chenridong, Kefeng Wang, akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, wangweiyang2@huawei.com, Michal Hocko,
 Johannes Weiner, Yosry Ahmed, Yu Zhao, David Hildenbrand, Matthew Wilcox,
 Ryan Roberts
In-Reply-To: <28b3eae5-92e7-471f-9883-d03684e06d1b@huaweicloud.com>
References: <20241010081802.290893-1-chenridong@huaweicloud.com>
 <62bd2564-76fa-4cb0-9c08-9eb2f96771b6@huawei.com>
 <28b3eae5-92e7-471f-9883-d03684e06d1b@huaweicloud.com>

On Mon, Oct 21, 2024 at 9:14 PM Chen Ridong wrote:
>
> On 2024/10/21 12:44, Barry Song wrote:
> > On Fri, Oct 11, 2024 at 7:49 PM chenridong wrote:
> >>
> >> On 2024/10/11 0:17, Barry Song wrote:
> >>> On Thu, Oct 10, 2024 at 4:59 PM Kefeng Wang wrote:
> >>>>
> >>>> Hi Ridong,
> >>>>
> >>>> This should be the first version for upstream, and the issue only
> >>>> occurs when a large folio is split.
> >>>>
> >>>> Adding more CCs to see if there's more feedback.
> >>>>
> >>>> On 2024/10/10 16:18, Chen Ridong wrote:
> >>>>> From: Chen Ridong
> >>>>>
> >>>>> An issue was found with the following testing steps:
> >>>>> 1. Compile with CONFIG_TRANSPARENT_HUGEPAGE=y
> >>>>> 2. Mount memcg v1, create a memcg named test_memcg and set
> >>>>>    usage_in_bytes=2.1G, memsw.usage_in_bytes=3G.
> >>>>> 3. Create a 1G swap file, and allocate 2.2G anon memory in test_memcg.
> >>>>>
> >>>>> It was found that:
> >>>>>
> >>>>> cat memory.usage_in_bytes
> >>>>> 2144940032
> >>>>> cat memory.memsw.usage_in_bytes
> >>>>> 2255056896
> >>>>>
> >>>>> free -h
> >>>>>        total   used   free
> >>>>> Mem:    31Gi   2.1Gi   27Gi
> >>>>> Swap:  1.0Gi  618Mi  405Mi
> >>>>>
> >>>>> As shown above, test_memcg used only about 100M of swap, but 600M+ of
> >>>>> swap was consumed, which means about 500M may be wasted because other
> >>>>> memcgs cannot use that swap memory.
> >>>>>
> >>>>> It can be explained as follows:
> >>>>> 1. When entering shrink_inactive_list, it isolates folios from the lru
> >>>>>    from tail to head. Assume, for simplicity, that it takes only folioN
> >>>>>    from the lru.
> >>>>>
> >>>>>    inactive lru: folio1<->folio2<->folio3...<->folioN-1
> >>>>>    isolated list: folioN
> >>>>>
> >>>>> 2. In shrink_page_list, if folioN is a THP, it may be split and added
> >>>>>    to the swap cache folio by folio. After being added to the swap
> >>>>>    cache, io is submitted to write each folio back to swap, which is
> >>>>>    asynchronous. When shrink_page_list finishes, the isolated folio
> >>>>>    list is moved back to the head of the inactive lru, which may then
> >>>>>    look like this, with 512 folios moved to the head of the inactive
> >>>>>    lru:
> >>>>>
> >>>>>    folioN512<->folioN511<->...folioN1<->folio1<->folio2...<->folioN-1
> >>>>>
> >>>>> 3. When a folio's writeback io completes, the folio may be rotated to
> >>>>>    the tail of the lru. The following lru list is expected, with the
> >>>>>    folios that were added to the swap cache rotated to the tail of the
> >>>>>    lru so that they can be reclaimed as soon as possible:
> >>>>>
> >>>>>    folio1<->folio2<->...<->folioN-1<->folioN1<->...folioN511<->folioN512
> >>>>>
> >>>>> 4. However, shrink_page_list and folio writeback are asynchronous.
> >>>>>    If a THP is split, shrink_page_list loops at least 512 times, which
> >>>>>    means that shrink_page_list has not completed while some folios have
> >>>>>    already finished writeback, and this may lead to a failure to rotate
> >>>>>    those folios to the tail of the lru. The lru may look like this:
> >>>
> >>> I assume you're referring to PMD-mapped THP, but your code also modifies
> >>> mTHP, which might not be that large. For instance, it could be a 16KB mTHP.
> >>>
> >>>>>    folioN50<->folioN49<->...folioN1<->folio1<->folio2...<->folioN-1<->
> >>>>>    folioN51<->folioN52<->...folioN511<->folioN512
> >>>>>
> >>>>>    Although those folios (N1-N50) have finished writing back, they are
> >>>>>    still at the head of the lru. When isolating folios from the lru, the
> >>>>>    scan goes from tail to head, so it is difficult to scan those folios
> >>>>>    again.
> >>>>>
> >>>>> What is described above may lead to a large number of folios that have
> >>>>> been added to the swap cache but cannot be reclaimed in time, which may
> >>>>> reduce reclaim efficiency and prevent other memcgs from using this swap
> >>>>> memory even if they trigger OOM.
> >>>>>
> >>>>> To fix this issue, it's better to stop looping if a THP has been split
> >>>>> and nr_pageout is greater than nr_to_reclaim.
> >>>>>
> >>>>> Signed-off-by: Chen Ridong
> >>>>> ---
> >>>>>   mm/vmscan.c | 16 +++++++++++++++-
> >>>>>   1 file changed, 15 insertions(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>>>> index 749cdc110c74..fd8ad251eda2 100644
> >>>>> --- a/mm/vmscan.c
> >>>>> +++ b/mm/vmscan.c
> >>>>> @@ -1047,7 +1047,7 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >>>>>        LIST_HEAD(demote_folios);
> >>>>>        unsigned int nr_reclaimed = 0;
> >>>>>        unsigned int pgactivate = 0;
> >>>>> -      bool do_demote_pass;
> >>>>> +      bool do_demote_pass, splited = false;
> >>>>>        struct swap_iocb *plug = NULL;
> >>>>>
> >>>>>        folio_batch_init(&free_folios);
> >>>>> @@ -1065,6 +1065,16 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >>>>>
> >>>>>                cond_resched();
> >>>>>
> >>>>> +              /*
> >>>>> +               * If a large folio has been split, many folios are added
> >>>>> +               * to folio_list. Looping through the entire list takes
> >>>>> +               * too much time, which may prevent folios that have completed
> >>>>> +               * writeback from rotateing to the tail of the lru. Just
> >>>>> +               * stop looping if nr_pageout is greater than nr_to_reclaim.
> >>>>> +               */
> >>>>> +              if (unlikely(splited && stat->nr_pageout > sc->nr_to_reclaim))
> >>>>> +                      break;
> >>>
> >>> I'm not entirely sure about the theory behind comparing stat->nr_pageout
> >>> with sc->nr_to_reclaim. However, the condition might still hold true even
> >>> if you've split a relatively small "large folio," such as 16kB?
> >>>
> >>
> >> Why compare stat->nr_pageout with sc->nr_to_reclaim? It's because if all
> >> pages that have been paged out can be reclaimed, then enough pages can be
> >> reclaimed once all of them have finished writeback. Thus, it may not have
> >> to page out more.
> >>
> >> If a small large folio (16 kB) has been split, it may return early
> >> without all the pages in the folio_list being paged out, but I think
> >> that is fine. It can page out more pages the next time it enters
> >> shrink_folio_list if not enough pages have been reclaimed.
> >>
> >> However, if pages that have been paged out are still at the head of the
> >> LRU, it is difficult to scan these pages again. In this case, not only
> >> might it "waste" some swap memory, but it also has to page out more pages.
> >>
> >> Considering the above, I sent this patch. It may not be a perfect
> >> solution, but I think it's a good option to consider. And I am wondering
> >> if anyone has a better solution.
> >
> > Hi Ridong,
> > My overall understanding is that you haven't fully described your problem;
> > in particular, I don't understand what your 3 and 4 mean:
> >
> >> 3. When a folio's writeback io completes, the folio may be rotated to
> >>    the tail of the lru. The following lru list is expected, with the
> >>    folios that were added to the swap cache rotated to the tail of the
> >>    lru so that they can be reclaimed as soon as possible:
> >>
> >>    folio1<->folio2<->...<->folioN-1<->folioN1<->...folioN511<->folioN512
> >
> > > 4. However, shrink_page_list and folio writeback are asynchronous. If a
> > >    THP is split, shrink_page_list loops at least 512 times, which means
> > >    that shrink_page_list has not completed while some folios have already
> > >    finished writeback, and this may lead to a failure to rotate those
> > >    folios to the tail of the lru. The lru may look like this:
> >
> > can you please describe it in a more readable way?
> >
> > i feel your diagram below is somehow wrong:
> > folio1<->folio2<->...<->folioN-1<->folioN1<->...folioN511<->folioN512
> >
> > You mentioned "rotate"; how could "rotate" make:
> > folioN512<->folioN511<->...folioN1 in (2)
> > become
> > folioN1<->...folioN511<->folioN512 in (3)?
> >
>
> I am sorry for any confusion.
>
> If the THP is split, folioN1, folioN2, folioN3, ... folioN512 are committed
> to writeback one by one. It is assumed that folioN1, folioN2, folioN3, ...
> folioN512 complete in order.
>
> Original:
> folioN512<->folioN511<->...folioN1<->folio1<->folio2...<->folioN-1
>
> folioN1 is finished and is rotated to the tail of the LRU:
> folioN512<->folioN511<->...folioN2<->folio1<->folio2...<->folioN-1<->folioN1
>
> folioN2 is finished:
> folioN512<->folioN511<->...folioN3<->folio1<->folio2...<->folioN-1<->folioN1<->folioN2
>
> folioN3 is finished:
> folioN512<->folioN511<->...folioN4<->folio1<->folio2...<->folioN-1<->folioN1<->folioN2<->folioN3
>
> ...
>
> folioN512 is finished:
> folio1<->folio2<->...<->folioN-1<->folioN1<->...folioN511<->folioN512
>
> When all the folios are finished, the LRU might look just like this:
> folio1<->folio2<->...<->folioN-1<->folioN1<->...folioN511<->folioN512

understood, thanks!

Let me try to understand the following part:

> 4:
> folioN50<->folioN49<->...folioN1<->folio1<->folio2...<->folioN-1<->
> folioN51<->folioN52<->...folioN511<->folioN512
> Although those folios (N1-N50) have finished writing back, they are
> still at the head of the lru. When isolating folios from the lru, the
> scan goes from tail to head, so it is difficult to scan those folios
> again.

What is the reason that "those folios (N1-N50) have finished writing
back, yet they remain at the head of the LRU"?

Is it because their writeback ended while we were still looping in
shrink_folio_list(), causing folio_end_writeback()'s
folio_rotate_reclaimable() to fail to move these folios, which are
still on the private "folio_list", to the tail of the LRU?
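
If so, my mental model is roughly the sketch below (simplified and written
from memory, not the exact upstream code; the early return for folios that
are no longer on an LRU is the part I'm assuming matters here):

/*
 * Rough sketch of folio_rotate_reclaimable(), simplified - not the exact
 * upstream implementation.
 *
 * Folios still sitting on shrink_folio_list()'s private, isolated
 * folio_list are not on any LRU list, so the rotation below is a no-op
 * for them; they only go back to the *head* of the inactive LRU once
 * shrink_folio_list() returns and the caller splices the list back.
 */
void folio_rotate_reclaimable(struct folio *folio)
{
	if (folio_test_locked(folio) || folio_test_dirty(folio) ||
	    folio_test_unevictable(folio))
		return;

	if (!folio_test_lru(folio))
		return;		/* isolated folios bail out here */

	/* ... otherwise queue the folio to be moved to the tail of its LRU ... */
}
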
> > btw, writeback isn't always async. it could be sync for zram and sync_io
> > swap. in that case, your patch might change the order of the LRU. i mean,
> > for example, when an mTHP becomes cold, we always reclaim all of it, rather
> > than reclaiming only part of it and putting the remaining small folios back
> > at the head of the lru.
> >
>
> Yes, the LRU order can change.
> Although it may put part of the small folios back at the head of the lru,
> it can return from shrink_folio_list in time without causing much
> additional I/O.
>
> If you have understood this issue, do you have any suggestions to fix
> it? My patch may not be a perfect way to fix this issue.
>

My point is that synchronous I/O, like zRAM, doesn't have this issue and
doesn't require this fix, as writeback is always completed without
asynchronous latency.

> Best regards,
> Ridong
>

Thanks
Barry