From mboxrd@z Thu Jan  1 00:00:00 1970
From: Barry Song <21cnbao@gmail.com>
Date: Tue, 16 Apr 2024 14:08:50 +1200
Subject: Re: [PATCH v2 1/5] mm: swap: introduce swap_free_nr() for batched swap_free()
To: "Huang, Ying"
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, baolin.wang@linux.alibaba.com,
 chrisl@kernel.org, david@redhat.com, hanchuanhua@oppo.com, hannes@cmpxchg.org,
 hughd@google.com, kasong@tencent.com, ryan.roberts@arm.com, surenb@google.com,
 v-songbaohua@oppo.com, willy@infradead.org, xiang@kernel.org, yosryahmed@google.com,
 yuzhao@google.com, ziy@nvidia.com, linux-kernel@vger.kernel.org
In-Reply-To: <877cgy2ifu.fsf@yhuang6-desk2.ccr.corp.intel.com>
References: <20240409082631.187483-1-21cnbao@gmail.com>
 <20240409082631.187483-2-21cnbao@gmail.com>
 <87y19f2lq3.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87jzkz2g3t.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87bk6b2elo.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <877cgy2ifu.fsf@yhuang6-desk2.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Tue, Apr 16, 2024 at 1:42 PM Huang, Ying wrote:
>
> Barry Song <21cnbao@gmail.com> writes:
>
> > On Mon, Apr 15, 2024 at 8:53 PM Huang, Ying wrote:
> >>
> >> Barry Song <21cnbao@gmail.com> writes:
> >>
> >> > On Mon, Apr 15, 2024 at 8:21 PM Huang, Ying wrote:
> >> >>
> >> >> Barry Song <21cnbao@gmail.com> writes:
> >> >>
> >> >> > On Mon, Apr 15, 2024 at 6:19 PM Huang, Ying wrote:
> >> >> >>
> >> >> >> Barry Song <21cnbao@gmail.com> writes:
> >> >> >>
> >> >> >> > From: Chuanhua Han
> >> >> >> >
> >> >> >> > While swapping in a large folio, we need to free swaps related to the whole
> >> >> >> > folio. To avoid frequently acquiring and releasing swap locks, it is better
> >> >> >> > to introduce an API for batched free.
> >> >> >> >
> >> >> >> > Signed-off-by: Chuanhua Han
> >> >> >> > Co-developed-by: Barry Song
> >> >> >> > Signed-off-by: Barry Song
> >> >> >> > ---
> >> >> >> >  include/linux/swap.h |  5 +++++
> >> >> >> >  mm/swapfile.c        | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >> >> >> >  2 files changed, 56 insertions(+)
> >> >> >> >
> >> >> >> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> >> >> >> > index 11c53692f65f..b7a107e983b8 100644
> >> >> >> > --- a/include/linux/swap.h
> >> >> >> > +++ b/include/linux/swap.h
> >> >> >> > @@ -483,6 +483,7 @@ extern void swap_shmem_alloc(swp_entry_t);
> >> >> >> >  extern int swap_duplicate(swp_entry_t);
> >> >> >> >  extern int swapcache_prepare(swp_entry_t);
> >> >> >> >  extern void swap_free(swp_entry_t);
> >> >> >> > +extern void swap_free_nr(swp_entry_t entry, int nr_pages);
> >> >> >> >  extern void swapcache_free_entries(swp_entry_t *entries, int n);
> >> >> >> >  extern void free_swap_and_cache_nr(swp_entry_t entry, int nr);
> >> >> >> >  int swap_type_of(dev_t device, sector_t offset);
> >> >> >> > @@ -564,6 +565,10 @@ static inline void swap_free(swp_entry_t swp)
> >> >> >> >  {
> >> >> >> >  }
> >> >> >> >
> >> >> >> > +void swap_free_nr(swp_entry_t entry, int nr_pages)
> >> >> >> > +{
> >> >> >> > +}
> >> >> >> > +
> >> >> >> >  static inline void put_swap_folio(struct folio *folio, swp_entry_t swp)
> >> >> >> >  {
> >> >> >> >  }
> >> >> >> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> >> >> >> > index 28642c188c93..f4c65aeb088d 100644
> >> >> >> > --- a/mm/swapfile.c
> >> >> >> > +++ b/mm/swapfile.c
> >> >> >> > @@ -1356,6 +1356,57 @@ void swap_free(swp_entry_t entry)
> >> >> >> >               __swap_entry_free(p, entry);
> >> >> >> >  }
> >> >> >> >
> >> >> >> > +/*
> >> >> >> > + * Free up the maximum number of swap entries at once to limit the
> >> >> >> > + * maximum kernel stack usage.
> >> >> >> > + */
> >> >> >> > +#define SWAP_BATCH_NR (SWAPFILE_CLUSTER > 512 ? 512 : SWAPFILE_CLUSTER)
> >> >> >> > +
> >> >> >> > +/*
> >> >> >> > + * Called after swapping in a large folio,
> >> >> >>
> >> >> >> IMHO, it's not good to document the caller in the function definition.
> >> >> >> Because this will discourage function reusing.
> >> >> >
> >> >> > ok. right now there is only one user, that is why it is added. but i agree
> >> >> > we can actually remove this.
> >> >> >
> >> >> >>
> >> >> >> > batched free swap entries
> >> >> >> > + * for this large folio, entry should be for the first subpage and
> >> >> >> > + * its offset is aligned with nr_pages
> >> >> >>
> >> >> >> Why do we need this?
> >> >> >
> >> >> > This is a fundamental requirement of the existing kernel: a folio's
> >> >> > swap offset is naturally aligned from the moment add_to_swap() inserts
> >> >> > it into the swapcache xarray, so this comment is just describing an
> >> >> > existing fact. In the future, if we want to support swapping a folio
> >> >> > out to discontiguous, non-aligned offsets, we can't pass entry as the
> >> >> > parameter; we should instead pass ptep or another data structure which
> >> >> > can connect multiple discontiguous swap offsets.
> >> >> >
> >> >> > I feel like we only need "for this large folio, entry should be for
> >> >> > the first subpage" and can drop "and its offset is aligned with
> >> >> > nr_pages"; the latter is not important to this context at all.
> >> >>
> >> >> IIUC, all these are requirements of the only caller now, not the
> >> >> function itself.  If only part of all the swap entries of an mTHP are
> >> >> passed to swap_free_nr(), can swap_free_nr() still do its work?  If
> >> >> so, why not make swap_free_nr() as general as possible?
> >> >
> >> > right, i believe we can make swap_free_nr() as general as possible.
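
To make the alignment invariant concrete: since mTHP swap slots are
allocated naturally aligned within a cluster, a caller holding any
subpage's entry could in principle recover the folio-wide first entry
before calling swap_free_nr(). A hypothetical, untested sketch (ptep and
nr_pages assumed to come from the surrounding fault-handling context):

	swp_entry_t entry = pte_to_swp_entry(ptep_get(ptep));
	/* mTHP swap slots are allocated naturally aligned, so rounding
	 * the offset down to the power-of-two folio size recovers the
	 * entry of the first subpage */
	unsigned long first = round_down(swp_offset(entry), nr_pages);

	swap_free_nr(swp_entry(swp_type(entry), first), nr_pages);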
> >> >>
> >> >> >>
> >> >> >> > + */
> >> >> >> > +void swap_free_nr(swp_entry_t entry, int nr_pages)
> >> >> >> > +{
> >> >> >> > +     int i, j;
> >> >> >> > +     struct swap_cluster_info *ci;
> >> >> >> > +     struct swap_info_struct *p;
> >> >> >> > +     unsigned int type = swp_type(entry);
> >> >> >> > +     unsigned long offset = swp_offset(entry);
> >> >> >> > +     int batch_nr, remain_nr;
> >> >> >> > +     DECLARE_BITMAP(usage, SWAP_BATCH_NR) = { 0 };
> >> >> >> > +
> >> >> >> > +     /* all swap entries are within a cluster for mTHP */
> >> >> >> > +     VM_BUG_ON(offset % SWAPFILE_CLUSTER + nr_pages > SWAPFILE_CLUSTER);
> >> >> >> > +
> >> >> >> > +     if (nr_pages == 1) {
> >> >> >> > +             swap_free(entry);
> >> >> >> > +             return;
> >> >> >> > +     }
> >> >> >>
> >> >> >> Is it possible to unify swap_free() and swap_free_nr() into one function
> >> >> >> with acceptable performance?  IIUC, the general rule in the mTHP effort
> >> >> >> is to avoid duplicate functions for mTHP and normal small folios.
> >> >> >> Right?
> >> >> >
> >> >> > I don't see why.
> >> >>
> >> >> Because duplicated implementations are hard to maintain in the long term.
> >> >
> >> > sorry, i actually meant "I don't see why not"; for some reason, the "not"
> >> > was missed. Obviously I meant "why not", there was a "but" after it :-)
> >> >
> >> >>
> >> >> > but we have lots of places calling swap_free(), we may
> >> >> > have to change them all to call swap_free_nr(entry, 1); the other possible
> >> >> > way is making swap_free() a wrapper of swap_free_nr() always using
> >> >> > 1 as the argument. In both cases, we are changing the semantics of
> >> >> > swap_free_nr() to cover partially freeing large folios and have to drop
> >> >> > "entry should be for the first subpage" then.
> >> >> >
> >> >> > Right now, the semantics is
> >> >> > * swap_free_nr() for an entire large folio;
> >> >> > * swap_free() for one entry of either a large folio or a small folio
> >> >>
> >> >> As above, I don't think these semantics are important for the
> >> >> swap_free_nr() implementation.
> >> >
> >> > right. I agree. If we are ready to change all those callers, nothing
> >> > can stop us from removing swap_free().
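
For the record, the wrapper variant would be trivial -- an untested sketch
of what the CONFIG_SWAP side of include/linux/swap.h could carry if we drop
the separate implementation:

static inline void swap_free(swp_entry_t entry)
{
	swap_free_nr(entry, 1);
}

so every existing swap_free() caller keeps working unchanged while the
freeing logic lives in one place.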
> >> >>
> >> >> >>
> >> >> >> > +
> >> >> >> > +     remain_nr = nr_pages;
> >> >> >> > +     p = _swap_info_get(entry);
> >> >> >> > +     if (p) {
> >> >> >> > +             for (i = 0; i < nr_pages; i += batch_nr) {
> >> >> >> > +                     batch_nr = min_t(int, SWAP_BATCH_NR, remain_nr);
> >> >> >> > +
> >> >> >> > +                     ci = lock_cluster_or_swap_info(p, offset);
> >> >> >> > +                     for (j = 0; j < batch_nr; j++) {
> >> >> >> > +                             if (__swap_entry_free_locked(p, offset + i * SWAP_BATCH_NR + j, 1))
> >> >> >> > +                                     __bitmap_set(usage, j, 1);
> >> >> >> > +                     }
> >> >> >> > +                     unlock_cluster_or_swap_info(p, ci);
> >> >> >> > +
> >> >> >> > +                     for_each_clear_bit(j, usage, batch_nr)
> >> >> >> > +                             free_swap_slot(swp_entry(type, offset + i * SWAP_BATCH_NR + j));
> >> >> >> > +
> >> >> >> > +                     bitmap_clear(usage, 0, SWAP_BATCH_NR);
> >> >> >> > +                     remain_nr -= batch_nr;
> >> >> >> > +             }
> >> >> >> > +     }
> >> >> >> > +}
> >> >> >> > +
> >> >> >> >  /*
> >> >> >> >   * Called after dropping swapcache to decrease refcnt to swap entries.
> >> >> >> >   */
> >> >> >>
> >> >> >> put_swap_folio() implements batching in another method.  Do you think
> >> >> >> that it's good to use the batching method in that function here?  It
> >> >> >> avoids using bitmap operations and stack space.
> >> >> >
> >> >> > Chuanhua has strictly limited the maximum stack usage to a few
> >> >> > unsigned longs,
> >> >>
> >> >> 512 / 8 = 64 bytes.
> >> >>
> >> >> So, not trivial.
> >> >>
> >> >> > so this should be safe. on the other hand, i believe this
> >> >> > implementation is more efficient, as put_swap_folio() might lock/
> >> >> > unlock much more often whenever __swap_entry_free_locked returns
> >> >> > 0.
> >> >>
> >> >> There are 2 most common use cases,
> >> >>
> >> >> - all swap entries have usage count == 0
> >> >> - all swap entries have usage count != 0
> >> >>
> >> >> In both cases, we only need to lock/unlock once.  In fact, I didn't
> >> >> find possible use cases other than the above.
> >> >
> >> > i guess the point is free_swap_slot() shouldn't be called within
> >> > lock_cluster_or_swap_info? so when we are freeing nr_pages slots,
> >> > we'll have to unlock and lock nr_pages times? and this is the most
> >> > common scenario.
> >>
> >> No.  In put_swap_folio(), free_entries is either SWAPFILE_CLUSTER (that
> >> is, nr_pages) or 0.  These are the most common cases.
> >>
> >
> > i am actually talking about the below code path,
> >
> > void put_swap_folio(struct folio *folio, swp_entry_t entry)
> > {
> >         ...
> >
> >         ci = lock_cluster_or_swap_info(si, offset);
> >         ...
> >         for (i = 0; i < size; i++, entry.val++) {
> >                 if (!__swap_entry_free_locked(si, offset + i, SWAP_HAS_CACHE)) {
> >                         unlock_cluster_or_swap_info(si, ci);
> >                         free_swap_slot(entry);
> >                         if (i == size - 1)
> >                                 return;
> >                         lock_cluster_or_swap_info(si, offset);
> >                 }
> >         }
> >         unlock_cluster_or_swap_info(si, ci);
> > }
> >
> > but i guess you are talking about the below code path:
> >
> > void put_swap_folio(struct folio *folio, swp_entry_t entry)
> > {
> >         ...
> >
> >         ci = lock_cluster_or_swap_info(si, offset);
> >         if (size == SWAPFILE_CLUSTER) {
> >                 map = si->swap_map + offset;
> >                 for (i = 0; i < SWAPFILE_CLUSTER; i++) {
> >                         val = map[i];
> >                         VM_BUG_ON(!(val & SWAP_HAS_CACHE));
> >                         if (val == SWAP_HAS_CACHE)
> >                                 free_entries++;
> >                 }
> >                 if (free_entries == SWAPFILE_CLUSTER) {
> >                         unlock_cluster_or_swap_info(si, ci);
> >                         spin_lock(&si->lock);
> >                         mem_cgroup_uncharge_swap(entry, SWAPFILE_CLUSTER);
> >                         swap_free_cluster(si, idx);
> >                         spin_unlock(&si->lock);
> >                         return;
> >                 }
> >         }
> > }
>
> I am talking about both code paths.  In the 2 most common cases,
> __swap_entry_free_locked() will return 0 or !0 for all entries in range.

I get your point, but if neither the all-0 nor the all-non-0 condition
holds, we'll end up repeatedly unlocking and locking. Picture a scenario
with a large folio shared by multiple processes: one process might unmap a
portion of it while another still maps the whole folio. This leads to
situations where free_entries is neither 0 nor nr_pages, resulting in
multiple unlock and lock operations.

Chuanhua has invested significant effort in following Ryan's suggestion
for the current approach, which handles all cases, especially partial
unmapping. Additionally, once swap_free_nr() is made as general as you
suggested, it will be used across many scenarios. Unless there's evidence
of performance issues or bugs, I believe the current approach remains
preferable.
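
To make the churn concrete, here is a purely userspace illustration (not
kernel code; the sharing pattern and counts are hypothetical) that tallies
cluster-lock acquisitions when only every other subpage's swap count drops
to zero:

#include <stdio.h>
#include <stdbool.h>

int main(void)
{
	enum { NR = 16 };
	bool becomes_free[NR];
	int i, per_entry_locks = 1, batched_locks = 1; /* one lock up front */

	/* hypothetical partial sharing: odd subpages still referenced */
	for (i = 0; i < NR; i++)
		becomes_free[i] = (i % 2 == 0);

	/*
	 * put_swap_folio()-style slow path: each entry that becomes free
	 * drops the lock around free_swap_slot() and, unless it is the
	 * last entry, takes it again.
	 */
	for (i = 0; i < NR; i++)
		if (becomes_free[i] && i != NR - 1)
			per_entry_locks++;

	/*
	 * swap_free_nr()-style: record still-referenced entries in a
	 * bitmap, unlock once per batch, then free the clear bits outside
	 * the lock -- stays at 1 acquisition for NR <= SWAP_BATCH_NR.
	 */

	printf("per-entry relocking: %d acquisitions\n", per_entry_locks);
	printf("batched bitmap:      %d acquisition(s)\n", batched_locks);
	return 0;
}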
> > we are mTHP, so we can't assume our size is SWAPFILE_CLUSTER?
> > or do you want to check free_entries == "1 << swap_entry_order(folio_order(folio))"
> > instead of SWAPFILE_CLUSTER for the "for (i = 0; i < size; i++, entry.val++)"
> > path?
>
> Just replace SWAPFILE_CLUSTER with "nr_pages" in your code.
>
> >
> >> >>
> >> >> And, we should add batching in __swap_entry_free().  That will help
> >> >> free_swap_and_cache_nr() too.
> >
> > Chris Li and I actually discussed this before, and while I completely
> > agree it can be batched, i'd like to defer that as an incremental
> > patchset later to keep this swapcache-refault series small.
>
> OK.
>
> >>
> >> Please consider this too.
>
> --
> Best Regards,
> Huang, Ying

Thanks
Barry