From: Zhaoyu Liu <liuzhaoyu.zackary@bytedance.com>
Date: Tue, 2 Apr 2024 03:50:19 -0500
Subject: Re: [PATCH] mm: swap: prejudgement swap_has_cache to avoid page allocation
To: Kairui Song, nphamcs@gmail.com
Cc: akpm@linux-foundation.org, ying.huang@intel.com, songmuchun@bytedance.com, david@redhat.com, willy@infradead.org, chrisl@kernel.org, yosryahmed@google.com, guo.ziliang@zte.com.cn, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Mon, Apr 01, 2024 at 11:15:18PM +0800, Kairui Song wrote:
> On Mon, Apr 1, 2024 at 10:15 PM Zhaoyu Liu
> wrote:
> >
>
> Hi Zhaoyu
>
> Not sure why but I can't apply your patch, maybe you need to fix your
> email client?

Thanks for your comments, Kairui and Nhat.
OK, I'll check the mail client.

>
> > Based on qemu arm64 - latest kernel + 100M memory + 1024M swapfile.
> > Create 1G anon mmap and set it to shared, and has two processes
> > randomly access the shared memory. When they are racing on swap cache,
> > on average, each "alloc_pages_mpol + swapcache_prepare + folio_put"
> > took about 1475 us.
> >
> > So skip page allocation if SWAP_HAS_CACHE was set, just
> > schedule_timeout_uninterruptible and continue to acquire page
> > via filemap_get_folio() from swap cache, to speedup
> > __read_swap_cache_async.
> >
> > Signed-off-by: Zhaoyu Liu
> > ---
> > include/linux/swap.h | 6 ++++++
> > mm/swap_state.c | 10 ++++++++++
> > mm/swapfile.c | 15 +++++++++++++++
> > 3 files changed, 31 insertions(+)
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index a211a0383425..8a0013299f38 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -480,6 +480,7 @@ extern sector_t swapdev_block(int, pgoff_t);
> > extern int __swap_count(swp_entry_t entry);
> > extern int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry);
> > extern int swp_swapcount(swp_entry_t entry);
> > +extern bool swap_has_cache(struct swap_info_struct *si, swp_entry_t entry);
> > struct swap_info_struct *swp_swap_info(swp_entry_t entry);
> > struct backing_dev_info;
> > extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
> > @@ -570,6 +571,11 @@ static inline int swp_swapcount(swp_entry_t entry)
> > return 0;
> > }
> >
> > +static inline bool swap_has_cache(struct swap_info_struct *si, swp_entry_t entry)
> > +{
> > + return false;
> > +}
> > +
> > static inline swp_entry_t folio_alloc_swap(struct folio *folio)
> > {
> > swp_entry_t entry;
> > diff --git a/mm/swap_state.c b/mm/swap_state.c
> > index bfc7e8c58a6d..f130cfc669ce 100644
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -462,6 +462,15 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> > if (!swap_swapcount(si, entry) && swap_slot_cache_enabled)
> > goto fail_put_swap;
> >
> > + /*
> > + * Skipping page allocation if SWAP_HAS_CACHE was set,
> > + * just schedule_timeout_uninterruptible and continue to
> > + * acquire page via filemap_get_folio() from swap cache,
> > + * to speedup __read_swap_cache_async.
> > + */
> > + if (swap_has_cache(si, entry))
> > + goto skip_alloc;
> > +
>
> But will this cause more lock contention? You need to lock the cluster
> for the has_cache now.

Sorry, I don't quite understand. Cluster has be lock/unlock in func= .
Same approach as swap_swapcount().
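
For reference, swap_swapcount() in mm/swapfile.c does roughly the following
(quoting from memory, so the exact mainline code may differ slightly):

int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
{
	pgoff_t offset = swp_offset(entry);
	struct swap_cluster_info *ci;
	int count;

	/* Takes the cluster lock, or si->lock when there is no cluster info. */
	ci = lock_cluster_or_swap_info(si, offset);
	count = swap_count(si->swap_map[offset]);
	unlock_cluster_or_swap_info(si, ci);

	return count;
}

swap_has_cache() takes the same lock for a single swap_map read, so I'd
expect the added contention to be comparable to the existing
swap_swapcount() check just above it.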

>
> > /*
> > * Get a new folio to read into from swap. Allocate it now,
> > * before marking swap_map SWAP_HAS_CACHE, when -EEXIST will
> > @@ -483,6 +492,7 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> > if (err != -EEXIST)
> > goto fail_put_swap;
> >
> > +skip_alloc:
> > /*
> > * Protect against a recursive call to __read_swap_cache_async()
> > * on the same entry waiting forever here because SWAP_HAS_CACHE
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index cf900794f5ed..5388950c4ca6 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1513,6 +1513,21 @@ int swp_swapcount(swp_entry_t entry)
> > return count;
> > }
> >
> > +/*
> > + * Verify that a swap entry has been tagged with SWAP_HAS_CACHE
> > + */
> > +bool swap_has_cache(struct swap_info_struct *si, swp_entry_t entry)
> > +{
> > + pgoff_t offset = swp_offset(entry);
> > + struct swap_cluster_info *ci;
> > + bool has_cache;
> > +
> > + ci = lock_cluster_or_swap_info(si, offset);
> > + has_cache = !!(si->swap_map[offset] & SWAP_HAS_CACHE);
>
> I think you also need to check swap_count here, if an entry was just
> freed or loaded into slot cache, it will also have SWAP_HAS_CACHE set.

Yeah, you are right. SWAP_HAS_CACHE wouldn't mean that the entry must be
in the swap cache.
I guess you want to confirm through swap_count that the entry is about
to be added to the swap cache rather than about to be deleted from it.
But sometimes, when an entry is about to be removed from the swap cache,
its swap_count is not equal to 0, e.g. should_try_to_free_swap() in
do_swap_page().
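
Just to make sure I understand the suggestion, the variant you have in mind
would be something like this (untested sketch; swap_count() is the existing
helper in mm/swapfile.c)?

bool swap_has_cache(struct swap_info_struct *si, swp_entry_t entry)
{
	pgoff_t offset = swp_offset(entry);
	struct swap_cluster_info *ci;
	unsigned char val;

	ci = lock_cluster_or_swap_info(si, offset);
	val = si->swap_map[offset];
	unlock_cluster_or_swap_info(si, ci);

	/*
	 * Only report a cache hit when the entry still has swap count
	 * users; a just-freed entry (or one sitting in the slot cache)
	 * can have SWAP_HAS_CACHE set with a zero count.
	 */
	return (val & SWAP_HAS_CACHE) && swap_count(val);
}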

>
> I have a very similar function in my another series (see __swap_has_cache):
> https://lore.kernel.org/all/20240326185032.72159-10-ryncsn@gmail.com/
>
> The situation is different with this patch though. But this check is
> not reliable in both patches, having SWAP_HAS_CACHE doesn't mean the
> folio is in the cache, and even if it's in the cache, it might get
> freed very soon. So you need to ensure later checks can ensure the
> final result is not affected.

Awesome, I'll study it!

>
> eg. If swap_has_cache returns true, then swap cache is freed, and
> skip_if_exists is set to true, __read_swap_cache_async will return
> NULL for an entry that it should be able to alloc and cache, could
> this be a problem (for example, causing zswap writeback to fail with
> ENOMEM due to readahead)?

That's right. However, even without adding this check swapcache_pre= pare()
still equals -EEXIST and returns NULL when skip_if_exists equals true.
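
For reference, the tail of the retry loop in __read_swap_cache_async()
currently looks roughly like this (simplified, from memory):

	err = swapcache_prepare(entry);
	if (!err)
		break;			/* we own the slot, go add the folio */

	folio_put(folio);
	if (err != -EEXIST)
		goto fail_put_swap;	/* entry was freed, give up */

	/* Used by the zswap writeback path: bail out instead of waiting. */
	if (skip_if_exists)
		goto fail_put_swap;

	/* Another task holds SWAP_HAS_CACHE; wait, then retry the lookup. */
	schedule_timeout_uninterruptible(1);

So when skip_if_exists is true, a racing reader already returns NULL today
once swapcache_prepare() sees -EEXIST.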

>
> Also the race window that you are trying to avoid seems to be very
> short and rare? Not sure if the whole idea is worth it and actually
> affects performance in a positive way, any data on that?

As shown by the experiment described in the commit message, if the system
is under heavy memory pressure, alloc_pages_mpol() can be very time-consuming,
so when the entry is about to be added to the swap cache, I don't think it is
necessary to allocate a folio, because swapcache_prepare() is likely to
return -EEXIST. Just retry filemap_get_folio(), get the page from the swap
cache, and reduce the impact on memory management.
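
In other words, with the patch a racing task would go through the loop in
__read_swap_cache_async() roughly like this (simplified sketch of the
intended flow, omitting the skip_if_exists early return discussed above,
not the literal code):

for (;;) {
	/* 1) Try the swap cache first. */
	folio = filemap_get_folio(swap_address_space(entry),
				  swp_offset(entry));
	if (!IS_ERR(folio))
		return folio;	/* simplified: hit, use the cached folio */

	/* 2) New check: another task already set SWAP_HAS_CACHE, so skip
	 *    the folio allocation and the swapcache_prepare() attempt. */
	if (swap_has_cache(si, entry))
		goto skip_alloc;

	/* ... existing alloc_pages_mpol() + swapcache_prepare() path ... */

skip_alloc:
	/* 3) Wait for the winner to install its folio in the swap cache,
	 *    then loop back to step 1 and pick it up from there. */
	schedule_timeout_uninterruptible(1);
}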
