From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id DAC75C4829B
	for <linux-mm@archiver.kernel.org>; Mon, 12 Feb 2024 08:51:48 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id 5543C6B0075; Mon, 12 Feb 2024 03:51:48 -0500 (EST)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 4DD9F6B0078; Mon, 12 Feb 2024 03:51:48 -0500 (EST)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 3569C8D0001; Mon, 12 Feb 2024 03:51:48 -0500 (EST)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17])
	by kanga.kvack.org (Postfix) with ESMTP id 20B946B0075
	for <linux-mm@kvack.org>; Mon, 12 Feb 2024 03:51:48 -0500 (EST)
Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay02.hostedemail.com (Postfix) with ESMTP id BB0171205A3
	for <linux-mm@kvack.org>; Mon, 12 Feb 2024 08:51:47 +0000 (UTC)
X-FDA: 81782533854.23.E3736D7
Received: from foss.arm.com (foss.arm.com [217.140.110.172])
	by imf19.hostedemail.com (Postfix) with ESMTP id DFB311A000D
	for <linux-mm@kvack.org>; Mon, 12 Feb 2024 08:51:45 +0000 (UTC)
Authentication-Results: imf19.hostedemail.com;
	dkim=none;
	spf=pass (imf19.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com;
	dmarc=pass (policy=none) header.from=arm.com
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1707727906;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:
	 content-transfer-encoding:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=q+eNbmXaDYGzq1qszagrsefdhvUHd8vlioCZV8yexAk=;
	b=4EvK/14SzZakhW18KY7WOtkEMALaf6fc/3psHcivj/gwY9d69R3Hz+AZKSbquilZFKv2mv
	2CkUxGEB0DbNV8nEiBdo5+BcTjoDSnwU6IAUqf/uLrcbCmwQr3SxWojew79gPCvjqAxtUr
	6l4pysXWBpLO/Zhj3v+qzVqgGNNzmjw=
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1707727906; a=rsa-sha256;
	cv=none;
	b=J7hXTaFymoC43yjYAHFmiFQTrkkNuO/ZymefTQeOLd84YH6B3UGzfL6XouPQmEdqAK5g2o
	X5uZ7pw+/X1TuhTBtc86w8ZzwQ3is0vkePkfdEC9h+5EJAPrGfeeu8Z8F4Gb40Wv6On9g5
	Pmduj0BGqlj8DMjp3Zi06zE6Fi8aTs8=
ARC-Authentication-Results: i=1;
	imf19.hostedemail.com;
	dkim=none;
	spf=pass (imf19.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com;
	dmarc=pass (policy=none) header.from=arm.com
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14])
	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 412FBDA7;
	Mon, 12 Feb 2024 00:52:26 -0800 (PST)
Received: from [10.57.78.115] (unknown [10.57.78.115])
	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 943C23F762;
	Mon, 12 Feb 2024 00:51:41 -0800 (PST)
Message-ID: <438b22ec-c875-41b6-bfd4-a84966f84853@arm.com>
Date: Mon, 12 Feb 2024 08:51:40 +0000
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v2 08/10] mm/mmu_gather: add __tlb_remove_folio_pages()
Content-Language: en-GB
To: David Hildenbrand <david@redhat.com>, linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
 Matthew Wilcox <willy@infradead.org>,
 Catalin Marinas <catalin.marinas@arm.com>,
 Yin Fengwei <fengwei.yin@intel.com>, Michal Hocko <mhocko@suse.com>,
 Will Deacon <will@kernel.org>, "Aneesh Kumar K.V"
 <aneesh.kumar@linux.ibm.com>, Nick Piggin <npiggin@gmail.com>,
 Peter Zijlstra <peterz@infradead.org>, Michael Ellerman
 <mpe@ellerman.id.au>, Christophe Leroy <christophe.leroy@csgroup.eu>,
 "Naveen N. Rao" <naveen.n.rao@linux.ibm.com>,
 Heiko Carstens <hca@linux.ibm.com>, Vasily Gorbik <gor@linux.ibm.com>,
 Alexander Gordeev <agordeev@linux.ibm.com>,
 Christian Borntraeger <borntraeger@linux.ibm.com>,
 Sven Schnelle <svens@linux.ibm.com>, Arnd Bergmann <arnd@arndb.de>,
 linux-arch@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 linux-s390@vger.kernel.org
References: <20240209221509.585251-1-david@redhat.com>
 <20240209221509.585251-9-david@redhat.com>
From: Ryan Roberts <ryan.roberts@arm.com>
In-Reply-To: <20240209221509.585251-9-david@redhat.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Rspamd-Queue-Id: DFB311A000D
X-Rspam-User: 
X-Stat-Signature: cus8tx8j6n6eeuphkhkb7uzw1yjigydk
X-Rspamd-Server: rspam03
X-HE-Tag: 1707727905-857308
X-HE-Meta: U2FsdGVkX19fsTIf++5aIytPZZMM1LKDDUDiVOzCWaKDPiatO4drtCaKt+cIxDB8GJQDjRXKAIVixNKX4L8xhnwb1N3/YVBnIAhCcdAVcUX1fCYo+7SxqPVB0MEcfodq/5yqtfcXYTYpA0owhLR3XtjB6nuOoa1LVnc4sB31S35ndGBxW663j4ZptUaCR8Ez7s2xBbhTw6kQHjd1gVvRiiic/cQTbvnHX/2KklIqDsMvJ1WMCz8vh/jKd1+bVZsIMgaSHf6dZ1D7mCjOTuO8RfzsSAJZrdeNz+tjlu9hqtAC51E5eOJWmr8e4Z8Nt10aBv6LjYu2n5rqX53forTdA+tgNOLJlnKPa6IRK8PaaEb7CXkXey14x1Unjf8RxtxaKpMOqSvBusG23bpFYkw/QRCjMVSq0pEuJ8LjNfOq7FD+7/jnxI7a4IOU1Ea+aFx4KYMwbDZnZA6Lz2JxmjUu5yAJn69cKVw2Vmrb9d7lYY9El+9BNSj9UCSqbyvhzS0TU3SGnOrhxJ+YcIQ9w7BlL2bZ0aqCfpqOdf5meY4HqjOxlfkvCkebwr5woVZqkZBW72YMAIpblgQkAy/tI+YUUa5K3mUPJZ4GScqJUr0L3kjvp6dSJ9szM9Kf7OgbOPt3tXu6Vt6efZw5ei9mO6S6h3dJj7kjDqR/jL8W/IuFTRbdgFtrCivr9kkTBybdRCtEmMr/oiURfnX2p//kaEuAa+7NCqb/feqhhU5TJIBSCA5sDRLHaX5fMxPsJWilNvVbm2AbTh+5DGIac+xvqIZvTQ8irEfsrZRJnlaxcjsOMgIPfuownqqTigNDyBJKzCa7UltRqwiXk/6wXJT9/PJu7PyIbYTBPkDr3W0RmsMGSlR+EbwqJrUksWGz/RNAJoRMsOncghQrhDIpXnakEELI+SwR4KfJCvDpLmh/vodeE6tMG6JTXupq9R9rGgK4jH7iajjsHNrwj5go/oXZ4c+
 1h2+xhwg
 0VT3heeSMdZUZjZr8LJxCtX57gZf0sQi/2dDJ7KuZRl8IYlamKGQjjHm7ZD/sSArgX+bSdjrFyFCrtE7c6aG+ezwYhctUUnGNbFGi4AFJcN69a0etXK7M85OempLITMLHPH8eZoRpeO/8u2oUtHjsVeZniJBcAkar1jhKD3Ov4GkqbOvpbd/GbijYi7DVFW34ZZxwlbHE/mc3zbPD6lHpJToYkty1QgF242fbWvE7Seprdx6Dk0pkIBd/PtoPMHhyu6B9T5lNFQUneTck0zRyu2Q2sw==
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>
List-Subscribe: <mailto:majordomo@kvack.org>
List-Unsubscribe: <mailto:majordomo@kvack.org>

On 09/02/2024 22:15, David Hildenbrand wrote:
> Add __tlb_remove_folio_pages(), which will remove multiple consecutive
> pages that belong to the same large folio, instead of only a single
> page. We'll be using this function when optimizing unmapping/zapping of
> large folios that are mapped by PTEs.
> 
> We're using the remaining spare bit in an encoded_page to indicate that
> the next enoced page in an array contains actually shifted "nr_pages".
> Teach swap/freeing code about putting multiple folio references, and
> delayed rmap handling to remove page ranges of a folio.
> 
> This extension allows for still gathering almost as many small folios
> as we used to (-1, because we have to prepare for a possibly bigger next
> entry), but still allows for gathering consecutive pages that belong to the
> same large folio.
> 
> Note that we don't pass the folio pointer, because it is not required for
> now. Further, we don't support page_size != PAGE_SIZE, it won't be
> required for simple PTE batching.
> 
> We have to provide a separate s390 implementation, but it's fairly
> straight forward.
> 
> Another, more invasive and likely more expensive, approach would be to
> use folio+range or a PFN range instead of page+nr_pages. But, we should
> do that consistently for the whole mmu_gather. For now, let's keep it
> simple and add "nr_pages" only.
> 
> Note that it is now possible to gather significantly more pages: In the
> past, we were able to gather ~10000 pages, now we can gather
> also gather ~5000 folio fragments that span multiple pages. A folio
> fragement on x86-64 can be up to 512 pages (2 MiB THP) and on arm64 with
> 64k in theory 8192 pages (512 MiB THP). Gathering more memory is not
> considered something we should worry about, especially because these are
> already corner cases.
> 
> While we can gather more total memory, we won't free more folio
> fragments. As long as page freeing time primarily only depends on the
> number of involved folios, there is no effective change for !preempt
> configurations. However, we'll adjust tlb_batch_pages_flush() separately to
> handle corner cases where page freeing time grows proportionally with the
> actual memory size.
> 
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/include/asm/tlb.h | 17 +++++++++++
>  include/asm-generic/tlb.h   |  8 +++++
>  include/linux/mm_types.h    | 20 ++++++++++++
>  mm/mmu_gather.c             | 61 +++++++++++++++++++++++++++++++------
>  mm/swap.c                   | 12 ++++++--
>  mm/swap_state.c             | 15 +++++++--
>  6 files changed, 119 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/s390/include/asm/tlb.h b/arch/s390/include/asm/tlb.h
> index 48df896d5b79..e95b2c8081eb 100644
> --- a/arch/s390/include/asm/tlb.h
> +++ b/arch/s390/include/asm/tlb.h
> @@ -26,6 +26,8 @@ void __tlb_remove_table(void *_table);
>  static inline void tlb_flush(struct mmu_gather *tlb);
>  static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
>  		struct page *page, bool delay_rmap, int page_size);
> +static inline bool __tlb_remove_folio_pages(struct mmu_gather *tlb,
> +		struct page *page, unsigned int nr_pages, bool delay_rmap);
>  
>  #define tlb_flush tlb_flush
>  #define pte_free_tlb pte_free_tlb
> @@ -52,6 +54,21 @@ static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
>  	return false;
>  }
>  
> +static inline bool __tlb_remove_folio_pages(struct mmu_gather *tlb,
> +		struct page *page, unsigned int nr_pages, bool delay_rmap)
> +{
> +	struct encoded_page *encoded_pages[] = {
> +		encode_page(page, ENCODED_PAGE_BIT_NR_PAGES_NEXT),
> +		encode_nr_pages(nr_pages),
> +	};
> +
> +	VM_WARN_ON_ONCE(delay_rmap);
> +	VM_WARN_ON_ONCE(page_folio(page) != page_folio(page + nr_pages - 1));
> +
> +	free_pages_and_swap_cache(encoded_pages, ARRAY_SIZE(encoded_pages));
> +	return false;
> +}
> +
>  static inline void tlb_flush(struct mmu_gather *tlb)
>  {
>  	__tlb_flush_mm_lazy(tlb->mm);
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 95d60a4f468a..bd00dd238b79 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -69,6 +69,7 @@
>   *
>   *  - tlb_remove_page() / __tlb_remove_page()
>   *  - tlb_remove_page_size() / __tlb_remove_page_size()
> + *  - __tlb_remove_folio_pages()
>   *
>   *    __tlb_remove_page_size() is the basic primitive that queues a page for
>   *    freeing. __tlb_remove_page() assumes PAGE_SIZE. Both will return a
> @@ -78,6 +79,11 @@
>   *    tlb_remove_page() and tlb_remove_page_size() imply the call to
>   *    tlb_flush_mmu() when required and has no return value.
>   *
> + *    __tlb_remove_folio_pages() is similar to __tlb_remove_page(), however,
> + *    instead of removing a single page, remove the given number of consecutive
> + *    pages that are all part of the same (large) folio: just like calling
> + *    __tlb_remove_page() on each page individually.
> + *
>   *  - tlb_change_page_size()
>   *
>   *    call before __tlb_remove_page*() to set the current page-size; implies a
> @@ -262,6 +268,8 @@ struct mmu_gather_batch {
>  
>  extern bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
>  		bool delay_rmap, int page_size);
> +bool __tlb_remove_folio_pages(struct mmu_gather *tlb, struct page *page,
> +		unsigned int nr_pages, bool delay_rmap);
>  
>  #ifdef CONFIG_SMP
>  /*
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 1b89eec0d6df..a7223ba3ea1e 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -226,6 +226,15 @@ struct encoded_page;
>  /* Perform rmap removal after we have flushed the TLB. */
>  #define ENCODED_PAGE_BIT_DELAY_RMAP		1ul
>  
> +/*
> + * The next item in an encoded_page array is the "nr_pages" argument, specifying
> + * the number of consecutive pages starting from this page, that all belong to
> + * the same folio. For example, "nr_pages" corresponds to the number of folio
> + * references that must be dropped. If this bit is not set, "nr_pages" is
> + * implicitly 1.
> + */
> +#define ENCODED_PAGE_BIT_NR_PAGES_NEXT		2ul
> +
>  static __always_inline struct encoded_page *encode_page(struct page *page, unsigned long flags)
>  {
>  	BUILD_BUG_ON(flags > ENCODED_PAGE_BITS);
> @@ -242,6 +251,17 @@ static inline struct page *encoded_page_ptr(struct encoded_page *page)
>  	return (struct page *)(~ENCODED_PAGE_BITS & (unsigned long)page);
>  }
>  
> +static __always_inline struct encoded_page *encode_nr_pages(unsigned long nr)
> +{
> +	VM_WARN_ON_ONCE((nr << 2) >> 2 != nr);
> +	return (struct encoded_page *)(nr << 2);
> +}
> +
> +static __always_inline unsigned long encoded_nr_pages(struct encoded_page *page)
> +{
> +	return ((unsigned long)page) >> 2;
> +}
> +
>  /*
>   * A swap entry has to fit into a "unsigned long", as the entry is hidden
>   * in the "index" field of the swapper address space.
> diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
> index 6540c99c6758..d175c0f1e2c8 100644
> --- a/mm/mmu_gather.c
> +++ b/mm/mmu_gather.c
> @@ -50,12 +50,21 @@ static bool tlb_next_batch(struct mmu_gather *tlb)
>  #ifdef CONFIG_SMP
>  static void tlb_flush_rmap_batch(struct mmu_gather_batch *batch, struct vm_area_struct *vma)
>  {
> +	struct encoded_page **pages = batch->encoded_pages;
> +
>  	for (int i = 0; i < batch->nr; i++) {
> -		struct encoded_page *enc = batch->encoded_pages[i];
> +		struct encoded_page *enc = pages[i];
>  
>  		if (encoded_page_flags(enc) & ENCODED_PAGE_BIT_DELAY_RMAP) {
>  			struct page *page = encoded_page_ptr(enc);
> -			folio_remove_rmap_pte(page_folio(page), page, vma);
> +			unsigned int nr_pages = 1;
> +
> +			if (unlikely(encoded_page_flags(enc) &
> +				     ENCODED_PAGE_BIT_NR_PAGES_NEXT))
> +				nr_pages = encoded_nr_pages(pages[++i]);
> +
> +			folio_remove_rmap_ptes(page_folio(page), page, nr_pages,
> +					       vma);
>  		}
>  	}
>  }
> @@ -89,18 +98,26 @@ static void tlb_batch_pages_flush(struct mmu_gather *tlb)
>  	for (batch = &tlb->local; batch && batch->nr; batch = batch->next) {
>  		struct encoded_page **pages = batch->encoded_pages;
>  
> -		do {
> +		while (batch->nr) {
>  			/*
>  			 * limit free batch count when PAGE_SIZE > 4K
>  			 */
>  			unsigned int nr = min(512U, batch->nr);
>  
> +			/*
> +			 * Make sure we cover page + nr_pages, and don't leave
> +			 * nr_pages behind when capping the number of entries.
> +			 */
> +			if (unlikely(encoded_page_flags(pages[nr - 1]) &
> +				     ENCODED_PAGE_BIT_NR_PAGES_NEXT))
> +				nr++;
> +
>  			free_pages_and_swap_cache(pages, nr);
>  			pages += nr;
>  			batch->nr -= nr;
>  
>  			cond_resched();
> -		} while (batch->nr);
> +		}
>  	}
>  	tlb->active = &tlb->local;
>  }
> @@ -116,8 +133,9 @@ static void tlb_batch_list_free(struct mmu_gather *tlb)
>  	tlb->local.next = NULL;
>  }
>  
> -bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
> -		bool delay_rmap, int page_size)
> +static bool __tlb_remove_folio_pages_size(struct mmu_gather *tlb,
> +		struct page *page, unsigned int nr_pages, bool delay_rmap,
> +		int page_size)
>  {
>  	int flags = delay_rmap ? ENCODED_PAGE_BIT_DELAY_RMAP : 0;
>  	struct mmu_gather_batch *batch;
> @@ -126,6 +144,8 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
>  
>  #ifdef CONFIG_MMU_GATHER_PAGE_SIZE
>  	VM_WARN_ON(tlb->page_size != page_size);
> +	VM_WARN_ON_ONCE(nr_pages != 1 && page_size != PAGE_SIZE);
> +	VM_WARN_ON_ONCE(page_folio(page) != page_folio(page + nr_pages - 1));

I've forgotten the rules for when it is ok to assume contiguous PFNs' struct
pages are contiguous in virtual memory? I think its fine as long as the pages
belong to the same folio and the folio order <= MAX_ORDER? So `page + nr_pages -
1` is safe?

Assuming this is the case:

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>


>  #endif
>  
>  	batch = tlb->active;
> @@ -133,17 +153,40 @@ bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
>  	 * Add the page and check if we are full. If so
>  	 * force a flush.
>  	 */
> -	batch->encoded_pages[batch->nr++] = encode_page(page, flags);
> -	if (batch->nr == batch->max) {
> +	if (likely(nr_pages == 1)) {
> +		batch->encoded_pages[batch->nr++] = encode_page(page, flags);
> +	} else {
> +		flags |= ENCODED_PAGE_BIT_NR_PAGES_NEXT;
> +		batch->encoded_pages[batch->nr++] = encode_page(page, flags);
> +		batch->encoded_pages[batch->nr++] = encode_nr_pages(nr_pages);
> +	}
> +	/*
> +	 * Make sure that we can always add another "page" + "nr_pages",
> +	 * requiring two entries instead of only a single one.
> +	 */
> +	if (batch->nr >= batch->max - 1) {
>  		if (!tlb_next_batch(tlb))
>  			return true;
>  		batch = tlb->active;
>  	}
> -	VM_BUG_ON_PAGE(batch->nr > batch->max, page);
> +	VM_BUG_ON_PAGE(batch->nr > batch->max - 1, page);
>  
>  	return false;
>  }
>  
> +bool __tlb_remove_folio_pages(struct mmu_gather *tlb, struct page *page,
> +		unsigned int nr_pages, bool delay_rmap)
> +{
> +	return __tlb_remove_folio_pages_size(tlb, page, nr_pages, delay_rmap,
> +					     PAGE_SIZE);
> +}
> +
> +bool __tlb_remove_page_size(struct mmu_gather *tlb, struct page *page,
> +		bool delay_rmap, int page_size)
> +{
> +	return __tlb_remove_folio_pages_size(tlb, page, 1, delay_rmap, page_size);
> +}
> +
>  #endif /* MMU_GATHER_NO_GATHER */
>  
>  #ifdef CONFIG_MMU_GATHER_TABLE_FREE
> diff --git a/mm/swap.c b/mm/swap.c
> index cd8f0150ba3a..e5380d732c0d 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -967,11 +967,17 @@ void release_pages(release_pages_arg arg, int nr)
>  	unsigned int lock_batch;
>  
>  	for (i = 0; i < nr; i++) {
> +		unsigned int nr_refs = 1;
>  		struct folio *folio;
>  
>  		/* Turn any of the argument types into a folio */
>  		folio = page_folio(encoded_page_ptr(encoded[i]));
>  
> +		/* Is our next entry actually "nr_pages" -> "nr_refs" ? */
> +		if (unlikely(encoded_page_flags(encoded[i]) &
> +			     ENCODED_PAGE_BIT_NR_PAGES_NEXT))
> +			nr_refs = encoded_nr_pages(encoded[++i]);
> +
>  		/*
>  		 * Make sure the IRQ-safe lock-holding time does not get
>  		 * excessive with a continuous string of pages from the
> @@ -990,14 +996,14 @@ void release_pages(release_pages_arg arg, int nr)
>  				unlock_page_lruvec_irqrestore(lruvec, flags);
>  				lruvec = NULL;
>  			}
> -			if (put_devmap_managed_page(&folio->page))
> +			if (put_devmap_managed_page_refs(&folio->page, nr_refs))
>  				continue;
> -			if (folio_put_testzero(folio))
> +			if (folio_ref_sub_and_test(folio, nr_refs))
>  				free_zone_device_page(&folio->page);
>  			continue;
>  		}
>  
> -		if (!folio_put_testzero(folio))
> +		if (!folio_ref_sub_and_test(folio, nr_refs))
>  			continue;
>  
>  		if (folio_test_large(folio)) {
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 7255c01a1e4e..2f540748f7c0 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -311,8 +311,19 @@ void free_page_and_swap_cache(struct page *page)
>  void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
>  {
>  	lru_add_drain();
> -	for (int i = 0; i < nr; i++)
> -		free_swap_cache(encoded_page_ptr(pages[i]));
> +	for (int i = 0; i < nr; i++) {
> +		struct page *page = encoded_page_ptr(pages[i]);
> +
> +		/*
> +		 * Skip over the "nr_pages" entry. It's sufficient to call
> +		 * free_swap_cache() only once per folio.
> +		 */
> +		if (unlikely(encoded_page_flags(pages[i]) &
> +			     ENCODED_PAGE_BIT_NR_PAGES_NEXT))
> +			i++;
> +
> +		free_swap_cache(page);
> +	}
>  	release_pages(pages, nr);
>  }
>