linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: lizhe.67@bytedance.com
To: akpm@linux-foundation.org
Cc: david@redhat.com, dev.jain@arm.com, jgg@ziepe.ca,
	jhubbard@nvidia.com, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, muchun.song@linux.dev, peterx@redhat.com,
	lizhe.67@bytedance.com
Subject: Re: [PATCH v4] gup: optimize longterm pin_user_pages() for large folio
Date: Fri,  6 Jun 2025 17:19:17 +0800	[thread overview]
Message-ID: <20250606091917.91384-1-lizhe.67@bytedance.com> (raw)
In-Reply-To: <20250606023742.58344-1-lizhe.67@bytedance.com>

On Fri, 6 Jun 2025 10:37:42 +0800, lizhe.67@bytedance.com wrote:

> In the current implementation of the longterm pin_user_pages() function,
> we invoke the collect_longterm_unpinnable_folios() function. This function
> iterates through the list to check whether each folio belongs to the
> "longterm_unpinnabled" category. The folios in this list essentially
> correspond to a contiguous region of user-space addresses, with each folio
> representing a physical address in increments of PAGESIZE. If this
> user-space address range is mapped with large folio, we can optimize the
> performance of function collect_longterm_unpinnable_folios() by reducing
> the using of READ_ONCE() invoked in
> pofs_get_folio()->page_folio()->_compound_head(). Also, we can simplify
> the logic of collect_longterm_unpinnable_folios(). Instead of comparing
> with prev_folio after calling pofs_get_folio(), we can check whether the
> next page is within the same folio.
> 
> The performance test results, based on v6.15, obtained through the
> gup_test tool from the kernel source tree are as follows. We achieve an
> improvement of over 66% for large folio with pagesize=2M. For small folio,
> we have only observed a very slight degradation in performance.
> 
> Without this patch:
> 
>     [root@localhost ~] ./gup_test -HL -m 8192 -n 512
>     TAP version 13
>     1..1
>     # PIN_LONGTERM_BENCHMARK: Time: get:14391 put:10858 us#
>     ok 1 ioctl status 0
>     # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>     [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
>     TAP version 13
>     1..1
>     # PIN_LONGTERM_BENCHMARK: Time: get:130538 put:31676 us#
>     ok 1 ioctl status 0
>     # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> 
> With this patch:
> 
>     [root@localhost ~] ./gup_test -HL -m 8192 -n 512
>     TAP version 13
>     1..1
>     # PIN_LONGTERM_BENCHMARK: Time: get:4867 put:10516 us#
>     ok 1 ioctl status 0
>     # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>     [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
>     TAP version 13
>     1..1
>     # PIN_LONGTERM_BENCHMARK: Time: get:131798 put:31328 us#
>     ok 1 ioctl status 0
>     # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> 
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> ---
> Changelogs:
> 
> v3->v4:
> - Fix some issues of code formatting.
> 
> v2->v3:
> - Update performance test data based on v6.15.
> - Refine the description of the optimization approach in commit message.
> - Fix some issues of code formatting.
> - Fine-tune the conditions for entering the optimization path.
> 
> v1->v2:
> - Modify some unreliable code.
> - Update performance test data.
> 
> v3 patch: https://lore.kernel.org/all/20250605033430.83142-1-lizhe.67@bytedance.com/
> v2 patch: https://lore.kernel.org/all/20250604031536.9053-1-lizhe.67@bytedance.com/
> v1 patch: https://lore.kernel.org/all/20250530092351.32709-1-lizhe.67@bytedance.com/
> 
>  mm/gup.c | 37 +++++++++++++++++++++++++++++--------
>  1 file changed, 29 insertions(+), 8 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 84461d384ae2..be968640b935 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2317,6 +2317,31 @@ static void pofs_unpin(struct pages_or_folios *pofs)
>  		unpin_user_pages(pofs->pages, pofs->nr_entries);
>  }
>  
> +static struct folio *pofs_next_folio(struct folio *folio,
> +		struct pages_or_folios *pofs, long *index_ptr)
> +{
> +	long i = *index_ptr + 1;
> +
> +	if (!pofs->has_folios && folio_test_large(folio)) {
> +		const unsigned long start_pfn = folio_pfn(folio);
> +		const unsigned long end_pfn = start_pfn + folio_nr_pages(folio);
> +
> +		for (; i < pofs->nr_entries; i++) {
> +			unsigned long pfn = page_to_pfn(pofs->pages[i]);
> +
> +			/* Is this page part of this folio? */
> +			if (pfn < start_pfn || pfn >= end_pfn)
> +				break;
> +		}
> +	}
> +
> +	if (unlikely(i == pofs->nr_entries))
> +		return NULL;
> +	*index_ptr = i;
> +
> +	return pofs_get_folio(pofs, i);
> +}
> +
>  /*
>   * Returns the number of collected folios. Return value is always >= 0.
>   */
> @@ -2324,16 +2349,12 @@ static void collect_longterm_unpinnable_folios(
>  		struct list_head *movable_folio_list,
>  		struct pages_or_folios *pofs)
>  {
> -	struct folio *prev_folio = NULL;
>  	bool drain_allow = true;
> -	unsigned long i;
> -
> -	for (i = 0; i < pofs->nr_entries; i++) {
> -		struct folio *folio = pofs_get_folio(pofs, i);
> +	struct folio *folio;
> +	long i = 0;
>  
> -		if (folio == prev_folio)
> -			continue;
> -		prev_folio = folio;
> +	for (folio = pofs_get_folio(pofs, i); folio;
> +		 folio = pofs_next_folio(folio, pofs, &i)) {
>  
>  		if (folio_is_longterm_pinnable(folio))
>  			continue;

Hi Andrew,

I apologize for the inconvenience I've caused. It seems that there are
still one formatting issue with the patch (thanks to David for pointing
it out). We need to apply the following fixup.

Thank you for your time and patience!

diff --git a/mm/gup.c b/mm/gup.c
index be968640b935..85112c904a4d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2354,7 +2354,7 @@ static void collect_longterm_unpinnable_folios(
 	long i = 0;
 
 	for (folio = pofs_get_folio(pofs, i); folio;
-		 folio = pofs_next_folio(folio, pofs, &i)) {
+	     folio = pofs_next_folio(folio, pofs, &i)) {
 
 		if (folio_is_longterm_pinnable(folio))
 			continue;

Thanks,
Zhe


      parent reply	other threads:[~2025-06-06  9:19 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-06  2:37 lizhe.67
2025-06-06  7:21 ` David Hildenbrand
2025-06-06  7:37   ` lizhe.67
2025-06-06  7:58     ` David Hildenbrand
2025-06-06  8:27       ` lizhe.67
2025-06-06  8:50         ` David Hildenbrand
2025-06-06  8:58           ` lizhe.67
2025-06-08 17:10   ` David Laight
2025-06-06  9:19 ` lizhe.67 [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250606091917.91384-1-lizhe.67@bytedance.com \
    --to=lizhe.67@bytedance.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=dev.jain@arm.com \
    --cc=jgg@ziepe.ca \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=muchun.song@linux.dev \
    --cc=peterx@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox