* [PATCH] gup: optimize longterm pin_user_pages() for large folio
@ 2025-05-30 9:23 lizhe.67
2025-05-30 9:53 ` Dev Jain
2025-05-30 11:31 ` David Hildenbrand
0 siblings, 2 replies; 9+ messages in thread
From: lizhe.67 @ 2025-05-30 9:23 UTC (permalink / raw)
To: akpm, david, jgg, jhubbard, peterx
Cc: linux-mm, linux-kernel, muchun.song, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
In the current implementation of longterm pin_user_pages(), we invoke
collect_longterm_unpinnable_folios(). This function iterates through
the list to check whether each folio is longterm-unpinnable. The
entries in this list essentially correspond to a contiguous region of
user-space addresses, with each entry advancing the physical address
by PAGE_SIZE. If this user-space address range is mapped with large
folios, we can optimize pin_user_pages() by resolving the folio only
once per folio instead of once per page, which reduces the number of
if-else branches taken and the number of memory accesses made through
READ_ONCE(). This patch implements that approach.
The performance test results obtained with the gup_test tool from the
kernel source tree are as follows. We achieve an improvement of over
75% for large folios with pagesize=2M. For normal pages, we observed
only a very slight performance degradation.
Without this patch:
[root@localhost ~] ./gup_test -HL -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:13623 put:10799 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
[root@localhost ~]# ./gup_test -LT -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:129733 put:31753 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
With this patch:
[root@localhost ~] ./gup_test -HL -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:3386 put:10844 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
[root@localhost ~]# ./gup_test -LT -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:131652 put:31393 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
mm/gup.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 84461d384ae2..8c11418036e2 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2317,6 +2317,25 @@ static void pofs_unpin(struct pages_or_folios *pofs)
unpin_user_pages(pofs->pages, pofs->nr_entries);
}
+static struct folio *pofs_next_folio(struct folio *folio,
+ struct pages_or_folios *pofs, long *index_ptr)
+{
+ long i = *index_ptr + 1;
+ unsigned long nr_pages = folio_nr_pages(folio);
+
+ if (!pofs->has_folios)
+ while ((i < pofs->nr_entries) &&
+ /* Is this page part of this folio? */
+ (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
+ i++;
+
+ if (unlikely(i == pofs->nr_entries))
+ return NULL;
+ *index_ptr = i;
+
+ return pofs_get_folio(pofs, i);
+}
+
/*
* Returns the number of collected folios. Return value is always >= 0.
*/
@@ -2324,16 +2343,12 @@ static void collect_longterm_unpinnable_folios(
struct list_head *movable_folio_list,
struct pages_or_folios *pofs)
{
- struct folio *prev_folio = NULL;
bool drain_allow = true;
- unsigned long i;
-
- for (i = 0; i < pofs->nr_entries; i++) {
- struct folio *folio = pofs_get_folio(pofs, i);
+ long i = 0;
+ struct folio *folio;
- if (folio == prev_folio)
- continue;
- prev_folio = folio;
+ for (folio = pofs_get_folio(pofs, 0); folio;
+ folio = pofs_next_folio(folio, pofs, &i)) {
if (folio_is_longterm_pinnable(folio))
continue;
--
2.20.1
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 9:23 [PATCH] gup: optimize longterm pin_user_pages() for large folio lizhe.67
@ 2025-05-30 9:53 ` Dev Jain
2025-05-30 10:04 ` lizhe.67
2025-05-30 11:31 ` David Hildenbrand
1 sibling, 1 reply; 9+ messages in thread
From: Dev Jain @ 2025-05-30 9:53 UTC (permalink / raw)
To: lizhe.67, akpm, david, jgg, jhubbard, peterx
Cc: linux-mm, linux-kernel, muchun.song, Ryan Roberts
On 30/05/25 2:53 pm, lizhe.67@bytedance.com wrote:
> From: Li Zhe<lizhe.67@bytedance.com>
>
> In the current implementation of the longterm pin_user_pages() function,
> we invoke the collect_longterm_unpinnable_folios() function. This function
> iterates through the list to check whether each folio belongs to the
> "longterm_unpinnabled" category. The folios in this list essentially
> correspond to a contiguous region of user-space addresses, with each folio
> representing a physical address in increments of PAGESIZE. If this
> user-space address range is mapped with large folio, we can optimize the
> performance of function pin_user_pages() by reducing the number of if-else
> branches and the frequency of memory accesses using READ_ONCE. This patch
> leverages this approach to achieve performance improvements.
>
> The performance test results obtained through the gup_test tool from the
> kernel source tree are as follows. We achieve an improvement of over 75%
> for large folio with pagesize=2M. For normal page, we have only observed
> a very slight degradation in performance.
Thanks for the patch! I don't know much about GUP, but to my limited
understanding the patch looks fine; let's wait for more comments.
[----]
> }
>
> +static struct folio *pofs_next_folio(struct folio *folio,
> + struct pages_or_folios *pofs, long *index_ptr)
> +{
> + long i = *index_ptr + 1;
> + unsigned long nr_pages = folio_nr_pages(folio);
> +
> + if (!pofs->has_folios)
> + while ((i < pofs->nr_entries) &&
> + /* Is this page part of this folio? */
> + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
> + i++;
> +
> + if (unlikely(i == pofs->nr_entries))
> + return NULL;
> + *index_ptr = i;
> +
> + return pofs_get_folio(pofs, i);
> +}
> +
> /*
> * Returns the number of collected folios. Return value is always >= 0.
> */
> @@ -2324,16 +2343,12 @@ static void collect_longterm_unpinnable_folios(
> struct list_head *movable_folio_list,
> struct pages_or_folios *pofs)
> {
> - struct folio *prev_folio = NULL;
> bool drain_allow = true;
> - unsigned long i;
> -
> - for (i = 0; i < pofs->nr_entries; i++) {
> - struct folio *folio = pofs_get_folio(pofs, i);
> + long i = 0;
Why not unsigned long?
> + struct folio *folio;
>
> - if (folio == prev_folio)
> - continue;
> - prev_folio = folio;
> + for (folio = pofs_get_folio(pofs, 0); folio;
> + folio = pofs_next_folio(folio, pofs, &i)) {
>
> if (folio_is_longterm_pinnable(folio))
> continue;
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 9:53 ` Dev Jain
@ 2025-05-30 10:04 ` lizhe.67
2025-05-30 10:12 ` Dev Jain
0 siblings, 1 reply; 9+ messages in thread
From: lizhe.67 @ 2025-05-30 10:04 UTC (permalink / raw)
To: dev.jain
Cc: akpm, david, jgg, jhubbard, linux-kernel, linux-mm, lizhe.67,
muchun.song, peterx, ryan.roberts
On Fri, 30 May 2025 15:23:49 +0530, dev.jain@arm.com wrote:
> On 30/05/25 2:53 pm, lizhe.67@bytedance.com wrote:
> > From: Li Zhe<lizhe.67@bytedance.com>
> >
> > In the current implementation of the longterm pin_user_pages() function,
> > we invoke the collect_longterm_unpinnable_folios() function. This function
> > iterates through the list to check whether each folio belongs to the
> > "longterm_unpinnabled" category. The folios in this list essentially
> > correspond to a contiguous region of user-space addresses, with each folio
> > representing a physical address in increments of PAGESIZE. If this
> > user-space address range is mapped with large folio, we can optimize the
> > performance of function pin_user_pages() by reducing the number of if-else
> > branches and the frequency of memory accesses using READ_ONCE. This patch
> > leverages this approach to achieve performance improvements.
> >
> > The performance test results obtained through the gup_test tool from the
> > kernel source tree are as follows. We achieve an improvement of over 75%
> > for large folio with pagesize=2M. For normal page, we have only observed
> > a very slight degradation in performance.
>
>
> Thanks for the patch! I have no idea on GUP but in my limited understanding
> the patch looks fine, let's wait for more comments.
>
> [----]
>
> > }
> >
> > +static struct folio *pofs_next_folio(struct folio *folio,
> > + struct pages_or_folios *pofs, long *index_ptr)
> > +{
> > + long i = *index_ptr + 1;
> > + unsigned long nr_pages = folio_nr_pages(folio);
> > +
> > + if (!pofs->has_folios)
> > + while ((i < pofs->nr_entries) &&
> > + /* Is this page part of this folio? */
> > + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
> > + i++;
> > +
> > + if (unlikely(i == pofs->nr_entries))
> > + return NULL;
> > + *index_ptr = i;
> > +
> > + return pofs_get_folio(pofs, i);
> > +}
> > +
> > /*
> > * Returns the number of collected folios. Return value is always >= 0.
> > */
> > @@ -2324,16 +2343,12 @@ static void collect_longterm_unpinnable_folios(
> > struct list_head *movable_folio_list,
> > struct pages_or_folios *pofs)
> > {
> > - struct folio *prev_folio = NULL;
> > bool drain_allow = true;
> > - unsigned long i;
> > -
> > - for (i = 0; i < pofs->nr_entries; i++) {
> > - struct folio *folio = pofs_get_folio(pofs, i);
> > + long i = 0;
>
>
> Why not unsigned long?
This is because I want to match the type of pages_or_folios->nr_entries.
Thanks,
Zhe
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 10:04 ` lizhe.67
@ 2025-05-30 10:12 ` Dev Jain
0 siblings, 0 replies; 9+ messages in thread
From: Dev Jain @ 2025-05-30 10:12 UTC (permalink / raw)
To: lizhe.67
Cc: akpm, david, jgg, jhubbard, linux-kernel, linux-mm, muchun.song,
peterx, ryan.roberts
On 30/05/25 3:34 pm, lizhe.67@bytedance.com wrote:
> On Fri, 30 May 2025 15:23:49 +0530, dev.jain@arm.com wrote:
>
>> On 30/05/25 2:53 pm, lizhe.67@bytedance.com wrote:
>>> From: Li Zhe<lizhe.67@bytedance.com>
>>>
>>> In the current implementation of the longterm pin_user_pages() function,
>>> we invoke the collect_longterm_unpinnable_folios() function. This function
>>> iterates through the list to check whether each folio belongs to the
>>> "longterm_unpinnabled" category. The folios in this list essentially
>>> correspond to a contiguous region of user-space addresses, with each folio
>>> representing a physical address in increments of PAGESIZE. If this
>>> user-space address range is mapped with large folio, we can optimize the
>>> performance of function pin_user_pages() by reducing the number of if-else
>>> branches and the frequency of memory accesses using READ_ONCE. This patch
>>> leverages this approach to achieve performance improvements.
>>>
>>> The performance test results obtained through the gup_test tool from the
>>> kernel source tree are as follows. We achieve an improvement of over 75%
>>> for large folio with pagesize=2M. For normal page, we have only observed
>>> a very slight degradation in performance.
>>
>> Thanks for the patch! I have no idea on GUP but in my limited understanding
>> the patch looks fine, let's wait for more comments.
>>
>> [----]
>>
>>> }
>>>
>>> +static struct folio *pofs_next_folio(struct folio *folio,
>>> + struct pages_or_folios *pofs, long *index_ptr)
>>> +{
>>> + long i = *index_ptr + 1;
>>> + unsigned long nr_pages = folio_nr_pages(folio);
>>> +
>>> + if (!pofs->has_folios)
>>> + while ((i < pofs->nr_entries) &&
>>> + /* Is this page part of this folio? */
>>> + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
>>> + i++;
>>> +
>>> + if (unlikely(i == pofs->nr_entries))
>>> + return NULL;
>>> + *index_ptr = i;
>>> +
>>> + return pofs_get_folio(pofs, i);
>>> +}
>>> +
>>> /*
>>> * Returns the number of collected folios. Return value is always >= 0.
>>> */
>>> @@ -2324,16 +2343,12 @@ static void collect_longterm_unpinnable_folios(
>>> struct list_head *movable_folio_list,
>>> struct pages_or_folios *pofs)
>>> {
>>> - struct folio *prev_folio = NULL;
>>> bool drain_allow = true;
>>> - unsigned long i;
>>> -
>>> - for (i = 0; i < pofs->nr_entries; i++) {
>>> - struct folio *folio = pofs_get_folio(pofs, i);
>>> + long i = 0;
>>
>> Why not unsigned long?
> This is because I want to match the type of pages_or_folios->nr_entries.
Ah right, although I wonder why we are not using unsigned long in the first place...
>
> Thanks,
> Zhe
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 9:23 [PATCH] gup: optimize longterm pin_user_pages() for large folio lizhe.67
2025-05-30 9:53 ` Dev Jain
@ 2025-05-30 11:31 ` David Hildenbrand
2025-05-30 12:20 ` lizhe.67
1 sibling, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2025-05-30 11:31 UTC (permalink / raw)
To: lizhe.67, akpm, jgg, jhubbard, peterx; +Cc: linux-mm, linux-kernel, muchun.song
On 30.05.25 11:23, lizhe.67@bytedance.com wrote:
> From: Li Zhe <lizhe.67@bytedance.com>
>
> In the current implementation of the longterm pin_user_pages() function,
> we invoke the collect_longterm_unpinnable_folios() function. This function
> iterates through the list to check whether each folio belongs to the
> "longterm_unpinnabled" category. The folios in this list essentially
> correspond to a contiguous region of user-space addresses, with each folio
> representing a physical address in increments of PAGESIZE. If this
> user-space address range is mapped with large folio, we can optimize the
> performance of function pin_user_pages() by reducing the number of if-else
> branches and the frequency of memory accesses using READ_ONCE. This patch
> leverages this approach to achieve performance improvements.
>
> The performance test results obtained through the gup_test tool from the
> kernel source tree are as follows. We achieve an improvement of over 75%
> for large folio with pagesize=2M. For normal page, we have only observed
> a very slight degradation in performance.
>
> Without this patch:
>
> [root@localhost ~] ./gup_test -HL -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:13623 put:10799 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:129733 put:31753 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> With this patch:
>
> [root@localhost ~] ./gup_test -HL -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:3386 put:10844 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:131652 put:31393 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> ---
> mm/gup.c | 31 +++++++++++++++++++++++--------
> 1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 84461d384ae2..8c11418036e2 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2317,6 +2317,25 @@ static void pofs_unpin(struct pages_or_folios *pofs)
> unpin_user_pages(pofs->pages, pofs->nr_entries);
> }
>
> +static struct folio *pofs_next_folio(struct folio *folio,
> + struct pages_or_folios *pofs, long *index_ptr)
> +{
> + long i = *index_ptr + 1;
> + unsigned long nr_pages = folio_nr_pages(folio);
> +
> + if (!pofs->has_folios)
> + while ((i < pofs->nr_entries) &&
> + /* Is this page part of this folio? */
> + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
passing in a page that does not belong to the folio looks shaky and not
future proof.
folio_page() == folio
is cleaner
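A minimal sketch of that check (page_folio(), strictly speaking):

	/* Sketch only: stop as soon as an entry maps to a different folio. */
	while (i < pofs->nr_entries &&
	       page_folio(pofs->pages[i]) == folio)
		i++;

so a page is never interpreted relative to a folio it is not part of.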
--
Cheers,
David / dhildenb
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 11:31 ` David Hildenbrand
@ 2025-05-30 12:20 ` lizhe.67
2025-05-30 13:08 ` David Hildenbrand
0 siblings, 1 reply; 9+ messages in thread
From: lizhe.67 @ 2025-05-30 12:20 UTC (permalink / raw)
To: david
Cc: akpm, jgg, jhubbard, linux-kernel, linux-mm, lizhe.67,
muchun.song, peterx
On Fri, 30 May 2025 13:31:26 +0200, david@redhat.com wrote:
> On 30.05.25 11:23, lizhe.67@bytedance.com wrote:
> > From: Li Zhe <lizhe.67@bytedance.com>
> >
> > In the current implementation of the longterm pin_user_pages() function,
> > we invoke the collect_longterm_unpinnable_folios() function. This function
> > iterates through the list to check whether each folio belongs to the
> > "longterm_unpinnabled" category. The folios in this list essentially
> > correspond to a contiguous region of user-space addresses, with each folio
> > representing a physical address in increments of PAGESIZE. If this
> > user-space address range is mapped with large folio, we can optimize the
> > performance of function pin_user_pages() by reducing the number of if-else
> > branches and the frequency of memory accesses using READ_ONCE. This patch
> > leverages this approach to achieve performance improvements.
> >
> > The performance test results obtained through the gup_test tool from the
> > kernel source tree are as follows. We achieve an improvement of over 75%
> > for large folio with pagesize=2M. For normal page, we have only observed
> > a very slight degradation in performance.
> >
> > Without this patch:
> >
> > [root@localhost ~] ./gup_test -HL -m 8192 -n 512
> > TAP version 13
> > 1..1
> > # PIN_LONGTERM_BENCHMARK: Time: get:13623 put:10799 us#
> > ok 1 ioctl status 0
> > # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> > [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
> > TAP version 13
> > 1..1
> > # PIN_LONGTERM_BENCHMARK: Time: get:129733 put:31753 us#
> > ok 1 ioctl status 0
> > # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> >
> > With this patch:
> >
> > [root@localhost ~] ./gup_test -HL -m 8192 -n 512
> > TAP version 13
> > 1..1
> > # PIN_LONGTERM_BENCHMARK: Time: get:3386 put:10844 us#
> > ok 1 ioctl status 0
> > # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> > [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
> > TAP version 13
> > 1..1
> > # PIN_LONGTERM_BENCHMARK: Time: get:131652 put:31393 us#
> > ok 1 ioctl status 0
> > # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> >
> > Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> > ---
> > mm/gup.c | 31 +++++++++++++++++++++++--------
> > 1 file changed, 23 insertions(+), 8 deletions(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 84461d384ae2..8c11418036e2 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -2317,6 +2317,25 @@ static void pofs_unpin(struct pages_or_folios *pofs)
> > unpin_user_pages(pofs->pages, pofs->nr_entries);
> > }
> >
> > +static struct folio *pofs_next_folio(struct folio *folio,
> > + struct pages_or_folios *pofs, long *index_ptr)
> > +{
> > + long i = *index_ptr + 1;
> > + unsigned long nr_pages = folio_nr_pages(folio);
> > +
> > + if (!pofs->has_folios)
> > + while ((i < pofs->nr_entries) &&
> > + /* Is this page part of this folio? */
> > + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
>
> passing in a page that does not belong to the folio looks shaky and not
> future proof.
>
> folio_page() == folio
>
> is cleaner
Yes, this approach is cleaner. However, when obtaining a folio
corresponding to a page through the page_folio() interface,
READ_ONCE() is used internally to read from memory, which results
in the performance of pin_user_pages() being worse than before.
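For reference, the lookup behind page_folio() boils down to roughly the
following (simplified; the hugetlb vmemmap-optimization special case is
omitted):

	/* Simplified sketch of compound_head(), which page_folio() builds on. */
	static inline struct page *compound_head_sketch(const struct page *page)
	{
		unsigned long head = READ_ONCE(page->compound_head);

		if (unlikely(head & 1))
			return (struct page *)(head - 1);	/* tail page */
		return (struct page *)page;		/* head or order-0 page */
	}

Every per-page call therefore reads page->compound_head, whereas
folio_page_idx() stays within pointer arithmetic on the memmap (with
CONFIG_SPARSEMEM_VMEMMAP).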
Could you please suggest an alternative approach to address this
problem?
Thanks,
Zhe
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 12:20 ` lizhe.67
@ 2025-05-30 13:08 ` David Hildenbrand
2025-05-30 15:02 ` lizhe.67
0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2025-05-30 13:08 UTC (permalink / raw)
To: lizhe.67; +Cc: akpm, jgg, jhubbard, linux-kernel, linux-mm, muchun.song, peterx
On 30.05.25 14:20, lizhe.67@bytedance.com wrote:
> On Fri, 30 May 2025 13:31:26 +0200, david@redhat.com wrote:
>
>> On 30.05.25 11:23, lizhe.67@bytedance.com wrote:
>>> From: Li Zhe <lizhe.67@bytedance.com>
>>>
>>> In the current implementation of the longterm pin_user_pages() function,
>>> we invoke the collect_longterm_unpinnable_folios() function. This function
>>> iterates through the list to check whether each folio belongs to the
>>> "longterm_unpinnabled" category. The folios in this list essentially
>>> correspond to a contiguous region of user-space addresses, with each folio
>>> representing a physical address in increments of PAGESIZE. If this
>>> user-space address range is mapped with large folio, we can optimize the
>>> performance of function pin_user_pages() by reducing the number of if-else
>>> branches and the frequency of memory accesses using READ_ONCE. This patch
>>> leverages this approach to achieve performance improvements.
>>>
>>> The performance test results obtained through the gup_test tool from the
>>> kernel source tree are as follows. We achieve an improvement of over 75%
>>> for large folio with pagesize=2M. For normal page, we have only observed
>>> a very slight degradation in performance.
>>>
>>> Without this patch:
>>>
>>> [root@localhost ~] ./gup_test -HL -m 8192 -n 512
>>> TAP version 13
>>> 1..1
>>> # PIN_LONGTERM_BENCHMARK: Time: get:13623 put:10799 us#
>>> ok 1 ioctl status 0
>>> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>>> [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
>>> TAP version 13
>>> 1..1
>>> # PIN_LONGTERM_BENCHMARK: Time: get:129733 put:31753 us#
>>> ok 1 ioctl status 0
>>> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>>>
>>> With this patch:
>>>
>>> [root@localhost ~] ./gup_test -HL -m 8192 -n 512
>>> TAP version 13
>>> 1..1
>>> # PIN_LONGTERM_BENCHMARK: Time: get:3386 put:10844 us#
>>> ok 1 ioctl status 0
>>> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>>> [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
>>> TAP version 13
>>> 1..1
>>> # PIN_LONGTERM_BENCHMARK: Time: get:131652 put:31393 us#
>>> ok 1 ioctl status 0
>>> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>>>
>>> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
>>> ---
>>> mm/gup.c | 31 +++++++++++++++++++++++--------
>>> 1 file changed, 23 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 84461d384ae2..8c11418036e2 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -2317,6 +2317,25 @@ static void pofs_unpin(struct pages_or_folios *pofs)
>>> unpin_user_pages(pofs->pages, pofs->nr_entries);
>>> }
>>>
>>> +static struct folio *pofs_next_folio(struct folio *folio,
>>> + struct pages_or_folios *pofs, long *index_ptr)
>>> +{
>>> + long i = *index_ptr + 1;
>>> + unsigned long nr_pages = folio_nr_pages(folio);
>>> +
>>> + if (!pofs->has_folios)
>>> + while ((i < pofs->nr_entries) &&
>>> + /* Is this page part of this folio? */
>>> + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
>>
>> passing in a page that does not belong to the folio looks shaky and not
>> future proof.
>>
>> folio_page() == folio
>>
>> is cleaner
>
> Yes, this approach is cleaner. However, when obtaining a folio
> corresponding to a page through the page_folio() interface,
Right, I meant page_folio().
> READ_ONCE() is used internally to read from memory, which results
> in the performance of pin_user_pages() being worse than before.
See contig_pages() in [1] for how it can be done using folio_page().
[1]
https://lore.kernel.org/all/20250529064947.38433-1-lizhe.67@bytedance.com/T/#u
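The idea, roughly (a sketch only; the helper in [1] may differ in
detail, and "start"/"idx0" are illustrative locals: "start" is the
array index of the first page of the current run, "idx0" its offset
within the folio):

	unsigned long idx0 = folio_page_idx(folio, pofs->pages[start]);

	/*
	 * With CONFIG_SPARSEMEM_VMEMMAP, folio_page() is plain pointer
	 * arithmetic into the memmap, so the comparison below never
	 * dereferences the struct page itself.
	 */
	while (i < pofs->nr_entries &&
	       idx0 + (i - start) < folio_nr_pages(folio) &&
	       pofs->pages[i] == folio_page(folio, idx0 + (i - start)))
		i++;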
--
Cheers,
David / dhildenb
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 13:08 ` David Hildenbrand
@ 2025-05-30 15:02 ` lizhe.67
2025-05-30 20:37 ` David Hildenbrand
0 siblings, 1 reply; 9+ messages in thread
From: lizhe.67 @ 2025-05-30 15:02 UTC (permalink / raw)
To: david
Cc: akpm, jgg, jhubbard, linux-kernel, linux-mm, lizhe.67,
muchun.song, peterx
On Fri, 30 May 2025 15:08:06 +0200, david@redhat.com wrote:
> >>> diff --git a/mm/gup.c b/mm/gup.c
> >>> index 84461d384ae2..8c11418036e2 100644
> >>> --- a/mm/gup.c
> >>> +++ b/mm/gup.c
> >>> @@ -2317,6 +2317,25 @@ static void pofs_unpin(struct pages_or_folios *pofs)
> >>> unpin_user_pages(pofs->pages, pofs->nr_entries);
> >>> }
> >>>
> >>> +static struct folio *pofs_next_folio(struct folio *folio,
> >>> + struct pages_or_folios *pofs, long *index_ptr)
> >>> +{
> >>> + long i = *index_ptr + 1;
> >>> + unsigned long nr_pages = folio_nr_pages(folio);
> >>> +
> >>> + if (!pofs->has_folios)
> >>> + while ((i < pofs->nr_entries) &&
> >>> + /* Is this page part of this folio? */
> >>> + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
> >>
> >> passing in a page that does not belong to the folio looks shaky and not
> >> future proof.
> >>
> >> folio_page() == folio
> >>
> >> is cleaner
> >
> > Yes, this approach is cleaner. However, when obtaining a folio
> > corresponding to a page through the page_folio() interface,
>
> Right, I meant page_folio().
>
> > READ_ONCE() is used internally to read from memory, which results
> > in the performance of pin_user_pages() being worse than before.
>
> See contig_pages in [1] how it can be done using folio_page().
>
> [1]
> https://lore.kernel.org/all/20250529064947.38433-1-lizhe.67@bytedance.com/T/#u
Thank you for your suggestion. It is indeed a good idea. I
initially thought along the same lines. However, I found that
the conditions for optimization here are slightly different
from those in contig_pages(). Here, it is only necessary to
ensure that the page is within the folio, rather than
requiring contiguity.
I have made some preliminary attempts: using the contig_pages()
method still yields an improvement of approximately 73%. On the
other hand, if we use the following code to check whether
page_to_pfn(pofs->pages[i]) falls within the range
[folio_pfn(folio), folio_pfn(folio) + folio_nr_pages(folio)),
the improvement is about 70%. I would appreciate your thoughts
on which solution you favor.
+static struct folio *pofs_next_folio(struct folio *folio,
+ struct pages_or_folios *pofs, long *index_ptr)
+{
+ long i = *index_ptr + 1;
+
+ if (!pofs->has_folios) {
+ unsigned long start_pfn = folio_pfn(folio);
+ unsigned long end_pfn = start_pfn + folio_nr_pages(folio);
+
+ for (; i < pofs->nr_entries; i++) {
+ unsigned long pfn = page_to_pfn(pofs->pages[i]);
+
+ /* Is this page part of this folio? */
+ if ((pfn < start_pfn) || (pfn >= end_pfn))
+ break;
+ }
+ }
+
+ if (unlikely(i == pofs->nr_entries))
+ return NULL;
+ *index_ptr = i;
+
+ return pofs_get_folio(pofs, i);
+}
Thanks,
Zhe
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 15:02 ` lizhe.67
@ 2025-05-30 20:37 ` David Hildenbrand
0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2025-05-30 20:37 UTC (permalink / raw)
To: lizhe.67; +Cc: akpm, jgg, jhubbard, linux-kernel, linux-mm, muchun.song, peterx
On 30.05.25 17:02, lizhe.67@bytedance.com wrote:
> On Fri, 30 May 2025 15:08:06 +0200, david@redhat.com wrote:
>
>>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>>> index 84461d384ae2..8c11418036e2 100644
>>>>> --- a/mm/gup.c
>>>>> +++ b/mm/gup.c
>>>>> @@ -2317,6 +2317,25 @@ static void pofs_unpin(struct pages_or_folios *pofs)
>>>>> unpin_user_pages(pofs->pages, pofs->nr_entries);
>>>>> }
>>>>>
>>>>> +static struct folio *pofs_next_folio(struct folio *folio,
>>>>> + struct pages_or_folios *pofs, long *index_ptr)
>>>>> +{
>>>>> + long i = *index_ptr + 1;
>>>>> + unsigned long nr_pages = folio_nr_pages(folio);
>>>>> +
>>>>> + if (!pofs->has_folios)
>>>>> + while ((i < pofs->nr_entries) &&
>>>>> + /* Is this page part of this folio? */
>>>>> + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
>>>>
>>>> passing in a page that does not belong to the folio looks shaky and not
>>>> future proof.
>>>>
>>>> folio_page() == folio
>>>>
>>>> is cleaner
>>>
>>> Yes, this approach is cleaner. However, when obtaining a folio
>>> corresponding to a page through the page_folio() interface,
>>
>> Right, I meant page_folio().
>>
>>> READ_ONCE() is used internally to read from memory, which results
>>> in the performance of pin_user_pages() being worse than before.
>>
>> See contig_pages in [1] how it can be done using folio_page().
>>
>> [1]
>> https://lore.kernel.org/all/20250529064947.38433-1-lizhe.67@bytedance.com/T/#u
>
> Thank you for your suggestion. It is indeed a good idea. I
> initially thought along the same lines. However, I found that
> the conditions for optimization here are slightly different
> from those in contig_pages(). Here, it is only necessary to
> ensure that the page is within the folio, rather than
> requiring contiguity.
Yes.
>
> I have made some preliminary attempts: using the method of
> contig_pages() still gets an optimization effect of
> approximately 73%. On the other hand, if we use the following
> code to determine whether page_to_pfn(pofs->pages[i]) belongs
> to the range
> [folio_pfn(folio), folio_pfn(folio) + folio_nr_pages(folio)),
> the optimization effect is about 70%. I sincerely hope to
> hear your thoughts on which solution you might favor.
>
> +static struct folio *pofs_next_folio(struct folio *folio,
> + struct pages_or_folios *pofs, long *index_ptr)
> +{
> + long i = *index_ptr + 1;
> +
> + if (!pofs->has_folios) {
> + unsigned long start_pfn = folio_pfn(folio);
> + unsigned long end_pfn = start_pfn + folio_nr_pages(folio);
> +
> + for (; i < pofs->nr_entries; i++) {
> + unsigned long pfn = page_to_pfn(pofs->pages[i]);
> +
> + /* Is this page part of this folio? */
> + if ((pfn < start_pfn) || (pfn >= end_pfn))
> + break;
folio_page() is extremely efficient with CONFIG_SPARSEMEM_VMEMMAP. I am
not sure how efficient it will be in the future once "struct folio" is
no longer an overlay of "struct page".
page_to_pfn() should be slightly more expensive than folio_page() right
now, but maybe more efficient in the future (maybe).
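(For context: with CONFIG_SPARSEMEM_VMEMMAP both reduce to memmap
arithmetic today, roughly

	#define folio_page(folio, n)	(&(folio)->page + (n))			/* pointer add */
	#define __page_to_pfn(page)	(unsigned long)((page) - vmemmap)	/* pointer sub */

simplified sketches of the current definitions, not guaranteed to stay
that way.)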
I don't particularly care, whatever you prefer :)
--
Cheers,
David / dhildenb