* [PATCH] gup: optimize longterm pin_user_pages() for large folio
@ 2025-05-30 9:23 lizhe.67
2025-05-30 9:53 ` Dev Jain
2025-05-30 11:31 ` David Hildenbrand
0 siblings, 2 replies; 9+ messages in thread
From: lizhe.67 @ 2025-05-30 9:23 UTC (permalink / raw)
To: akpm, david, jgg, jhubbard, peterx
Cc: linux-mm, linux-kernel, muchun.song, lizhe.67
From: Li Zhe <lizhe.67@bytedance.com>
In the current implementation of longterm pin_user_pages(), we invoke
collect_longterm_unpinnable_folios(). This function iterates through
the list to check whether each folio is longterm-unpinnable. The
entries in this list essentially correspond to a contiguous region of
user-space addresses, with each entry advancing the physical address
by PAGE_SIZE. If this user-space address range is mapped with large
folios, we can optimize pin_user_pages() by resolving the folio only
once per folio instead of once per page, which reduces the number of
if-else branches taken and the number of memory accesses made through
READ_ONCE(). This patch implements that approach.
The performance test results obtained with the gup_test tool from the
kernel source tree are as follows. We achieve an improvement of over
75% for large folios with pagesize=2M. For normal pages, we observed
only a very slight performance degradation.
Without this patch:
[root@localhost ~] ./gup_test -HL -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:13623 put:10799 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
[root@localhost ~]# ./gup_test -LT -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:129733 put:31753 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
With this patch:
[root@localhost ~] ./gup_test -HL -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:3386 put:10844 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
[root@localhost ~]# ./gup_test -LT -m 8192 -n 512
TAP version 13
1..1
# PIN_LONGTERM_BENCHMARK: Time: get:131652 put:31393 us#
ok 1 ioctl status 0
# Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
mm/gup.c | 31 +++++++++++++++++++++++--------
1 file changed, 23 insertions(+), 8 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index 84461d384ae2..8c11418036e2 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2317,6 +2317,25 @@ static void pofs_unpin(struct pages_or_folios *pofs)
unpin_user_pages(pofs->pages, pofs->nr_entries);
}
+static struct folio *pofs_next_folio(struct folio *folio,
+ struct pages_or_folios *pofs, long *index_ptr)
+{
+ long i = *index_ptr + 1;
+ unsigned long nr_pages = folio_nr_pages(folio);
+
+ if (!pofs->has_folios)
+ while ((i < pofs->nr_entries) &&
+ /* Is this page part of this folio? */
+ (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
+ i++;
+
+ if (unlikely(i == pofs->nr_entries))
+ return NULL;
+ *index_ptr = i;
+
+ return pofs_get_folio(pofs, i);
+}
+
/*
* Returns the number of collected folios. Return value is always >= 0.
*/
@@ -2324,16 +2343,12 @@ static void collect_longterm_unpinnable_folios(
struct list_head *movable_folio_list,
struct pages_or_folios *pofs)
{
- struct folio *prev_folio = NULL;
bool drain_allow = true;
- unsigned long i;
-
- for (i = 0; i < pofs->nr_entries; i++) {
- struct folio *folio = pofs_get_folio(pofs, i);
+ long i = 0;
+ struct folio *folio;
- if (folio == prev_folio)
- continue;
- prev_folio = folio;
+ for (folio = pofs_get_folio(pofs, 0); folio;
+ folio = pofs_next_folio(folio, pofs, &i)) {
if (folio_is_longterm_pinnable(folio))
continue;
--
2.20.1
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 9:23 [PATCH] gup: optimize longterm pin_user_pages() for large folio lizhe.67
@ 2025-05-30 9:53 ` Dev Jain
2025-05-30 10:04 ` lizhe.67
2025-05-30 11:31 ` David Hildenbrand
1 sibling, 1 reply; 9+ messages in thread
From: Dev Jain @ 2025-05-30 9:53 UTC (permalink / raw)
To: lizhe.67, akpm, david, jgg, jhubbard, peterx
Cc: linux-mm, linux-kernel, muchun.song, Ryan Roberts
On 30/05/25 2:53 pm, lizhe.67@bytedance.com wrote:
> From: Li Zhe<lizhe.67@bytedance.com>
>
> In the current implementation of the longterm pin_user_pages() function,
> we invoke the collect_longterm_unpinnable_folios() function. This function
> iterates through the list to check whether each folio belongs to the
> "longterm_unpinnabled" category. The folios in this list essentially
> correspond to a contiguous region of user-space addresses, with each folio
> representing a physical address in increments of PAGESIZE. If this
> user-space address range is mapped with large folio, we can optimize the
> performance of function pin_user_pages() by reducing the number of if-else
> branches and the frequency of memory accesses using READ_ONCE. This patch
> leverages this approach to achieve performance improvements.
>
> The performance test results obtained through the gup_test tool from the
> kernel source tree are as follows. We achieve an improvement of over 75%
> for large folio with pagesize=2M. For normal page, we have only observed
> a very slight degradation in performance.
Thanks for the patch! I don't know much about GUP, but to my limited
understanding the patch looks fine; let's wait for more comments.
[----]
> }
>
> +static struct folio *pofs_next_folio(struct folio *folio,
> + struct pages_or_folios *pofs, long *index_ptr)
> +{
> + long i = *index_ptr + 1;
> + unsigned long nr_pages = folio_nr_pages(folio);
> +
> + if (!pofs->has_folios)
> + while ((i < pofs->nr_entries) &&
> + /* Is this page part of this folio? */
> + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
> + i++;
> +
> + if (unlikely(i == pofs->nr_entries))
> + return NULL;
> + *index_ptr = i;
> +
> + return pofs_get_folio(pofs, i);
> +}
> +
> /*
> * Returns the number of collected folios. Return value is always >= 0.
> */
> @@ -2324,16 +2343,12 @@ static void collect_longterm_unpinnable_folios(
> struct list_head *movable_folio_list,
> struct pages_or_folios *pofs)
> {
> - struct folio *prev_folio = NULL;
> bool drain_allow = true;
> - unsigned long i;
> -
> - for (i = 0; i < pofs->nr_entries; i++) {
> - struct folio *folio = pofs_get_folio(pofs, i);
> + long i = 0;
Why not unsigned long?
> + struct folio *folio;
>
> - if (folio == prev_folio)
> - continue;
> - prev_folio = folio;
> + for (folio = pofs_get_folio(pofs, 0); folio;
> + folio = pofs_next_folio(folio, pofs, &i)) {
>
> if (folio_is_longterm_pinnable(folio))
> continue;
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 9:53 ` Dev Jain
@ 2025-05-30 10:04 ` lizhe.67
2025-05-30 10:12 ` Dev Jain
0 siblings, 1 reply; 9+ messages in thread
From: lizhe.67 @ 2025-05-30 10:04 UTC (permalink / raw)
To: dev.jain
Cc: akpm, david, jgg, jhubbard, linux-kernel, linux-mm, lizhe.67,
muchun.song, peterx, ryan.roberts
On Fri, 30 May 2025 15:23:49 +0530, dev.jain@arm.com wrote:
> On 30/05/25 2:53 pm, lizhe.67@bytedance.com wrote:
> > From: Li Zhe<lizhe.67@bytedance.com>
> >
> > In the current implementation of the longterm pin_user_pages() function,
> > we invoke the collect_longterm_unpinnable_folios() function. This function
> > iterates through the list to check whether each folio belongs to the
> > "longterm_unpinnabled" category. The folios in this list essentially
> > correspond to a contiguous region of user-space addresses, with each folio
> > representing a physical address in increments of PAGESIZE. If this
> > user-space address range is mapped with large folio, we can optimize the
> > performance of function pin_user_pages() by reducing the number of if-else
> > branches and the frequency of memory accesses using READ_ONCE. This patch
> > leverages this approach to achieve performance improvements.
> >
> > The performance test results obtained through the gup_test tool from the
> > kernel source tree are as follows. We achieve an improvement of over 75%
> > for large folio with pagesize=2M. For normal page, we have only observed
> > a very slight degradation in performance.
>
>
> Thanks for the patch! I have no idea on GUP but in my limited understanding
> the patch looks fine, let's wait for more comments.
>
> [----]
>
> > }
> >
> > +static struct folio *pofs_next_folio(struct folio *folio,
> > + struct pages_or_folios *pofs, long *index_ptr)
> > +{
> > + long i = *index_ptr + 1;
> > + unsigned long nr_pages = folio_nr_pages(folio);
> > +
> > + if (!pofs->has_folios)
> > + while ((i < pofs->nr_entries) &&
> > + /* Is this page part of this folio? */
> > + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
> > + i++;
> > +
> > + if (unlikely(i == pofs->nr_entries))
> > + return NULL;
> > + *index_ptr = i;
> > +
> > + return pofs_get_folio(pofs, i);
> > +}
> > +
> > /*
> > * Returns the number of collected folios. Return value is always >= 0.
> > */
> > @@ -2324,16 +2343,12 @@ static void collect_longterm_unpinnable_folios(
> > struct list_head *movable_folio_list,
> > struct pages_or_folios *pofs)
> > {
> > - struct folio *prev_folio = NULL;
> > bool drain_allow = true;
> > - unsigned long i;
> > -
> > - for (i = 0; i < pofs->nr_entries; i++) {
> > - struct folio *folio = pofs_get_folio(pofs, i);
> > + long i = 0;
>
>
> Why not unsigned long?
This is because I want to match the type of pages_or_folios->nr_entries.
Thanks,
Zhe
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 10:04 ` lizhe.67
@ 2025-05-30 10:12 ` Dev Jain
0 siblings, 0 replies; 9+ messages in thread
From: Dev Jain @ 2025-05-30 10:12 UTC (permalink / raw)
To: lizhe.67
Cc: akpm, david, jgg, jhubbard, linux-kernel, linux-mm, muchun.song,
peterx, ryan.roberts
On 30/05/25 3:34 pm, lizhe.67@bytedance.com wrote:
> On Fri, 30 May 2025 15:23:49 +0530, dev.jain@arm.com wrote:
>
>> On 30/05/25 2:53 pm, lizhe.67@bytedance.com wrote:
>>> From: Li Zhe<lizhe.67@bytedance.com>
>>>
>>> In the current implementation of the longterm pin_user_pages() function,
>>> we invoke the collect_longterm_unpinnable_folios() function. This function
>>> iterates through the list to check whether each folio belongs to the
>>> "longterm_unpinnabled" category. The folios in this list essentially
>>> correspond to a contiguous region of user-space addresses, with each folio
>>> representing a physical address in increments of PAGESIZE. If this
>>> user-space address range is mapped with large folio, we can optimize the
>>> performance of function pin_user_pages() by reducing the number of if-else
>>> branches and the frequency of memory accesses using READ_ONCE. This patch
>>> leverages this approach to achieve performance improvements.
>>>
>>> The performance test results obtained through the gup_test tool from the
>>> kernel source tree are as follows. We achieve an improvement of over 75%
>>> for large folio with pagesize=2M. For normal page, we have only observed
>>> a very slight degradation in performance.
>>
>> Thanks for the patch! I have no idea on GUP but in my limited understanding
>> the patch looks fine, let's wait for more comments.
>>
>> [----]
>>
>>> }
>>>
>>> +static struct folio *pofs_next_folio(struct folio *folio,
>>> + struct pages_or_folios *pofs, long *index_ptr)
>>> +{
>>> + long i = *index_ptr + 1;
>>> + unsigned long nr_pages = folio_nr_pages(folio);
>>> +
>>> + if (!pofs->has_folios)
>>> + while ((i < pofs->nr_entries) &&
>>> + /* Is this page part of this folio? */
>>> + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
>>> + i++;
>>> +
>>> + if (unlikely(i == pofs->nr_entries))
>>> + return NULL;
>>> + *index_ptr = i;
>>> +
>>> + return pofs_get_folio(pofs, i);
>>> +}
>>> +
>>> /*
>>> * Returns the number of collected folios. Return value is always >= 0.
>>> */
>>> @@ -2324,16 +2343,12 @@ static void collect_longterm_unpinnable_folios(
>>> struct list_head *movable_folio_list,
>>> struct pages_or_folios *pofs)
>>> {
>>> - struct folio *prev_folio = NULL;
>>> bool drain_allow = true;
>>> - unsigned long i;
>>> -
>>> - for (i = 0; i < pofs->nr_entries; i++) {
>>> - struct folio *folio = pofs_get_folio(pofs, i);
>>> + long i = 0;
>>
>> Why not unsigned long?
> This is because I want to match the type of pages_or_folios->nr_entries.
Ah right, although I wonder why we are not using unsigned long in the first place...
>
> Thanks,
> Zhe
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 9:23 [PATCH] gup: optimize longterm pin_user_pages() for large folio lizhe.67
2025-05-30 9:53 ` Dev Jain
@ 2025-05-30 11:31 ` David Hildenbrand
2025-05-30 12:20 ` lizhe.67
1 sibling, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2025-05-30 11:31 UTC (permalink / raw)
To: lizhe.67, akpm, jgg, jhubbard, peterx; +Cc: linux-mm, linux-kernel, muchun.song
On 30.05.25 11:23, lizhe.67@bytedance.com wrote:
> From: Li Zhe <lizhe.67@bytedance.com>
>
> In the current implementation of the longterm pin_user_pages() function,
> we invoke the collect_longterm_unpinnable_folios() function. This function
> iterates through the list to check whether each folio belongs to the
> "longterm_unpinnabled" category. The folios in this list essentially
> correspond to a contiguous region of user-space addresses, with each folio
> representing a physical address in increments of PAGESIZE. If this
> user-space address range is mapped with large folio, we can optimize the
> performance of function pin_user_pages() by reducing the number of if-else
> branches and the frequency of memory accesses using READ_ONCE. This patch
> leverages this approach to achieve performance improvements.
>
> The performance test results obtained through the gup_test tool from the
> kernel source tree are as follows. We achieve an improvement of over 75%
> for large folio with pagesize=2M. For normal page, we have only observed
> a very slight degradation in performance.
>
> Without this patch:
>
> [root@localhost ~] ./gup_test -HL -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:13623 put:10799 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:129733 put:31753 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> With this patch:
>
> [root@localhost ~] ./gup_test -HL -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:3386 put:10844 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
> TAP version 13
> 1..1
> # PIN_LONGTERM_BENCHMARK: Time: get:131652 put:31393 us#
> ok 1 ioctl status 0
> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> ---
> mm/gup.c | 31 +++++++++++++++++++++++--------
> 1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 84461d384ae2..8c11418036e2 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2317,6 +2317,25 @@ static void pofs_unpin(struct pages_or_folios *pofs)
> unpin_user_pages(pofs->pages, pofs->nr_entries);
> }
>
> +static struct folio *pofs_next_folio(struct folio *folio,
> + struct pages_or_folios *pofs, long *index_ptr)
> +{
> + long i = *index_ptr + 1;
> + unsigned long nr_pages = folio_nr_pages(folio);
> +
> + if (!pofs->has_folios)
> + while ((i < pofs->nr_entries) &&
> + /* Is this page part of this folio? */
> + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
passing in a page that does not belong to the folio looks shaky and not
future proof.
folio_page() == folio
is cleaner
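A minimal sketch of that check (page_folio(), strictly speaking):

	/* Sketch only: stop as soon as an entry maps to a different folio. */
	while (i < pofs->nr_entries &&
	       page_folio(pofs->pages[i]) == folio)
		i++;

so a page is never interpreted relative to a folio it is not part of.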
--
Cheers,
David / dhildenb
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 11:31 ` David Hildenbrand
@ 2025-05-30 12:20 ` lizhe.67
2025-05-30 13:08 ` David Hildenbrand
0 siblings, 1 reply; 9+ messages in thread
From: lizhe.67 @ 2025-05-30 12:20 UTC (permalink / raw)
To: david
Cc: akpm, jgg, jhubbard, linux-kernel, linux-mm, lizhe.67,
muchun.song, peterx
On Fri, 30 May 2025 13:31:26 +0200, david@redhat.com wrote:
> On 30.05.25 11:23, lizhe.67@bytedance.com wrote:
> > From: Li Zhe <lizhe.67@bytedance.com>
> >
> > In the current implementation of the longterm pin_user_pages() function,
> > we invoke the collect_longterm_unpinnable_folios() function. This function
> > iterates through the list to check whether each folio belongs to the
> > "longterm_unpinnabled" category. The folios in this list essentially
> > correspond to a contiguous region of user-space addresses, with each folio
> > representing a physical address in increments of PAGESIZE. If this
> > user-space address range is mapped with large folio, we can optimize the
> > performance of function pin_user_pages() by reducing the number of if-else
> > branches and the frequency of memory accesses using READ_ONCE. This patch
> > leverages this approach to achieve performance improvements.
> >
> > The performance test results obtained through the gup_test tool from the
> > kernel source tree are as follows. We achieve an improvement of over 75%
> > for large folio with pagesize=2M. For normal page, we have only observed
> > a very slight degradation in performance.
> >
> > Without this patch:
> >
> > [root@localhost ~] ./gup_test -HL -m 8192 -n 512
> > TAP version 13
> > 1..1
> > # PIN_LONGTERM_BENCHMARK: Time: get:13623 put:10799 us#
> > ok 1 ioctl status 0
> > # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> > [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
> > TAP version 13
> > 1..1
> > # PIN_LONGTERM_BENCHMARK: Time: get:129733 put:31753 us#
> > ok 1 ioctl status 0
> > # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> >
> > With this patch:
> >
> > [root@localhost ~] ./gup_test -HL -m 8192 -n 512
> > TAP version 13
> > 1..1
> > # PIN_LONGTERM_BENCHMARK: Time: get:3386 put:10844 us#
> > ok 1 ioctl status 0
> > # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> > [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
> > TAP version 13
> > 1..1
> > # PIN_LONGTERM_BENCHMARK: Time: get:131652 put:31393 us#
> > ok 1 ioctl status 0
> > # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
> >
> > Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> > ---
> > mm/gup.c | 31 +++++++++++++++++++++++--------
> > 1 file changed, 23 insertions(+), 8 deletions(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 84461d384ae2..8c11418036e2 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -2317,6 +2317,25 @@ static void pofs_unpin(struct pages_or_folios *pofs)
> > unpin_user_pages(pofs->pages, pofs->nr_entries);
> > }
> >
> > +static struct folio *pofs_next_folio(struct folio *folio,
> > + struct pages_or_folios *pofs, long *index_ptr)
> > +{
> > + long i = *index_ptr + 1;
> > + unsigned long nr_pages = folio_nr_pages(folio);
> > +
> > + if (!pofs->has_folios)
> > + while ((i < pofs->nr_entries) &&
> > + /* Is this page part of this folio? */
> > + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
>
> passing in a page that does not belong to the folio looks shaky and not
> future proof.
>
> folio_page() == folio
>
> is cleaner
Yes, this approach is cleaner. However, when obtaining a folio
corresponding to a page through the page_folio() interface,
READ_ONCE() is used internally to read from memory, which results
in the performance of pin_user_pages() being worse than before.
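For reference, the lookup behind page_folio() boils down to roughly the
following (simplified; the hugetlb vmemmap-optimization special case is
omitted):

	/* Simplified sketch of compound_head(), which page_folio() builds on. */
	static inline struct page *compound_head_sketch(const struct page *page)
	{
		unsigned long head = READ_ONCE(page->compound_head);

		if (unlikely(head & 1))
			return (struct page *)(head - 1);	/* tail page */
		return (struct page *)page;		/* head or order-0 page */
	}

Every per-page call therefore reads page->compound_head, whereas
folio_page_idx() stays within pointer arithmetic on the memmap (with
CONFIG_SPARSEMEM_VMEMMAP).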
Could you please suggest an alternative approach to address this
problem?
Thanks,
Zhe
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 12:20 ` lizhe.67
@ 2025-05-30 13:08 ` David Hildenbrand
2025-05-30 15:02 ` lizhe.67
0 siblings, 1 reply; 9+ messages in thread
From: David Hildenbrand @ 2025-05-30 13:08 UTC (permalink / raw)
To: lizhe.67; +Cc: akpm, jgg, jhubbard, linux-kernel, linux-mm, muchun.song, peterx
On 30.05.25 14:20, lizhe.67@bytedance.com wrote:
> On Fri, 30 May 2025 13:31:26 +0200, david@redhat.com wrote:
>
>> On 30.05.25 11:23, lizhe.67@bytedance.com wrote:
>>> From: Li Zhe <lizhe.67@bytedance.com>
>>>
>>> In the current implementation of the longterm pin_user_pages() function,
>>> we invoke the collect_longterm_unpinnable_folios() function. This function
>>> iterates through the list to check whether each folio belongs to the
>>> "longterm_unpinnabled" category. The folios in this list essentially
>>> correspond to a contiguous region of user-space addresses, with each folio
>>> representing a physical address in increments of PAGESIZE. If this
>>> user-space address range is mapped with large folio, we can optimize the
>>> performance of function pin_user_pages() by reducing the number of if-else
>>> branches and the frequency of memory accesses using READ_ONCE. This patch
>>> leverages this approach to achieve performance improvements.
>>>
>>> The performance test results obtained through the gup_test tool from the
>>> kernel source tree are as follows. We achieve an improvement of over 75%
>>> for large folio with pagesize=2M. For normal page, we have only observed
>>> a very slight degradation in performance.
>>>
>>> Without this patch:
>>>
>>> [root@localhost ~] ./gup_test -HL -m 8192 -n 512
>>> TAP version 13
>>> 1..1
>>> # PIN_LONGTERM_BENCHMARK: Time: get:13623 put:10799 us#
>>> ok 1 ioctl status 0
>>> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>>> [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
>>> TAP version 13
>>> 1..1
>>> # PIN_LONGTERM_BENCHMARK: Time: get:129733 put:31753 us#
>>> ok 1 ioctl status 0
>>> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>>>
>>> With this patch:
>>>
>>> [root@localhost ~] ./gup_test -HL -m 8192 -n 512
>>> TAP version 13
>>> 1..1
>>> # PIN_LONGTERM_BENCHMARK: Time: get:3386 put:10844 us#
>>> ok 1 ioctl status 0
>>> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>>> [root@localhost ~]# ./gup_test -LT -m 8192 -n 512
>>> TAP version 13
>>> 1..1
>>> # PIN_LONGTERM_BENCHMARK: Time: get:131652 put:31393 us#
>>> ok 1 ioctl status 0
>>> # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0
>>>
>>> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
>>> ---
>>> mm/gup.c | 31 +++++++++++++++++++++++--------
>>> 1 file changed, 23 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 84461d384ae2..8c11418036e2 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -2317,6 +2317,25 @@ static void pofs_unpin(struct pages_or_folios *pofs)
>>> unpin_user_pages(pofs->pages, pofs->nr_entries);
>>> }
>>>
>>> +static struct folio *pofs_next_folio(struct folio *folio,
>>> + struct pages_or_folios *pofs, long *index_ptr)
>>> +{
>>> + long i = *index_ptr + 1;
>>> + unsigned long nr_pages = folio_nr_pages(folio);
>>> +
>>> + if (!pofs->has_folios)
>>> + while ((i < pofs->nr_entries) &&
>>> + /* Is this page part of this folio? */
>>> + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
>>
>> passing in a page that does not belong to the folio looks shaky and not
>> future proof.
>>
>> folio_page() == folio
>>
>> is cleaner
>
> Yes, this approach is cleaner. However, when obtaining a folio
> corresponding to a page through the page_folio() interface,
Right, I meant page_folio().
> READ_ONCE() is used internally to read from memory, which results
> in the performance of pin_user_pages() being worse than before.
See contig_pages() in [1] for how it can be done using folio_page().
[1]
https://lore.kernel.org/all/20250529064947.38433-1-lizhe.67@bytedance.com/T/#u
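The idea, roughly (a sketch only; the helper in [1] may differ in
detail, and "start"/"idx0" are illustrative locals: "start" is the
array index of the first page of the current run, "idx0" its offset
within the folio):

	unsigned long idx0 = folio_page_idx(folio, pofs->pages[start]);

	/*
	 * With CONFIG_SPARSEMEM_VMEMMAP, folio_page() is plain pointer
	 * arithmetic into the memmap, so the comparison below never
	 * dereferences the struct page itself.
	 */
	while (i < pofs->nr_entries &&
	       idx0 + (i - start) < folio_nr_pages(folio) &&
	       pofs->pages[i] == folio_page(folio, idx0 + (i - start)))
		i++;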
--
Cheers,
David / dhildenb
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 13:08 ` David Hildenbrand
@ 2025-05-30 15:02 ` lizhe.67
2025-05-30 20:37 ` David Hildenbrand
0 siblings, 1 reply; 9+ messages in thread
From: lizhe.67 @ 2025-05-30 15:02 UTC (permalink / raw)
To: david
Cc: akpm, jgg, jhubbard, linux-kernel, linux-mm, lizhe.67,
muchun.song, peterx
On Fri, 30 May 2025 15:08:06 +0200, david@redhat.com wrote:
> >>> diff --git a/mm/gup.c b/mm/gup.c
> >>> index 84461d384ae2..8c11418036e2 100644
> >>> --- a/mm/gup.c
> >>> +++ b/mm/gup.c
> >>> @@ -2317,6 +2317,25 @@ static void pofs_unpin(struct pages_or_folios *pofs)
> >>> unpin_user_pages(pofs->pages, pofs->nr_entries);
> >>> }
> >>>
> >>> +static struct folio *pofs_next_folio(struct folio *folio,
> >>> + struct pages_or_folios *pofs, long *index_ptr)
> >>> +{
> >>> + long i = *index_ptr + 1;
> >>> + unsigned long nr_pages = folio_nr_pages(folio);
> >>> +
> >>> + if (!pofs->has_folios)
> >>> + while ((i < pofs->nr_entries) &&
> >>> + /* Is this page part of this folio? */
> >>> + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
> >>
> >> passing in a page that does not belong to the folio looks shaky and not
> >> future proof.
> >>
> >> folio_page() == folio
> >>
> >> is cleaner
> >
> > Yes, this approach is cleaner. However, when obtaining a folio
> > corresponding to a page through the page_folio() interface,
>
> Right, I meant page_folio().
>
> > READ_ONCE() is used internally to read from memory, which results
> > in the performance of pin_user_pages() being worse than before.
>
> See contig_pages in [1] how it can be done using folio_page().
>
> [1]
> https://lore.kernel.org/all/20250529064947.38433-1-lizhe.67@bytedance.com/T/#u
Thank you for your suggestion. It is indeed a good idea. I
initially thought along the same lines. However, I found that
the conditions for optimization here are slightly different
from those in contig_pages(). Here, it is only necessary to
ensure that the page is within the folio, rather than
requiring contiguity.
I have made some preliminary attempts: using the contig_pages()
method still yields an improvement of approximately 73%. On the
other hand, if we use the following code to check whether
page_to_pfn(pofs->pages[i]) falls within the range
[folio_pfn(folio), folio_pfn(folio) + folio_nr_pages(folio)),
the improvement is about 70%. I would appreciate your thoughts
on which solution you favor.
+static struct folio *pofs_next_folio(struct folio *folio,
+ struct pages_or_folios *pofs, long *index_ptr)
+{
+ long i = *index_ptr + 1;
+
+ if (!pofs->has_folios) {
+ unsigned long start_pfn = folio_pfn(folio);
+ unsigned long end_pfn = start_pfn + folio_nr_pages(folio);
+
+ for (; i < pofs->nr_entries; i++) {
+ unsigned long pfn = page_to_pfn(pofs->pages[i]);
+
+ /* Is this page part of this folio? */
+ if ((pfn < start_pfn) || (pfn >= end_pfn))
+ break;
+ }
+ }
+
+ if (unlikely(i == pofs->nr_entries))
+ return NULL;
+ *index_ptr = i;
+
+ return pofs_get_folio(pofs, i);
+}
Thanks,
Zhe
* Re: [PATCH] gup: optimize longterm pin_user_pages() for large folio
2025-05-30 15:02 ` lizhe.67
@ 2025-05-30 20:37 ` David Hildenbrand
0 siblings, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2025-05-30 20:37 UTC (permalink / raw)
To: lizhe.67; +Cc: akpm, jgg, jhubbard, linux-kernel, linux-mm, muchun.song, peterx
On 30.05.25 17:02, lizhe.67@bytedance.com wrote:
> On Fri, 30 May 2025 15:08:06 +0200, david@redhat.com wrote:
>
>>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>>> index 84461d384ae2..8c11418036e2 100644
>>>>> --- a/mm/gup.c
>>>>> +++ b/mm/gup.c
>>>>> @@ -2317,6 +2317,25 @@ static void pofs_unpin(struct pages_or_folios *pofs)
>>>>> unpin_user_pages(pofs->pages, pofs->nr_entries);
>>>>> }
>>>>>
>>>>> +static struct folio *pofs_next_folio(struct folio *folio,
>>>>> + struct pages_or_folios *pofs, long *index_ptr)
>>>>> +{
>>>>> + long i = *index_ptr + 1;
>>>>> + unsigned long nr_pages = folio_nr_pages(folio);
>>>>> +
>>>>> + if (!pofs->has_folios)
>>>>> + while ((i < pofs->nr_entries) &&
>>>>> + /* Is this page part of this folio? */
>>>>> + (folio_page_idx(folio, pofs->pages[i]) < nr_pages))
>>>>
>>>> passing in a page that does not belong to the folio looks shaky and not
>>>> future proof.
>>>>
>>>> folio_page() == folio
>>>>
>>>> is cleaner
>>>
>>> Yes, this approach is cleaner. However, when obtaining a folio
>>> corresponding to a page through the page_folio() interface,
>>
>> Right, I meant page_folio().
>>
>>> READ_ONCE() is used internally to read from memory, which results
>>> in the performance of pin_user_pages() being worse than before.
>>
>> See contig_pages in [1] how it can be done using folio_page().
>>
>> [1]
>> https://lore.kernel.org/all/20250529064947.38433-1-lizhe.67@bytedance.com/T/#u
>
> Thank you for your suggestion. It is indeed a good idea. I
> initially thought along the same lines. However, I found that
> the conditions for optimization here are slightly different
> from those in contig_pages(). Here, it is only necessary to
> ensure that the page is within the folio, rather than
> requiring contiguity.
Yes.
>
> I have made some preliminary attempts: using the method of
> contig_pages() still gets an optimization effect of
> approximately 73%. On the other hand, if we use the following
> code to determine whether page_to_pfn(pofs->pages[i]) belongs
> to the range
> [folio_pfn(folio), folio_pfn(folio) + folio_nr_pages(folio)),
> the optimization effect is about 70%. I sincerely hope to
> hear your thoughts on which solution you might favor.
>
> +static struct folio *pofs_next_folio(struct folio *folio,
> + struct pages_or_folios *pofs, long *index_ptr)
> +{
> + long i = *index_ptr + 1;
> +
> + if (!pofs->has_folios) {
> + unsigned long start_pfn = folio_pfn(folio);
> + unsigned long end_pfn = start_pfn + folio_nr_pages(folio);
> +
> + for (; i < pofs->nr_entries; i++) {
> + unsigned long pfn = page_to_pfn(pofs->pages[i]);
> +
> + /* Is this page part of this folio? */
> + if ((pfn < start_pfn) || (pfn >= end_pfn))
> + break;
folio_page() is extremely efficient with CONFIG_SPARSEMEM_VMEMMAP. I am
not sure how efficient it will be in the future once "struct folio" is
no longer an overlay of "struct page".
page_to_pfn() should be slightly more expensive than folio_page() right
now, but maybe more efficient in the future (maybe).
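(For context: with CONFIG_SPARSEMEM_VMEMMAP both reduce to memmap
arithmetic today, roughly

	#define folio_page(folio, n)	(&(folio)->page + (n))			/* pointer add */
	#define __page_to_pfn(page)	(unsigned long)((page) - vmemmap)	/* pointer sub */

simplified sketches of the current definitions, not guaranteed to stay
that way.)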
I don't particularly care, whatever you prefer :)
--
Cheers,
David / dhildenb