* [RFC] pin_user_pages_fast failure count increased
@ 2025-04-28 15:17 Jaewon Kim
2025-04-28 15:21 ` David Hildenbrand
2025-04-28 20:14 ` John Hubbard
0 siblings, 2 replies; 6+ messages in thread
From: Jaewon Kim @ 2025-04-28 15:17 UTC (permalink / raw)
To: zhaoyang.huang, jhubbard, david; +Cc: surenb, linux-mm
Hi
If pin_user_pages_fast does not pin all of the requested pages,
should drivers calling pin_user_pages_fast retry until GUP
pins them all?
Our GPU driver uses pin_user_pages_fast.
After the recent kernel update, pin_user_pages_fast sometimes does not
pin all the pages, nr_pinned < nr_pages.
I think the following patch affected the pin_user_pages_fast behavior.
1aaf8c122918 mm: gup: fix infinite loop within __get_longterm_locked
BR
Jaewon Kim
* Re: [RFC] pin_user_pages_fast failure count increased
2025-04-28 15:17 [RFC] pin_user_pages_fast failure count increased Jaewon Kim
@ 2025-04-28 15:21 ` David Hildenbrand
2025-04-28 20:14 ` John Hubbard
1 sibling, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2025-04-28 15:21 UTC (permalink / raw)
To: Jaewon Kim, zhaoyang.huang, jhubbard; +Cc: surenb, linux-mm
On 28.04.25 17:17, Jaewon Kim wrote:
> Hi
Hi,
>
> If pin_user_pages_fast does not pin all the requested number of pages,
> then drivers calling to pin_user_pages_fast should retry until the gup
> pins all?
The caller should continue from the first page that was not pinned. If
there is an error, it will be reported; otherwise the call makes progress.
>
> Our GPU driver uses pin_user_pages_fast.
> After the recent kernel update, pin_user_pages_fast sometimes does not
> pin all the pages, nr_pinned < nr_pages.
Yeah, that can happen as documented.
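
A minimal userspace sketch of the "continue from the first page that was
not pinned" loop may help here. It is not kernel code: mock_pin() and
struct mock_pinner are hypothetical stand-ins for the real pin call, and
the -EFAULT returned on zero progress is an assumption made only for the
model.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the pin call (not a kernel API). */
struct mock_pinner {
	long max_per_call;	/* pin at most this many pages per call */
	long bad_page;		/* index of a page that never pins, or -1 */
	long calls;		/* number of calls made so far */
};

/*
 * Returns how many pages were pinned starting at index `first` (possibly
 * fewer than `nr`), or a negative errno if not even one page could be pinned.
 */
static long mock_pin(struct mock_pinner *m, long first, long nr)
{
	long n = nr < m->max_per_call ? nr : m->max_per_call;

	m->calls++;
	if (m->bad_page >= first && m->bad_page < first + n)
		n = m->bad_page - first;	/* stop short of the bad page */
	return n > 0 ? n : -EFAULT;		/* error only when no progress */
}

/*
 * Pin pages [0, nr_pages) by resuming at the first page not yet pinned.
 * On return, *pinned holds how many pages are pinned; on failure the caller
 * is expected to unpin exactly that many. Returns 0 or a negative errno.
 */
static int pin_range_resuming(struct mock_pinner *m, long nr_pages,
			      long *pinned)
{
	assert(m != NULL && pinned != NULL && nr_pages >= 0);

	*pinned = 0;
	while (*pinned < nr_pages) {
		long ret = mock_pin(m, *pinned, nr_pages - *pinned);

		if (ret < 0)
			return (int)ret;	/* error reported by the pin call */
		if (ret == 0)
			return -EFAULT;		/* defensive: no progress, no error */
		*pinned += ret;
	}
	return 0;
}
```

With max_per_call = 4 and no bad page, a 10-page request completes in
three calls. With bad_page = 6, the loop stops with six pages pinned and
returns the error, so each short count is either followed by progress or
by a reported failure.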
--
Cheers,
David / dhildenb
* Re: [RFC] pin_user_pages_fast failure count increased
2025-04-28 15:17 [RFC] pin_user_pages_fast failure count increased Jaewon Kim
2025-04-28 15:21 ` David Hildenbrand
@ 2025-04-28 20:14 ` John Hubbard
2025-04-28 20:56 ` David Hildenbrand
1 sibling, 1 reply; 6+ messages in thread
From: John Hubbard @ 2025-04-28 20:14 UTC (permalink / raw)
To: Jaewon Kim, zhaoyang.huang, david; +Cc: surenb, linux-mm
On 4/28/25 8:17 AM, Jaewon Kim wrote:
> Hi
>
> If pin_user_pages_fast does not pin all the requested number of pages,
> then drivers calling to pin_user_pages_fast should retry until the gup
> pins all?
>
Approaches vary, for handling partial success of pin_user_pages().
* Many drivers unpin everything and either bail out entirely, or retry
pinning the entire original range.
* A few drivers try to pin the remaining pages, in a retry loop (I think
the gpu/drm drivers IIRC).
It's really up to the driver author, how to respond to the inability
(which may be temporary) to pin the entire range all at once.
> Our GPU driver uses pin_user_pages_fast.
> After the recent kernel update, pin_user_pages_fast sometimes does not
> pin all the pages, nr_pinned < nr_pages.
>
> I think the following patch affected the pin_user_pages_fast behavior.
> 1aaf8c122918 mm: gup: fix infinite loop within __get_longterm_locked
>
OK. If you see a bug or a problem, please elaborate.
thanks,
--
John Hubbard
* Re: [RFC] pin_user_pages_fast failure count increased
2025-04-28 20:14 ` John Hubbard
@ 2025-04-28 20:56 ` David Hildenbrand
2025-04-28 21:12 ` John Hubbard
0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2025-04-28 20:56 UTC (permalink / raw)
To: John Hubbard, Jaewon Kim, zhaoyang.huang; +Cc: surenb, linux-mm
On 28.04.25 22:14, John Hubbard wrote:
> On 4/28/25 8:17 AM, Jaewon Kim wrote:
>> Hi
>>
>> If pin_user_pages_fast does not pin all the requested number of pages,
>> then drivers calling to pin_user_pages_fast should retry until the gup
>> pins all?
>>
>
> Approaches vary, for handling partial success of pin_user_pages().
>
> * Many drivers unpin everything and either bail out entirely, or retry
> pinning the entire original range.
Hm, unpinning + trying to repin the entire range can easily result in an
endless loop on persistent errors IIRC?
--
Cheers,
David / dhildenb
* Re: [RFC] pin_user_pages_fast failure count increased
2025-04-28 20:56 ` David Hildenbrand
@ 2025-04-28 21:12 ` John Hubbard
[not found] ` <CGME20250522092944epcas2p1ca46168564555aad6d9880bda26ec8de@epcas2p1.samsung.com>
0 siblings, 1 reply; 6+ messages in thread
From: John Hubbard @ 2025-04-28 21:12 UTC (permalink / raw)
To: David Hildenbrand, Jaewon Kim, zhaoyang.huang; +Cc: surenb, linux-mm
On 4/28/25 1:56 PM, David Hildenbrand wrote:
> On 28.04.25 22:14, John Hubbard wrote:
>> On 4/28/25 8:17 AM, Jaewon Kim wrote:
>>> Hi
>>>
>>> If pin_user_pages_fast does not pin all the requested number of pages,
>>> then drivers calling to pin_user_pages_fast should retry until the gup
>>> pins all?
>>>
>>
>> Approaches vary, for handling partial success of pin_user_pages().
>>
>> * Many drivers unpin everything and either bail out entirely, or retry
>> pinning the entire original range.
>
> Hm, unpinning + trying to repin the entire range can easily result in an
> endless loop on persistent errors IIRC?
>
I vaguely recall a limited number of retries, yes.
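
For illustration, the "unpin everything and retry the entire range"
approach with a bounded number of attempts could look roughly like the
userspace model below. Everything here is hypothetical (struct
mock_backend, mock_pin_range(), mock_unpin(), and the choice of -EAGAIN
once the attempts are exhausted); it only shows how a retry cap avoids the
endless loop on persistent partial pins.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical pin backend: attempt i pins results[i] pages of the range. */
struct mock_backend {
	const long *results;	/* pages pinned (or -errno) per attempt */
	size_t nr_results;
	size_t attempts;	/* attempts made so far */
	long outstanding;	/* pages currently pinned and not unpinned */
};

static long mock_pin_range(struct mock_backend *b, long nr_pages)
{
	long got = b->attempts < b->nr_results ? b->results[b->attempts] : 0;

	b->attempts++;
	if (got > nr_pages)
		got = nr_pages;
	if (got > 0)
		b->outstanding += got;
	return got;
}

static void mock_unpin(struct mock_backend *b, long nr)
{
	b->outstanding -= nr;
}

/*
 * All-or-nothing pinning: on a partial pin, drop what was pinned and retry
 * the whole range, but give up after max_attempts so that a persistent
 * failure cannot loop forever. Returns 0 on success or a negative errno.
 */
static int pin_all_or_nothing(struct mock_backend *b, long nr_pages,
			      int max_attempts)
{
	assert(b != NULL && nr_pages > 0 && max_attempts > 0);

	for (int attempt = 0; attempt < max_attempts; attempt++) {
		long got = mock_pin_range(b, nr_pages);

		if (got == nr_pages)
			return 0;		/* whole range pinned */
		if (got < 0)
			return (int)got;	/* hard error: do not retry */
		if (got > 0)
			mock_unpin(b, got);	/* drop the partial pin */
	}
	return -EAGAIN;	/* assumption: report "try again later" to the caller */
}
```

A backend that pins 3, then 5, then all 8 pages succeeds on the third
attempt; one that only ever pins 2 of 8 pages gives up after max_attempts
with nothing left pinned.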
thanks,
--
John Hubbard
* Re: [RFC] pin_user_pages_fast failure count increased
[not found] ` <CGME20250522092944epcas2p1ca46168564555aad6d9880bda26ec8de@epcas2p1.samsung.com>
@ 2025-05-22 9:27 ` Hyesoo Yu
0 siblings, 0 replies; 6+ messages in thread
From: Hyesoo Yu @ 2025-05-22 9:27 UTC (permalink / raw)
To: John Hubbard
Cc: David Hildenbrand, Jaewon Kim, zhaoyang.huang, surenb, linux-mm
On Mon, Apr 28, 2025 at 02:12:57PM -0700, John Hubbard wrote:
> On 4/28/25 1:56 PM, David Hildenbrand wrote:
> > On 28.04.25 22:14, John Hubbard wrote:
> > > On 4/28/25 8:17 AM, Jaewon Kim wrote:
> > > > Hi
> > > >
> > > > If pin_user_pages_fast does not pin all the requested number of pages,
> > > > then drivers calling to pin_user_pages_fast should retry until the gup
> > > > pins all?
> > > >
> > >
> > > Approaches vary, for handling partial success of pin_user_pages().
> > >
> > > * Many drivers unpin everything and either bail out entirely, or retry
> > > pinning the entire original range.
> >
> > Hm, unpinning + trying to repin the entire range can easily result in an
> > endless loop on persistent errors IIRC?
> >
>
> I vaguely recall a limited number of retries, yes.
>
> thanks,
> --
> John Hubbard
>
>
Hi,
I'd like to report a potential issue introduced by a recent change in commit
1aaf8c122918 ("mm: gup: fix infinite loop within __get_longterm_locked").
Previously, the call to migrate_longterm_unpinnable_folio() was guarded by
the collected variable. This meant that if a CMA page was temporarily held
in the pagevec and failed LRU isolation, it wouldn't be added to the movable_page_list,
but the collected counter would still be incremented.
As a result, migrate_longterm_unpinnable_folio() would return -EAGAIN,
and the process would be retried until migration of the CMA page succeeded.
However, in the recent patch merged into mainline, the logic now only checks
whether movable_page_list is empty, and no longer relies on the collected count.
This can cause CMA pages that fail isolation to bypass the retry logic and remain pinned.
Effectively, long-term pinning is now possible for CMA pages, something that previously
would have been avoided through repeated attempts.
We've observed this behavior in practice, which has led to issues such as CMA allocation
failures under memory pressure. This may indicate a regression in the logic that prevents
pinning of unmovable CMA pages.
I believe this warrants further discussion or possibly a fix to restore the intended retry
behavior for pages that fail LRU isolation.
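
To make the difference concrete, here is a small userspace model of the
two guards as described above. It is a sketch built from the description
in this message, not from the kernel source: struct folio_model, struct
collect_result and both check_*() helpers are hypothetical names, and the
migration step is collapsed into "return -EAGAIN so the caller retries".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one folio in the range being pinned. */
struct folio_model {
	bool longterm_pinnable;	/* false for e.g. a CMA page */
	bool isolate_ok;	/* false if LRU isolation fails this round */
};

struct collect_result {
	size_t collected;	/* unpinnable folios seen */
	size_t on_list;		/* of those, how many were isolated onto the list */
};

static struct collect_result collect_unpinnable(const struct folio_model *f,
						 size_t n)
{
	struct collect_result r = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (f[i].longterm_pinnable)
			continue;
		r.collected++;		/* counted even if isolation fails */
		if (f[i].isolate_ok)
			r.on_list++;	/* only isolated folios reach the list */
	}
	return r;
}

/* Older guard, as described: any unpinnable folio forces another round. */
static int check_guard_on_collected(const struct folio_model *f, size_t n)
{
	struct collect_result r = collect_unpinnable(f, n);

	return r.collected ? -EAGAIN : 0;
}

/* Newer guard, as described: only a non-empty list forces another round. */
static int check_guard_on_list(const struct folio_model *f, size_t n)
{
	struct collect_result r = collect_unpinnable(f, n);

	return r.on_list ? -EAGAIN : 0;
}
```

In this model, a range containing one CMA folio whose isolation fails gets
-EAGAIN from the older guard but 0 from the newer one, which corresponds
to the pin being kept on a page that should have been migrated first.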
Thanks,
Hyesoo Yu.
Thread overview: 6+ messages
2025-04-28 15:17 [RFC] pin_user_pages_fast failure count increased Jaewon Kim
2025-04-28 15:21 ` David Hildenbrand
2025-04-28 20:14 ` John Hubbard
2025-04-28 20:56 ` David Hildenbrand
2025-04-28 21:12 ` John Hubbard
[not found] ` <CGME20250522092944epcas2p1ca46168564555aad6d9880bda26ec8de@epcas2p1.samsung.com>
2025-05-22 9:27 ` Hyesoo Yu