On Mon, Apr 28, 2025 at 02:12:57PM -0700, John Hubbard wrote:
> On 4/28/25 1:56 PM, David Hildenbrand wrote:
> > On 28.04.25 22:14, John Hubbard wrote:
> > > On 4/28/25 8:17 AM, Jaewon Kim wrote:
> > > > Hi
> > > >
> > > > If pin_user_pages_fast does not pin all the requested number of pages,
> > > > then drivers calling to pin_user_pages_fast should retry until the gup
> > > > pins all?
> > > >
> > >
> > > Approaches vary, for handling partial success of pin_user_pages().
> > >
> > > * Many drivers unpin everything and either bail out entirely, or retry
> > > pinning the entire original range.
> >
> > Hm, unpinning + trying to repin the entire range can easily result in an
> > endless loop on persistent errors IIRC?
> >
>
> I vaguely recall a limited number of retries, yes.
>
> thanks,
> --
> John Hubbard
>

Hi,

I'd like to report a potential issue introduced by a recent change in

1aaf8c122918 mm: gup: fix infinite loop within __get_longterm_locked

Previously, the call to migrate_longterm_unpinnable_folio() was guarded
by the collected variable. This meant that if a CMA page was temporarily
held in the pagevec and failed LRU isolation, it wouldn't be added to the
movable_page_list, but the collected counter would still be incremented.

As a result, migrate_longterm_unpinnable_folio() would return -EAGAIN,
and the process would be retried until migration of the CMA page
succeeded.

However, in the recent patch merged into mainline, the logic now only
checks whether movable_page_list is empty, and no longer relies on the
collected count. This can cause CMA pages that fail isolation to bypass
retry logic and remain pinned.

Effectively, long-term pinning is now possible for CMA pages, something
that previously would have been avoided through repeated attempts.

We've observed this behavior in practice, which has led to issues such as
CMA allocation failures under memory pressure. This may indicate a
regression in the logic that prevents pinning of unmovable CMA pages.

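To make the difference concrete, here is a minimal userspace sketch of
the two guards. It is not the actual mm/gup.c code: the names sim_folio,
sim_collect, sim_check_old and sim_check_new are hypothetical, and it
assumes the migration step always returns -EAGAIN to ask the caller to
retry, which simplifies what the kernel does.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio; not a kernel structure. */
struct sim_folio {
	bool longterm_pinnable;	/* false for e.g. a CMA folio */
	bool isolatable;	/* false while LRU isolation fails */
};

/* Result of one scan over the pinned folios. */
struct sim_scan {
	size_t collected;	/* unpinnable folios seen */
	size_t isolated;	/* unpinnable folios put on the movable list */
};

/* Count unpinnable folios and how many of them could be isolated. */
static struct sim_scan sim_collect(const struct sim_folio *f, size_t n)
{
	struct sim_scan s = { 0, 0 };

	for (size_t i = 0; i < n; i++) {
		if (f[i].longterm_pinnable)
			continue;
		s.collected++;		/* counted even if isolation fails */
		if (f[i].isolatable)
			s.isolated++;	/* only these reach the movable list */
	}
	return s;
}

/* Old guard (as described above): skip migration only if nothing was collected. */
static int sim_check_old(const struct sim_folio *f, size_t n)
{
	struct sim_scan s = sim_collect(f, n);

	if (s.collected == 0)
		return 0;	/* every folio is long-term pinnable */
	return -EAGAIN;		/* assumed: migration step requests a retry */
}

/* New guard (as described above): skip migration if the movable list is empty. */
static int sim_check_new(const struct sim_folio *f, size_t n)
{
	struct sim_scan s = sim_collect(f, n);

	if (s.isolated == 0)
		return 0;	/* also taken when isolation failed */
	return -EAGAIN;
}
```

With a single CMA folio that fails isolation, sim_check_old() returns
-EAGAIN (retry) while sim_check_new() returns 0, so in this model the
folio would stay pinned.
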
I believe this warrants further discussion or possibly a fix to restore
the intended retry behavior for pages that fail LRU isolation.

Thanks,
Hyesoo Yu.