On Thu, Jun 05, 2025 at 05:04:31PM +0900, Hyesoo Yu wrote: > Commit 1aaf8c122918 ("mm: gup: fix infinite loop within __get_longterm_locked") > introduced an issue where CMA pages could be pinned by longterm GUP requests. > This occurs when unpinnable pages are detected but the movable_page_list is empty; > the commit would return success without retrying, allowing unpinnable > pages (such as CMA) to become pinned. > > CMA pages may be temporarily off the LRU due to concurrent isolation, > for example when multiple longterm GUP requests are racing and therefore > not appear in movable_page_list. Before commit 1aaf8c, the kernel would > retry migration in such cases, which helped avoid accidental CMA pinning. > > The original intent of the commit was to support longterm GUP on non-LRU > CMA pages in out-of-tree use cases such as pKVM. However, allowing this > can lead to broader CMA pinning issues. > > To avoid this, the logic is restored to return -EAGAIN instead of success > when no folios could be collected but unpinnable pages were found. > This ensures that migration is retried until success, and avoids > inadvertently pinning unpinnable pages. > > Fixes: 1aaf8c122918 ("mm: gup: fix infinite loop within __get_longterm_locked") > Acked-by: David Hildenbrand > Signed-off-by: Hyesoo Yu > > --- > We have confirmed that this regression causes CMA pages to be pinned > in our kernel 6.12-based environment. > > In addition to CMA allocation failures, we also observed longterm GUP failures > when repeatedly accessing the same VMA. Specifically, the first longterm GUP > call would pin a CMA page, and a second call on the same region > would fail the migration because the cma page was already pinned. > > After reverting commit 1aaf8c122918, the issue no longer reproduced. > > Therefore, this fix is important to ensure reliable behavior of longterm GUP > and CMA-backed memory, and should be backported to stable. > --- > mm/gup.c | 28 ++++++++++++++++++++++------ > 1 file changed, 22 insertions(+), 6 deletions(-) > > diff --git a/mm/gup.c b/mm/gup.c > index e065a49842a8..66193421c1d4 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -2300,14 +2300,12 @@ static void pofs_unpin(struct pages_or_folios *pofs) > unpin_user_pages(pofs->pages, pofs->nr_entries); > } > > -/* > - * Returns the number of collected folios. Return value is always >= 0. > - */ > -static void collect_longterm_unpinnable_folios( > +static bool collect_longterm_unpinnable_folios( > struct list_head *movable_folio_list, > struct pages_or_folios *pofs) > { > struct folio *prev_folio = NULL; > + bool any_unpinnable = false; > bool drain_allow = true; > unsigned long i; > > @@ -2321,6 +2319,8 @@ static void collect_longterm_unpinnable_folios( > if (folio_is_longterm_pinnable(folio)) > continue; > > + any_unpinnable = true; > + > if (folio_is_device_coherent(folio)) > continue; > > @@ -2342,6 +2342,8 @@ static void collect_longterm_unpinnable_folios( > NR_ISOLATED_ANON + folio_is_file_lru(folio), > folio_nr_pages(folio)); > } > + > + return any_unpinnable; > } > > /* > @@ -2417,11 +2419,25 @@ migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list, > static long > check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs) > { > + bool any_unpinnable; > + > LIST_HEAD(movable_folio_list); > > - collect_longterm_unpinnable_folios(&movable_folio_list, pofs); > - if (list_empty(&movable_folio_list)) > + any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs); > + Hi David, While re-reviewing the v3 patch, I realized that migrate_longterm_unpinnable_folios() should always be called whenever unpinnable folios are present, regardless of whether the movable_folio_list is empty. In collect_longterm_unpinnable_folios(), if folio_is_device_coherent() is true, the folio is not added to movable_folio_list. However, such device-coherent folios still need to be migrated later in migrate_longterm_unpinnable_folios(). I think the condition `list_empty(&movable_folio_list)` should be removed, and it might be better to revert commit 1aaf8c122918 rather than adding a new patch. What do you think? Thanks, Regards. > + if (list_empty(&movable_folio_list)) { > + /* > + * If we find any longterm unpinnable page that we failed to > + * isolated for migration, it might be because someone else > + * concurrently isolated it. Make the caller retry until it > + * succeeds. > + */ > + if (any_unpinnable) { > + pofs_unpin(pofs); > + return -EAGAIN; > + } > return 0; > + } > > return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs); > } > -- > 2.49.0 > >