On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
> On Mon, May 26, 2025 at 7:17 PM Jaewon Kim wrote:
> >
> > >On 26.05.25 11:33, Hyesoo Yu wrote:
> > >> On Mon, May 26, 2025 at 04:05:16PM +0800, Zhaoyang Huang wrote:
> > >>> On Mon, May 26, 2025 at 3:50 PM Hyesoo Yu wrote:
> > >>>>
> > >>>> On Thu, May 22, 2025 at 07:52:41PM -0700, John Hubbard wrote:
> > >>>>> On 5/22/25 7:37 PM, 김재원 wrote:
> > >>>>> ...
> > >>>>>> I think this is what you meant, please let me know if you have an idea to make this nicer.
> > >>>>>> We may be able to prepare the patch next week.
> > >>>>>>
> > >>>>>>  static long
> > >>>>>>  check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> > >>>>>>  {
> > >>>>>> +       bool any_unpinnable;
> > >>>>>>         LIST_HEAD(movable_folio_list);
> > >>>>>>
> > >>>>>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > >>>>>> -       if (list_empty(&movable_folio_list))
> > >>>>>> -               return 0;
> > >>>>>> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > >>>>>> +       if (list_empty(&movable_folio_list)) {
> > >>>>>> +               if (any_unpinnable)
> > >>>>>> +                       pofs_unpin(pofs);
> > >>>>>
> > >>>>> I think this is correct, although as I mentioned in the other thread,
> > >>>>> that implies that commit 1aaf8c122918 (which didn't add nor remove
> > >>>>> any pof unpinning) is probably not the true or only culprit, right?
> > >>>>>
> > >>>>>> +               return any_unpinnable ? -EAGAIN : 0;
> > >>>>>
> > >>>>> Ha, the "?" operator almost always does more harm than good.
> > >>>>>
> > >>>>> Here, for example, it has obscured from you the fact that any_unpinnable
> > >>>>> is being checked twice, when you could have merged those into a single "if".
> > >>>>>
> > >>>>
> > >>>> Hello,
> > >>>>
> > >>>> I was wondering if the original problem - an infinite loop when pages allocated by
> > >>>> cma_alloc() in vm_ops->fault are passed to GUP - still remains unresolved.
> > >>>> (To be honest, I'm not quite sure how such pages end up being pinned via GUP.
> > >>>> Is that the expected behavior, or could it possibly indicate a bug?)
> > >>> The original problem arises from applying CMA as guestOS's memory
> > >>> slots for kvm which use GUP to setup its 2nd stage mapping(HVA->PFN).
> > >>> You can check KVM code if you are interested.
> > >>>
> > >>
> > >> Thanks for the kind explanation. While I'm not deeply familiar with KVM, my understanding
> > >> is that there are cases where GUP is used on CMA.
> > >>
> > >> So does that mean pinning memory from the CMA was actually intended to succeed?
> > >
> > >Careful: KVM uses ordinary GUP, not GUP-longterm.
> >
> > Hi. David and Zhaoyang
> >
> > If possible, could you kindly explain the situation where the 1aaf8c122918 was added?
> > If KVM does not use FOLL_LONGTERM, then why the function,
> > collect_longterm_unpinnable_folios, was changed at that time?
> >
> > First of all, I'm not a KVM expert. After reading Zhaoyang's mail,
> > I thought CMA free page was initially allocated then migrated by FOLL_LONGTERM,
> > during the get_user_page for KVM's guest OS. If KVM does not use FOLL_LONGTERM,
> > I am confused.
> >
> > Actually I did not understand the infinite loop situation. I thought few times of -EAGAIN
> > might happen during the gup. But calling lru_add_drain_all by collect_longterm_unpinnable_folios
> > would put the page to LRU. And other cma_alloc context or migration context, I guess,
> > put the pages back to LRU if there was race.
> Actually, it is pkvm which was introduced by google in AOSP. I am
> afraid I can just brief the callstack here for security reasons. The
> pin_user_pages will setup the 2nd stage mapping for the hva by the
> vm_ops->fault which is registered by kvm memfd driver and all PFNs are
> from CMA area.
> The driver will keep the pages out of the LRU which hit
> the original bug as it is counted but have the movable_page_list be
> empty and lead to infinite loop within __gup_longterm_locked
>
> pkvm_xxx_xxx(equal to user_mem_abort in kvm)
> {
>     unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
>     ...
>     ret = pin_user_pages(hva, 1, flags, &page);
>       __gup_longterm_locked
>         do {
>             nr_pinned_pages = __get_user_pages_locked(mm, start, nr_pages,
>                                                       pages, locked, gup_flags);
>             rc = check_and_migrate_movable_pages(nr_pinned_pages, pages);
>         } while (rc == -EAGAIN);
> }

Hello, Zhaoyang.

I don't believe commit 1aaf8c122918 was intended only to prevent an infinite loop.
It was introduced to allow pinning CMA memory in pKVM on AOSP.

That leads me to question whether the assumption that CMA memory can be long-term pinned
is actually valid.

In my opinion, it might be more appropriate to revert commit 1aaf8c122918 and instead ensure
that pKVM avoids using CMA for memory that requires long-term pinning through GUP?

Alternatively, rather than changing the current logic that prevents long-term GUP from
pinning CMA, would it be better to propose a new patch that specifically addresses the
pKVM scenario, for example by adding a new FOLL_ flag?

Thanks,
Regards.
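
P.S. To make the loop discussed above concrete, here is a small user-space model of it. This is only a sketch of how I read the thread, not kernel code: struct model_folio, model_check_and_migrate() and model_gup_longterm() are hypothetical names, and the retry cap is an assumption added purely so the model terminates. It shows that a page which is counted as not long-term pinnable but cannot be isolated (because it is kept off the LRU) makes the check return -EAGAIN on every pass.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio; NOT the kernel's struct folio. */
struct model_folio {
	bool longterm_pinnable;	/* false for CMA-backed memory in this model */
	bool on_lru;		/* false if the driver keeps the page off the LRU */
};

/*
 * Models the check step: a folio that is not long-term pinnable must be
 * migrated, but in this model it can only be isolated if it is on the LRU.
 * Returns 0 when nothing needs migrating, -EAGAIN when the caller must retry.
 */
static int model_check_and_migrate(struct model_folio *f)
{
	if (f->longterm_pinnable)
		return 0;

	/* Isolation succeeds only for LRU pages; migration then "fixes" them. */
	if (f->on_lru)
		f->longterm_pinnable = true;

	/* Either migrated, or counted as unpinnable but never isolated. */
	return -EAGAIN;
}

/*
 * Retry loop shaped like the quoted one, with a cap (an assumption of this
 * model) so it terminates. Returns the number of passes, or -1 if the cap
 * was reached, which stands in for the infinite loop.
 */
static int model_gup_longterm(struct model_folio *f, int max_tries)
{
	int tries = 0;
	int rc;

	do {
		if (tries == max_tries)
			return -1;
		tries++;
		rc = model_check_and_migrate(f);
	} while (rc == -EAGAIN);

	return tries;
}
```

In this model a CMA page that is on the LRU finishes after two passes (one to migrate, one to confirm), while the same page kept off the LRU never leaves the loop and only stops at the cap.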