From: Zhaoyang Huang <huangzhaoyang@gmail.com>
To: Hyesoo Yu <hyesoo.yu@samsung.com>
Cc: jaewon31.kim@samsung.com, David Hildenbrand <david@redhat.com>,
John Hubbard <jhubbard@nvidia.com>,
"zhaoyang.huang@unisoc.com" <zhaoyang.huang@unisoc.com>,
"surenb@google.com" <surenb@google.com>,
"Steve.Kang@unisoc.com" <Steve.Kang@unisoc.com>,
Jaewon Kim <jaewon31.kim@gmail.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Jang-Hyuck Kim <janghyuck.kim@samsung.com>
Subject: Re: reply: [RFC] pin_user_pages_fast failure count increased
Date: Wed, 28 May 2025 10:49:36 +0800 [thread overview]
Message-ID: <CAGWkznH9xc9m+LrqWniKoBj=8_0=iUTW24eTA8ONLhS=r_rTiA@mail.gmail.com> (raw)
In-Reply-To: <20250528012329.GA1545287@tiffany>
On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>
> On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
> > On Mon, May 26, 2025 at 7:17 PM Jaewon Kim <jaewon31.kim@samsung.com> wrote:
> > >
> > > >On 26.05.25 11:33, Hyesoo Yu wrote:
> > > >> On Mon, May 26, 2025 at 04:05:16PM +0800, Zhaoyang Huang wrote:
> > > >>> On Mon, May 26, 2025 at 3:50?PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> > > >>>>
> > > >>>> On Thu, May 22, 2025 at 07:52:41PM -0700, John Hubbard wrote:
> > > >>>>> On 5/22/25 7:37 PM, 김재원 wrote:
> > > >>>>> ...
> > > >>>>>> I think this is what you meant, please let me know if you have an idea to make this nicer.
> > > >>>>>> We may be to able to prepare the patch next week.
> > > >>>>>>
> > > >>>>>> static long
> > > >>>>>> check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> > > >>>>>> {
> > > >>>>>> + bool any_unpinnable;
> > > >>>>>> LIST_HEAD(movable_folio_list);
> > > >>>>>>
> > > >>>>>> - collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > > >>>>>> - if (list_empty(&movable_folio_list))
> > > >>>>>> - return 0;
> > > >>>>>> + any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > > >>>>>> + if (list_empty(&movable_folio_list)) {
> > > >>>>>> + if (any_unpinnable)
> > > >>>>>> + pofs_unpin(pofs);
> > > >>>>>
> > > >>>>> I think this is correct, although as I mentioned in the other thread,
> > > >>>>> that implies that commit 1aaf8c122918 (which didn't add nor remove
> > > >>>>> any pof unpinning) is probably not the true or only culprit, right?
> > > >>>>>
> > > >>>>>> + return any_unpinnable ? -EAGAIN : 0;
> > > >>>>>
> > > >>>>> Ha, the "?" operator almost always does more harm than good.
> > > >>>>>
> > > >>>>> Here, for example, it has obscured from you the fact that any_unpinnable
> > > >>>>> is being checked twice, when you could have merged those into a single "if".
> > > >>>>>
> > > >>>>
> > > >>>> Hello,
> > > >>>>
> > > >>>> I was wondering if the original problem - an infinite loop when pages allocated by
> > > >>>> cma_alloc() in vm_ops->fault are passed to GUP - still remains unresolved.
> > > >>>> (To be honest, I'm not quite sure how such pages end up being pinned via GUP.
> > > >>>> Is that the expected behavior, or could it possibly indicate a bug ?)
> > > >>> The original problem arises from applying CMA as guestOS's memory
> > > >>> slots for kvm which use GUP to setup its 2nd stage mapping(HVA->PFN).
> > > >>> You can check KVM code if you are interested.
> > > >>>
> > > >>
> > > >> Thanks for the kind explanation. While I'm not deeply familiar with KVM, my understanding
> > > >> is that there are cases where GUP is used on CMA.
> > > >>
> > > >> So does that mean pinning memory from the CMA was actually intended to succeed ?
> > > >
> > > >Careful: KVM uses ordinary GUP, not GUP-longterm.
> > >
> > > Hi. David and Zhaoyang
> > >
> > > If possible, could you kindly explain the situation where the 1aaf8c122918 was addeded?
> > > If KVM does not user FOLL_LONGTERM, then why the function,
> > > collect_longterm_unpinnable_folios, was changed at that time?
> > >
> > > First of all, I'm not a KVM expert. After reading Zhaoyang's mail,
> > > I thought CMA free page was initially allocated then migrated by FOLL_LONGTERM,
> > > during the get_user_page for KVM's guest OS. If KVM does not use FOLL_LONGTERM,
> > > I am confused.
> > >
> > > Actually I did not understand the infinite loop situation. I thought few times of -EAGAIN
> > > might happen during the gup. But calling lru_add_drain_all by collect_longterm_unpinnable_folios
> > > would put the page to LRU. And other cma_alloc context or migration context, I guess,
> > > put the pages back to LRU if there was race.
> > Actually, it is pkvm which was introduced by google in AOSP. I am
> > afraid I can just brief the callstack here for security reasons. The
> > pin_user_pages will setup the 2nd stage mapping for the hva by the
> > vm_ops->fault which is registered by kvm memfd driver and all PFNs are
> > from CMA area. The driver will keep the pages out of the LRU which hit
> > the original bug as it is counted but have the movable_page_list be
> > empty and lead to infinite loop within __gup_longterm_locked
> >
> > pkvm_xxx_xxx(equal to user_mem_abort in kvm)
> > {
> > unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
> > ...
> > ret = pin_user_pages(hva, 1, flags, &page);
> > __gup_longterm_locked
> > do {
> > nr_pinned_pages = __get_user_pages_locked(mm,
> > start, nr_pages,
> > pages, locked, gup_flags);
> > rc =
> > check_and_migrate_movable_pages(nr_pinned_pages, pages);
> > } while (rc == -EAGAIN);
> > }
>
> Hello, Zhaoyang.
>
> I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
> The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
>
> That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.
That depends on the user of CMA, yes for my scenario since it worked
for the guest os. For common scenario such as the file/anon mapping,
the page will be judged as unpinnable for long-term and be migrated
out of CMA area.
>
> In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
> that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?
It is not a pkvm issue but a defect of applying FOLL_LONGTERM over
non-LRU CMA pages.
>
> Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
> it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?
I don't think so. pin_user_pages is an exported API which can't make
assumptions over the caller.
>
> Thanks,
> Regards.
>
next prev parent reply other threads:[~2025-05-28 2:49 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CAJrd-UtDD50iN=Yxz4=6kNkAcNAtRFkxhKAbEYiRyyDT-bYPHg@mail.gmail.com>
2025-05-22 10:18 ` 黄朝阳 (Zhaoyang Huang)
2025-05-22 12:22 ` David Hildenbrand
[not found] ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p3>
2025-05-22 13:09 ` Jaewon Kim
2025-05-22 14:06 ` David Hildenbrand
[not found] ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p2>
2025-05-22 14:44 ` 김재원
2025-05-22 15:07 ` David Hildenbrand
2025-05-23 2:48 ` John Hubbard
2025-05-23 2:37 ` 김재원
2025-05-23 2:52 ` John Hubbard
2025-05-26 7:48 ` Hyesoo Yu
2025-05-26 8:05 ` Zhaoyang Huang
2025-05-26 9:33 ` Hyesoo Yu
2025-05-26 9:38 ` David Hildenbrand
[not found] ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p8>
2025-05-26 11:17 ` Jaewon Kim
2025-05-26 11:49 ` Zhaoyang Huang
2025-05-28 1:23 ` Hyesoo Yu
2025-05-28 2:49 ` Zhaoyang Huang [this message]
2025-05-28 3:36 ` Hyesoo Yu
2025-05-28 7:55 ` David Hildenbrand
2025-05-28 10:59 ` Zhaoyang Huang
2025-05-28 12:57 ` David Hildenbrand
2025-06-03 13:12 ` David Hildenbrand
2025-06-04 1:04 ` Zhaoyang Huang
2025-06-04 9:12 ` David Hildenbrand
2025-06-04 9:41 ` Zhaoyang Huang
2025-06-04 9:48 ` David Hildenbrand
[not found] ` <CGME20250604095542epcas2p3f3d2d6fc17115547981a7173215a09d1@epcas2p3.samsung.com>
2025-06-04 9:53 ` Hyesoo Yu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAGWkznH9xc9m+LrqWniKoBj=8_0=iUTW24eTA8ONLhS=r_rTiA@mail.gmail.com' \
--to=huangzhaoyang@gmail.com \
--cc=Steve.Kang@unisoc.com \
--cc=david@redhat.com \
--cc=hyesoo.yu@samsung.com \
--cc=jaewon31.kim@gmail.com \
--cc=jaewon31.kim@samsung.com \
--cc=janghyuck.kim@samsung.com \
--cc=jhubbard@nvidia.com \
--cc=linux-mm@kvack.org \
--cc=surenb@google.com \
--cc=zhaoyang.huang@unisoc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox