linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Zhaoyang Huang <huangzhaoyang@gmail.com>
To: Hyesoo Yu <hyesoo.yu@samsung.com>
Cc: jaewon31.kim@samsung.com, David Hildenbrand <david@redhat.com>,
	 John Hubbard <jhubbard@nvidia.com>,
	 "zhaoyang.huang@unisoc.com" <zhaoyang.huang@unisoc.com>,
	"surenb@google.com" <surenb@google.com>,
	 "Steve.Kang@unisoc.com" <Steve.Kang@unisoc.com>,
	Jaewon Kim <jaewon31.kim@gmail.com>,
	 "linux-mm@kvack.org" <linux-mm@kvack.org>,
	Jang-Hyuck Kim <janghyuck.kim@samsung.com>
Subject: Re: reply: [RFC] pin_user_pages_fast failure count increased
Date: Wed, 28 May 2025 10:49:36 +0800	[thread overview]
Message-ID: <CAGWkznH9xc9m+LrqWniKoBj=8_0=iUTW24eTA8ONLhS=r_rTiA@mail.gmail.com> (raw)
In-Reply-To: <20250528012329.GA1545287@tiffany>

On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
>
> On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
> > On Mon, May 26, 2025 at 7:17 PM Jaewon Kim <jaewon31.kim@samsung.com> wrote:
> > >
> > > >On 26.05.25 11:33, Hyesoo Yu wrote:
> > > >> On Mon, May 26, 2025 at 04:05:16PM +0800, Zhaoyang Huang wrote:
> > > >>> On Mon, May 26, 2025 at 3:50?PM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> > > >>>>
> > > >>>> On Thu, May 22, 2025 at 07:52:41PM -0700, John Hubbard wrote:
> > > >>>>> On 5/22/25 7:37 PM, 김재원 wrote:
> > > >>>>> ...
> > > >>>>>> I think this is what you meant, please let me know if you have an idea to make this nicer.
> > > >>>>>> We may be to able to prepare the patch next week.
> > > >>>>>>
> > > >>>>>>   static long
> > > >>>>>>   check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> > > >>>>>>   {
> > > >>>>>> +       bool any_unpinnable;
> > > >>>>>>          LIST_HEAD(movable_folio_list);
> > > >>>>>>
> > > >>>>>> -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > > >>>>>> -       if (list_empty(&movable_folio_list))
> > > >>>>>> -               return 0;
> > > >>>>>> +       any_unpinnable = collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > > >>>>>> +       if (list_empty(&movable_folio_list)) {
> > > >>>>>> +               if (any_unpinnable)
> > > >>>>>> +                       pofs_unpin(pofs);
> > > >>>>>
> > > >>>>> I think this is correct, although as I mentioned in the other thread,
> > > >>>>> that implies that commit 1aaf8c122918 (which didn't add nor remove
> > > >>>>> any pof unpinning) is probably not the true or only culprit, right?
> > > >>>>>
> > > >>>>>> +               return any_unpinnable ? -EAGAIN : 0;
> > > >>>>>
> > > >>>>> Ha, the "?" operator almost always does more harm than good.
> > > >>>>>
> > > >>>>> Here, for example, it has obscured from you the fact that any_unpinnable
> > > >>>>> is being checked twice, when you could have merged those into a single "if".
> > > >>>>>
> > > >>>>
> > > >>>> Hello,
> > > >>>>
> > > >>>> I was wondering if the original problem - an infinite loop when pages allocated by
> > > >>>> cma_alloc() in vm_ops->fault are passed to GUP - still remains unresolved.
> > > >>>> (To be honest, I'm not quite sure how such pages end up being pinned via GUP.
> > > >>>>   Is that the expected behavior, or could it possibly indicate a bug ?)
> > > >>> The original problem arises from applying CMA as guestOS's memory
> > > >>> slots for kvm which use GUP to setup its 2nd stage mapping(HVA->PFN).
> > > >>> You can check KVM code if you are interested.
> > > >>>
> > > >>
> > > >> Thanks for the kind explanation. While I'm not deeply familiar with KVM, my understanding
> > > >> is that there are cases where GUP is used on CMA.
> > > >>
> > > >> So does that mean pinning memory from the CMA was actually intended to succeed ?
> > > >
> > > >Careful: KVM uses ordinary GUP, not GUP-longterm.
> > >
> > > Hi. David and Zhaoyang
> > >
> > > If possible, could you kindly explain the situation where the 1aaf8c122918 was addeded?
> > > If KVM does not user FOLL_LONGTERM, then why the function,
> > > collect_longterm_unpinnable_folios, was changed at that time?
> > >
> > > First of all, I'm not a KVM expert. After reading Zhaoyang's mail,
> > > I thought CMA free page was initially allocated then migrated by FOLL_LONGTERM,
> > > during the get_user_page for KVM's guest OS. If KVM does not use FOLL_LONGTERM,
> > > I am confused.
> > >
> > > Actually I did not understand the infinite loop situation. I thought few times of -EAGAIN
> > > might happen during the gup. But calling lru_add_drain_all by collect_longterm_unpinnable_folios
> > > would put the page to LRU. And other cma_alloc context or migration context, I guess,
> > > put the pages back to LRU if there was race.
> > Actually, it is pkvm which was introduced by google in AOSP. I am
> > afraid I can just brief the callstack here for security reasons. The
> > pin_user_pages will setup the 2nd stage mapping for the hva by the
> > vm_ops->fault which is registered by kvm memfd driver and all PFNs are
> > from CMA area. The driver will keep the pages out of the LRU which hit
> > the original bug as it is counted but have the movable_page_list be
> > empty and lead to infinite loop within __gup_longterm_locked
> >
> > pkvm_xxx_xxx(equal to user_mem_abort in kvm)
> > {
> >     unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
> >     ...
> >     ret = pin_user_pages(hva, 1, flags, &page);
> >             __gup_longterm_locked
> >                 do {
> >                      nr_pinned_pages = __get_user_pages_locked(mm,
> > start, nr_pages,
> >                       pages, locked, gup_flags);
> >                      rc =
> > check_and_migrate_movable_pages(nr_pinned_pages, pages);
> >                  } while (rc == -EAGAIN);
> > }
>
> Hello, Zhaoyang.
>
> I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
> The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
>
> That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.
That depends on the user of CMA, yes for my scenario since it worked
for the guest os. For common scenario such as the file/anon mapping,
the page will be judged as unpinnable for long-term and be migrated
out of CMA area.
>
> In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
> that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?
It is not a pkvm issue but a defect of applying FOLL_LONGTERM over
non-LRU CMA pages.
>
> Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
> it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?
I don't think so. pin_user_pages is an exported API which can't make
assumptions over the caller.
>
> Thanks,
> Regards.
>


  reply	other threads:[~2025-05-28  2:49 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAJrd-UtDD50iN=Yxz4=6kNkAcNAtRFkxhKAbEYiRyyDT-bYPHg@mail.gmail.com>
2025-05-22 10:18 ` 黄朝阳 (Zhaoyang Huang)
2025-05-22 12:22   ` David Hildenbrand
     [not found]     ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p3>
2025-05-22 13:09       ` Jaewon Kim
2025-05-22 14:06         ` David Hildenbrand
     [not found]         ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p2>
2025-05-22 14:44           ` 김재원
2025-05-22 15:07             ` David Hildenbrand
2025-05-23  2:48               ` John Hubbard
2025-05-23  2:37           ` 김재원
2025-05-23  2:52             ` John Hubbard
2025-05-26  7:48               ` Hyesoo Yu
2025-05-26  8:05                 ` Zhaoyang Huang
2025-05-26  9:33                   ` Hyesoo Yu
2025-05-26  9:38                     ` David Hildenbrand
     [not found]                     ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p8>
2025-05-26 11:17                       ` Jaewon Kim
2025-05-26 11:49                         ` Zhaoyang Huang
2025-05-28  1:23                           ` Hyesoo Yu
2025-05-28  2:49                             ` Zhaoyang Huang [this message]
2025-05-28  3:36                               ` Hyesoo Yu
2025-05-28  7:55                                 ` David Hildenbrand
2025-05-28 10:59                                   ` Zhaoyang Huang
2025-05-28 12:57                                     ` David Hildenbrand
2025-06-03 13:12                                     ` David Hildenbrand
2025-06-04  1:04                                       ` Zhaoyang Huang
2025-06-04  9:12                                         ` David Hildenbrand
2025-06-04  9:41                                           ` Zhaoyang Huang
2025-06-04  9:48                                             ` David Hildenbrand
     [not found]                                               ` <CGME20250604095542epcas2p3f3d2d6fc17115547981a7173215a09d1@epcas2p3.samsung.com>
2025-06-04  9:53                                                 ` Hyesoo Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGWkznH9xc9m+LrqWniKoBj=8_0=iUTW24eTA8ONLhS=r_rTiA@mail.gmail.com' \
    --to=huangzhaoyang@gmail.com \
    --cc=Steve.Kang@unisoc.com \
    --cc=david@redhat.com \
    --cc=hyesoo.yu@samsung.com \
    --cc=jaewon31.kim@gmail.com \
    --cc=jaewon31.kim@samsung.com \
    --cc=janghyuck.kim@samsung.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-mm@kvack.org \
    --cc=surenb@google.com \
    --cc=zhaoyang.huang@unisoc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox