From: Zhaoyang Huang <huangzhaoyang@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: Hyesoo Yu <hyesoo.yu@samsung.com>,
jaewon31.kim@samsung.com, John Hubbard <jhubbard@nvidia.com>,
"zhaoyang.huang@unisoc.com" <zhaoyang.huang@unisoc.com>,
"surenb@google.com" <surenb@google.com>,
"Steve.Kang@unisoc.com" <Steve.Kang@unisoc.com>,
Jaewon Kim <jaewon31.kim@gmail.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Jang-Hyuck Kim <janghyuck.kim@samsung.com>
Subject: Re: reply: [RFC] pin_user_pages_fast failure count increased
Date: Wed, 4 Jun 2025 17:41:41 +0800 [thread overview]
Message-ID: <CAGWkznFu-YcErBqQgsg0cUJ6JWE7LsebxHEfqvBNNP8d-9bYzg@mail.gmail.com> (raw)
In-Reply-To: <c34355e2-8dfa-428a-8fbd-2843dd877017@redhat.com>
On Wed, Jun 4, 2025 at 5:12 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 04.06.25 03:04, Zhaoyang Huang wrote:
> > On Tue, Jun 3, 2025 at 9:12 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 28.05.25 12:59, Zhaoyang Huang wrote:
> >>> On Wed, May 28, 2025 at 3:55 PM David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 28.05.25 05:36, Hyesoo Yu wrote:
> >>>>> On Wed, May 28, 2025 at 10:49:36AM +0800, Zhaoyang Huang wrote:
> >>>>>> On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> >>>>>>>
> >>>>>>> On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
> >>>>>>>
> >>>>>>> Hello, Zhaoyang.
> >>>>>>>
> >>>>>>> I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
> >>>>>>> The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
> >>>>>>>
> >>>>>>> That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.
> >>>>>> That depends on the user of CMA, yes for my scenario since it worked
> >>>>>> for the guest os. For common scenario such as the file/anon mapping,
> >>>>>> the page will be judged as unpinnable for long-term and be migrated
> >>>>>> out of CMA area.
> >>>>>
> >>>>> Your scenario and the common scenarios can not be distinguished from the kernel API's perspective.
> >>>>> Even in common cases, the page may be in a non-LRU state temporiarily, and in such situations,
> >>>>> pinning CMA can lead to bugs - we've encountered multiple issues because of this.
> >>>>>
> >>>>
> >>>> Right. We just disallow long-term pinning CMA pages, because we don't
> >>>> know who the real owner is that would be okay with long-term pinning them.
> >>>>
> >>>>>>>
> >>>>>>> In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
> >>>>>>> that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?
> >>>>>> It is not a pkvm issue but a defect of applying FOLL_LONGTERM over
> >>>>>> non-LRU CMA pages.
> >>>>>
> >>>>> In include/linux/mm_types.h, the CMA should be migrated when FOLL_LONGTERM.
> >>>>>
> >>>>> * In the CMA case: long term pins in a CMA region would unnecessarily fragment
> >>>>> * that region. And so, CMA attempts to migrate the page before pinning, when
> >>>>> * FOLL_LONGTERM is specified.
> >>>>>
> >>>>> Given this, would it make sense to avoid using FOLL_LONGTERM in this code path ?
> >>>>
> >>>> If something is unbounded in time, FOLL_LONGTERM is the right thing to use.
> >>>>
> >>>>>>>
> >>>>>>> Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
> >>>>>>> it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?
> >>>>>> I don't think so. pin_user_pages is an exported API which can't make
> >>>>>> assumptions over the caller.
> >>>>>
> >>>>> My point is not to base the patch on assumptions about the caller,
> >>>>> but to define a clear mechanism that ensures safe behavior in the intended scenario.
> >>>>>
> >>>>> For example, you can add FOLL_NO_MIGRATION and skip to migrate unpinnable pages.
> >>>>
> >>>> Not sure which exact semantics you have in mind. But failing if we would
> >>>> have to migrate might be ok. Not sure if the caller should worry about
> >>>> that, though: the caller should not have to worry about page placement
> >>>> in general.
> >>> With going over the whole thread, I think the root cause is
> >>> collect_longterm_unpinnable_folios() hit the race window between
> >>> lru_add_drain_all() and folio_isolate_lru() by chance and returned
> >>> with ret=0 which finally have the CMA page pinned, right? However, I
> >>> find the proposed patch below will fail the PKVM
> >>> scenario(FOLL_LONGTERM set with non-LRU CMA pages) again as the CMA
> >>> pages never go to LRU which will have the __gup_longterm_locked loop
> >>> in do while(ret == -EAGAIN) as it did before 1aaf8c. I think the key
> >>> point is to find a way to distinguish the temporary(on the way to LRU)
> >>> and permanent CMA pages within collect_longterm_unpinnable_folios.
> >>>
> >>> static long
> >>> check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> >>> {
> >>> + bool any_unpinnable;
> >>> LIST_HEAD(movable_folio_list);
> >>>
> >>> - collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> >>> - if (list_empty(&movable_folio_list))
> >>> - return 0;
> >>> + any_unpinnable =
> >>> collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> >>> + if (list_empty(&movable_folio_list)) {
> >>> + if (any_unpinnable)
> >>> + pofs_unpin(pofs);
> >>> + return any_unpinnable ? -EAGAIN : 0;
> >>> + }
> >>>
> >> So, what's the status of that? We should fix it upstream (*not* caring
> >> about controversial out-of-tree pkvm issues).
> > Leaving aside the pkvm issue, we should also care about the CMA pages
> > mapping to VM by special driver which are intended to be long term
> > pinned (actually they are fetched by cma_alloc and then mapped to VM
> > instead of alloc_pages during normal page fault).
>
> Is there any such "special driver" in the tree?
Not that I know of. However, pin_user_pages is exported symbol which
could be used for ko, should we make it be capable of dealing with
this scenario?
>
> > Could we distinguish
> > them by the patch below based on 1aaf8c122, that is, this kind of
> > pages is not on page cache and have equaled refcnt to mapcount
>
> No, not like that. We'd need some proper indication that this page was
> allocated by the CMA area owner, and that owner agrees that the folio
> can be long-term pinned (maybe that agreement is by mapping it into user
> space, tbd).
I think the key point is to distinguish the cma pages which are
allocated from fallback of GFP_MOVABLE during common page faults from
the ones which got from cma_alloc within the special driver's
vm_ops->fault.
>
> Will you send the fix or should I do it? Discussing about broken use
> cases that do no apply upstream is not particularly helpful when we're
> dealing with a real upstream bug.
I would like to ask for your help on this since I have no further ideas. Thanks
>
> --
> Cheers,
>
> David / dhildenb
>
next prev parent reply other threads:[~2025-06-04 9:41 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CAJrd-UtDD50iN=Yxz4=6kNkAcNAtRFkxhKAbEYiRyyDT-bYPHg@mail.gmail.com>
2025-05-22 10:18 ` 黄朝阳 (Zhaoyang Huang)
2025-05-22 12:22 ` David Hildenbrand
[not found] ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p3>
2025-05-22 13:09 ` Jaewon Kim
2025-05-22 14:06 ` David Hildenbrand
[not found] ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p2>
2025-05-22 14:44 ` 김재원
2025-05-22 15:07 ` David Hildenbrand
2025-05-23 2:48 ` John Hubbard
2025-05-23 2:37 ` 김재원
2025-05-23 2:52 ` John Hubbard
2025-05-26 7:48 ` Hyesoo Yu
2025-05-26 8:05 ` Zhaoyang Huang
2025-05-26 9:33 ` Hyesoo Yu
2025-05-26 9:38 ` David Hildenbrand
[not found] ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p8>
2025-05-26 11:17 ` Jaewon Kim
2025-05-26 11:49 ` Zhaoyang Huang
2025-05-28 1:23 ` Hyesoo Yu
2025-05-28 2:49 ` Zhaoyang Huang
2025-05-28 3:36 ` Hyesoo Yu
2025-05-28 7:55 ` David Hildenbrand
2025-05-28 10:59 ` Zhaoyang Huang
2025-05-28 12:57 ` David Hildenbrand
2025-06-03 13:12 ` David Hildenbrand
2025-06-04 1:04 ` Zhaoyang Huang
2025-06-04 9:12 ` David Hildenbrand
2025-06-04 9:41 ` Zhaoyang Huang [this message]
2025-06-04 9:48 ` David Hildenbrand
[not found] ` <CGME20250604095542epcas2p3f3d2d6fc17115547981a7173215a09d1@epcas2p3.samsung.com>
2025-06-04 9:53 ` Hyesoo Yu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAGWkznFu-YcErBqQgsg0cUJ6JWE7LsebxHEfqvBNNP8d-9bYzg@mail.gmail.com \
--to=huangzhaoyang@gmail.com \
--cc=Steve.Kang@unisoc.com \
--cc=david@redhat.com \
--cc=hyesoo.yu@samsung.com \
--cc=jaewon31.kim@gmail.com \
--cc=jaewon31.kim@samsung.com \
--cc=janghyuck.kim@samsung.com \
--cc=jhubbard@nvidia.com \
--cc=linux-mm@kvack.org \
--cc=surenb@google.com \
--cc=zhaoyang.huang@unisoc.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox