linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Zhaoyang Huang <huangzhaoyang@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: Hyesoo Yu <hyesoo.yu@samsung.com>,
	jaewon31.kim@samsung.com,  John Hubbard <jhubbard@nvidia.com>,
	 "zhaoyang.huang@unisoc.com" <zhaoyang.huang@unisoc.com>,
	"surenb@google.com" <surenb@google.com>,
	 "Steve.Kang@unisoc.com" <Steve.Kang@unisoc.com>,
	Jaewon Kim <jaewon31.kim@gmail.com>,
	 "linux-mm@kvack.org" <linux-mm@kvack.org>,
	Jang-Hyuck Kim <janghyuck.kim@samsung.com>
Subject: Re: reply: [RFC] pin_user_pages_fast failure count increased
Date: Wed, 4 Jun 2025 09:04:09 +0800	[thread overview]
Message-ID: <CAGWkznEZYiu9ReU+zyeMb41YRo+kOcbv5_oQTHj87wu8uYPJ8w@mail.gmail.com> (raw)
In-Reply-To: <223cf8fc-7743-497f-893c-37ac689af002@redhat.com>

On Tue, Jun 3, 2025 at 9:12 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 28.05.25 12:59, Zhaoyang Huang wrote:
> > On Wed, May 28, 2025 at 3:55 PM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 28.05.25 05:36, Hyesoo Yu wrote:
> >>> On Wed, May 28, 2025 at 10:49:36AM +0800, Zhaoyang Huang wrote:
> >>>> On Wed, May 28, 2025 at 9:25 AM Hyesoo Yu <hyesoo.yu@samsung.com> wrote:
> >>>>>
> >>>>> On Mon, May 26, 2025 at 07:49:57PM +0800, Zhaoyang Huang wrote:
> >>>>>
> >>>>> Hello, Zhaoyang.
> >>>>>
> >>>>> I don't believe commit 1aaf8c was just intended to prevent an infinite loop.
> >>>>> The commit was introduced to allow pinning CMA memory in the pKVM on AOSP.
> >>>>>
> >>>>> That leads me to question whether the assumption that CMA can be long-term pinned is actually valid.
> >>>> That depends on the user of CMA, yes for my scenario since it worked
> >>>> for the guest os. For common scenario such as the file/anon mapping,
> >>>> the page will be judged as unpinnable for long-term and be migrated
> >>>> out of CMA area.
> >>>
> >>> Your scenario and the common scenarios can not be distinguished from the kernel API's perspective.
> >>> Even in common cases, the page may be in a non-LRU state temporiarily, and in such situations,
> >>> pinning CMA can lead to bugs - we've encountered multiple issues because of this.
> >>>
> >>
> >> Right. We just disallow long-term pinning CMA pages, because we don't
> >> know who the real owner is that would be okay with long-term pinning them.
> >>
> >>>>>
> >>>>> In my opinion, it might be more appropriate to revert that commit 1aaf8c and instead ensure
> >>>>> that pKVM avoids using CMA for memory that requires long-term pinning through GUP ?
> >>>> It is not a pkvm issue but a defect of applying FOLL_LONGTERM over
> >>>> non-LRU CMA pages.
> >>>
> >>> In include/linux/mm_types.h, the CMA should be migrated when FOLL_LONGTERM.
> >>>
> >>> * In the CMA case: long term pins in a CMA region would unnecessarily fragment
> >>> * that region.  And so, CMA attempts to migrate the page before pinning, when
> >>> * FOLL_LONGTERM is specified.
> >>>
> >>> Given this, would it make sense to avoid using FOLL_LONGTERM in this code path ?
> >>
> >> If something is unbounded in time, FOLL_LONGTERM is the right thing to use.
> >>
> >>>>>
> >>>>> Alternatively, instead of changing the current logic that prevents longterm GUP from pinning CMA,
> >>>>> it would be better to propose a new patch that specifically addresses the pKVM scenario like adding new FOLL_flags ?
> >>>> I don't think so. pin_user_pages is an exported API which can't make
> >>>> assumptions over the caller.
> >>>
> >>> My point is not to base the patch on assumptions about the caller,
> >>> but to define a clear mechanism that ensures safe behavior in the intended scenario.
> >>>
> >>> For example, you can add FOLL_NO_MIGRATION and skip to migrate unpinnable pages.
> >>
> >> Not sure which exact semantics you have in mind. But failing if we would
> >> have to migrate might be ok. Not sure if the caller should worry about
> >> that, though: the caller should not have to worry about page placement
> >> in general.
> > With going over the whole thread, I think the root cause is
> > collect_longterm_unpinnable_folios() hit the race window between
> > lru_add_drain_all() and folio_isolate_lru() by chance and returned
> > with ret=0 which finally have the CMA page pinned, right? However, I
> > find the proposed patch below will fail the PKVM
> > scenario(FOLL_LONGTERM set with non-LRU CMA pages) again as the CMA
> > pages never go to LRU which will have the __gup_longterm_locked loop
> > in do while(ret == -EAGAIN) as it did before 1aaf8c. I think the key
> > point is to find a way to distinguish the temporary(on the way to LRU)
> > and permanent CMA pages within collect_longterm_unpinnable_folios.
> >
> >   static long
> >   check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs)
> >   {
> > +       bool any_unpinnable;
> >          LIST_HEAD(movable_folio_list);
> >
> > -       collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > -       if (list_empty(&movable_folio_list))
> > -               return 0;
> > +       any_unpinnable =
> > collect_longterm_unpinnable_folios(&movable_folio_list, pofs);
> > +       if (list_empty(&movable_folio_list)) {
> > +               if (any_unpinnable)
> > +                       pofs_unpin(pofs);
> > +               return any_unpinnable ? -EAGAIN : 0;
> > +       }
> >
> So, what's the status of that? We should fix it upstream (*not* caring
> about controversial out-of-tree pkvm issues).
Leaving aside the pkvm issue, we should also care about the CMA pages
mapping to VM by special driver which are intended to be long term
pinned (actually they are fetched by cma_alloc and then mapped to VM
instead of alloc_pages during normal page fault). Could we distinguish
them by the patch below based on 1aaf8c122, that is, this kind of
pages is not on page cache and have equaled refcnt to mapcount

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf55206935c4..1ae251cd194a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2099,7 +2099,13 @@ static inline bool
folio_is_longterm_pinnable(struct folio *folio)
 #ifdef CONFIG_CMA
        int mt = folio_migratetype(folio);

-       if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
+       /*
+        * CMA pages mapping to VM by special driver may not on page
cache which has NULL folio->mapping and equal
+        * refcnt to mapcount
+        */
+       if ((mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) &&
+               (folio->mapping != NULL) &&
+               (folio_ref_count(folio) != folio_mapcount(folio)))
                return false;

>
> --
> Cheers,
>
> David / dhildenb
>


  reply	other threads:[~2025-06-04  1:04 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CAJrd-UtDD50iN=Yxz4=6kNkAcNAtRFkxhKAbEYiRyyDT-bYPHg@mail.gmail.com>
2025-05-22 10:18 ` 黄朝阳 (Zhaoyang Huang)
2025-05-22 12:22   ` David Hildenbrand
     [not found]     ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p3>
2025-05-22 13:09       ` Jaewon Kim
2025-05-22 14:06         ` David Hildenbrand
     [not found]         ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p2>
2025-05-22 14:44           ` 김재원
2025-05-22 15:07             ` David Hildenbrand
2025-05-23  2:48               ` John Hubbard
2025-05-23  2:37           ` 김재원
2025-05-23  2:52             ` John Hubbard
2025-05-26  7:48               ` Hyesoo Yu
2025-05-26  8:05                 ` Zhaoyang Huang
2025-05-26  9:33                   ` Hyesoo Yu
2025-05-26  9:38                     ` David Hildenbrand
     [not found]                     ` <CGME20250522130101epcas1p435244c12cfc9bb7895008b8ea98af064@epcms1p8>
2025-05-26 11:17                       ` Jaewon Kim
2025-05-26 11:49                         ` Zhaoyang Huang
2025-05-28  1:23                           ` Hyesoo Yu
2025-05-28  2:49                             ` Zhaoyang Huang
2025-05-28  3:36                               ` Hyesoo Yu
2025-05-28  7:55                                 ` David Hildenbrand
2025-05-28 10:59                                   ` Zhaoyang Huang
2025-05-28 12:57                                     ` David Hildenbrand
2025-06-03 13:12                                     ` David Hildenbrand
2025-06-04  1:04                                       ` Zhaoyang Huang [this message]
2025-06-04  9:12                                         ` David Hildenbrand
2025-06-04  9:41                                           ` Zhaoyang Huang
2025-06-04  9:48                                             ` David Hildenbrand
     [not found]                                               ` <CGME20250604095542epcas2p3f3d2d6fc17115547981a7173215a09d1@epcas2p3.samsung.com>
2025-06-04  9:53                                                 ` Hyesoo Yu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGWkznEZYiu9ReU+zyeMb41YRo+kOcbv5_oQTHj87wu8uYPJ8w@mail.gmail.com \
    --to=huangzhaoyang@gmail.com \
    --cc=Steve.Kang@unisoc.com \
    --cc=david@redhat.com \
    --cc=hyesoo.yu@samsung.com \
    --cc=jaewon31.kim@gmail.com \
    --cc=jaewon31.kim@samsung.com \
    --cc=janghyuck.kim@samsung.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-mm@kvack.org \
    --cc=surenb@google.com \
    --cc=zhaoyang.huang@unisoc.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox