linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>,
	Jinjiang Tu <tujinjiang@huawei.com>,
	akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
	Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
	surenb@google.com, mhocko@suse.com, linux-mm@kvack.org,
	wangkefeng.wang@huawei.com
Subject: Re: [PATCH] mm: fix COW mapping handing in generic_access_phys
Date: Wed, 28 May 2025 13:14:58 -0400	[thread overview]
Message-ID: <aDdEkhwDS2YMu9OV@x1.local> (raw)
In-Reply-To: <20250528162915.GR61950@nvidia.com>

On Wed, May 28, 2025 at 01:29:15PM -0300, Jason Gunthorpe wrote:
> On Wed, May 28, 2025 at 12:06:07PM -0400, Peter Xu wrote:
> > #define VM_PFNMAP	0x00000400	/* Page-ranges managed without "struct page", just pure PFN */
> > 
> > I'm not confident to blame any driver yet to have those special cases for
> > VM_PFNMAP, because it only says "managed without struct page", it didn't
> > say "it must not contain struct page"..  Hence it hints the core mm "please
> > do not manage these mappings with struct page at all".  Still sounds fair
> > contract, even if not ideal.
> 
> I think it is pretty clear, if a VMA has VM_PFNMAP then nothing must
> ever try to obtain a struct page from any PTEs in it, for any reason,
> even if things in it might have a struct page. In practice it means
> nothing can call vm_normal_page() on a VM_PFNMAP.
> 
> It would be nice to update the comment to make it clearer.

Yes that would help. Maybe the hard part is making sure how it is
documented will be how it is used..

> 
> If the VMA owner wanted to permit access to the struct page then it
> should have used VM_MIXEDMAP.
> 
> The fundamental difference between PFNMAP and MIXEDMAP is that
> vm_normal_page() is allowed on MIXEDMAP. That comes with some extra
> rules and restrictions to support arches without the special pte bit.

If in the ideal world where VM_PFNMAP has a stricter semantics, it sounds
fair to disable vm_normal_page() on top of VM_PFNMAP, yes.

> 
> VM_IO | VM_PFNMAP further means that all the pfns in the VMA require
> the use of io accessors (writel/readl) to access them.

Hmm.. I'm not 100% sure on this one.  E.g., vDSO is VM_IO now but it's
definitely accessible that got mapped into userspace.  Same case when
e.g. mmap() VM_IO regions then accessing using the virtual addresses.

What you said sounds more like what __iomem declares rather than VM_IO.
But I confess I at least don't know why VM_IO existed, considering there're
also VM_*MAP and VM_DONTDUMP.

> 
> No idea what VM_IO | VM_MIXEDMAP is supposed to mean. Only the special
> ptes need io accessors?
> 
> In either case GUP doesn't really work on the VMA. PFNMAP is totally
> blocked, and for MIXEDMAP userspace has no way to discover which
> subset of the VMA is GUPable. I think that GUP is supported on
> MIXEDMAP at all is a bit of a weirdo thing.

Does it imply that in the ideal case one should use follow_pfnmap_start()
for MIXEDMAP?

I don't have a strong feeling yet on how GUP should treat MIXEDMAP, either
(1) fail MIXEDMAP like you said, falling back to follow_pfnmap_start(), or
(2) allow MIXEDMAP only on page-backed mappings, then fallback to
follow_pfnmap_start() on non-page-backed mappings only.

> 
> The main value of MIXEDMAP having access to the struct page is to
> allow the page table itself to manage the life-cycle of memory that
> was allocated specifically and only for the VMA. It will do fork and
> unmap properly and free the memory at the right time.

Agree.

> 
> If a driver is managing the memory lifecylce on its own and using zap
> to clear the PTEs (because it must to manage the non-struct page PTE
> life-cycle), then there isn't much point in using MIXEDMAP, IMHO. I
> wonder if some of the GPU drivers are confused like this.
> 
> Jason
> 

-- 
Peter Xu



  reply	other threads:[~2025-05-28 17:15 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-28  1:56 Jinjiang Tu
2025-05-28  8:59 ` David Hildenbrand
2025-05-28  9:59   ` David Hildenbrand
2025-05-28 12:14     ` Jinjiang Tu
2025-05-28 14:54     ` Peter Xu
2025-05-28 15:02       ` David Hildenbrand
2025-05-28 15:25         ` Peter Xu
2025-05-28 15:29           ` David Hildenbrand
2025-05-28 16:06             ` Peter Xu
2025-05-28 16:29               ` Jason Gunthorpe
2025-05-28 17:14                 ` Peter Xu [this message]
2025-05-28 17:34                   ` Jason Gunthorpe
2025-05-28 17:37                   ` David Hildenbrand
2025-05-28 17:32                 ` David Hildenbrand
2025-05-28 17:47                   ` Jason Gunthorpe
2025-05-28 17:59                     ` Jason Gunthorpe
2025-05-28 18:03                       ` David Hildenbrand
2025-05-28 18:00                     ` David Hildenbrand
2025-05-28 18:15                       ` Jason Gunthorpe
2025-05-28 18:22                         ` David Hildenbrand
2025-05-28 18:29                           ` Jason Gunthorpe
2025-05-30 10:04                             ` David Hildenbrand
2025-05-28 12:13   ` Jinjiang Tu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aDdEkhwDS2YMu9OV@x1.local \
    --to=peterx@redhat.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=jgg@nvidia.com \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=rppt@kernel.org \
    --cc=surenb@google.com \
    --cc=tujinjiang@huawei.com \
    --cc=vbabka@suse.cz \
    --cc=wangkefeng.wang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox