From: Peter Xu <peterx@redhat.com>
To: Jason Gunthorpe <jgg@nvidia.com>
Cc: kvm@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Nico Pache <npache@redhat.com>,
	Zi Yan <ziy@nvidia.com>, Alex Mastro <amastro@fb.com>,
	David Hildenbrand <david@redhat.com>,
	Alex Williamson <alex@shazbot.org>, Zhi Wang <zhiw@nvidia.com>,
	David Laight <david.laight.linux@gmail.com>,
	Yi Liu <yi.l.liu@intel.com>, Ankit Agrawal <ankita@nvidia.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 2/4] mm: Add file_operations.get_mapping_order()
Date: Wed, 10 Dec 2025 15:23:02 -0500
Message-ID: <aTnWphMGVwWl12FX@x1.local>
In-Reply-To: <aTWpjOhLOMOB2e74@nvidia.com>

On Sun, Dec 07, 2025 at 12:21:32PM -0400, Jason Gunthorpe wrote:
> On Thu, Dec 04, 2025 at 10:10:01AM -0500, Peter Xu wrote:
> > Add one new file operation, get_mapping_order().  It can be used by file
> > backends to report mapping order hints.
> > 
> > By default, Linux assumes we will map in PAGE_SIZE chunks.  With this hint,
> > the driver can report the possibility of mapping chunks that are larger
> > than PAGE_SIZE.  Then, the VA allocator will try to use that as alignment
> > when allocating the VA ranges.
> > 
> > This is useful because when chunks to be mapped are larger than PAGE_SIZE,
> > VA alignment matters and it needs to be aligned with the size of the chunk
> > to be mapped.
> > 
> > That said, no matter what alignment is used for the VA allocation, the
> > driver can still decide at which size to map the chunks.  It is also not
> > an issue if it keeps mapping in PAGE_SIZE.
> > 
> > get_mapping_order() takes three parameters: the file object pointer,
> > followed by the pgoff and size of the mmap() request.  It returns the
> > order, which must be non-negative to enable the alignment.  When zero is
> > returned, it behaves as if no hint was provided, IOW, alignment will
> > still be PAGE_SIZE.
> 
> This should explain how it works when the incoming pgoff is not
> aligned..

Hmm, I thought the charm of this new proposal (based on the suggestions in
your v1 review) is that nobody needs to worry about this..  Or maybe you
meant I should add some doc comments in the commit message?

If so I can do that.

thp_get_unmapped_area_vmflags() should already take all kinds of pgoff
misalignment into account.  It's just that this v2 is better than v1 when
using this new API, because that THP function no longer needs to be
exported.

> 
> I think for dpdk we want to support mapping around the MSI hole so
> something like
> 
>  pgoff 0 -> 2M
>  skip 4k
>  2M + 4k -> 64M
> 
> Should set up the last VMA to align to 2M + 4k so the first PMD is
> fragmented to 4k pages but the remaining part is 2M sized or better.
> 
> We just noticed a bug very similar to this in qemu around its manual
> alignment scheme where it would de-align things around the MSI window
> and spoil the PMDs.

Right, IIUC this series should work fine exactly as you said.

Here the driver should only care about what backs the (pgoff, len) range,
and the proper order to map those chunks.  In the case of VFIO, it knows
which BAR it's mapping, so it can return a proper order for the specific
BAR pointed to by (pgoff, len).

The driver doesn't need to worry about anything else, like the layout above.
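
To illustrate the driver side, here is a rough userspace model of such a
hook.  It is only a sketch, not the code from this series: the struct, the
function name and the order constants (x86-64 values with 4 KiB base pages)
are assumptions made for the example.  The point is that the returned order
depends on which BAR owns the range and how large it is, not on how pgoff
happens to be aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed x86-64 values with 4 KiB base pages; other configs differ. */
#define SKETCH_PMD_ORDER 9   /* 2 MiB = 4 KiB << 9 */
#define SKETCH_PUD_ORDER 18  /* 1 GiB = 4 KiB << 18 */

/* Hypothetical description of one BAR in the device file's pgoff space. */
struct sketch_bar {
	uint64_t pgoff_start;	/* first page offset of the BAR */
	uint64_t npages;	/* BAR size in base pages */
};

/*
 * Userspace model of a get_mapping_order()-style hook: find the BAR that
 * owns [pgoff, pgoff + npages) and return the largest order it could be
 * mapped with.  Returning 0 keeps the default PAGE_SIZE alignment.
 */
static unsigned int sketch_mapping_order(const struct sketch_bar *bars,
					 size_t nbars,
					 uint64_t pgoff, uint64_t npages)
{
	assert(bars != NULL || nbars == 0);

	for (size_t i = 0; i < nbars; i++) {
		const struct sketch_bar *bar = &bars[i];
		uint64_t bar_end = bar->pgoff_start + bar->npages;

		/* The request must sit entirely inside this BAR. */
		if (pgoff < bar->pgoff_start || pgoff >= bar_end ||
		    npages > bar_end - pgoff)
			continue;

		/* The hint follows the BAR size, not the pgoff alignment. */
		if (bar->npages >= (UINT64_C(1) << SKETCH_PUD_ORDER))
			return SKETCH_PUD_ORDER;
		if (bar->npages >= (UINT64_C(1) << SKETCH_PMD_ORDER))
			return SKETCH_PMD_ORDER;
		return 0;
	}

	return 0;	/* unknown range: no hint */
}
```

With a 64M BAR starting at pgoff 0, a request at pgoff 513 (2M + 4k) still
gets order 9 in this model; dealing with the unaligned start is left to the
core VA allocation.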

Let me know if I misread your question, or if this series doesn't achieve
what you're asking here..

Thanks,

> 
> I guess ideally the file could return the order assuming an
> aligned-to-start pgoff and the core code could use that order to compute
> an adjustment for the actual pgoff so we maintain:
>   va % order = pgoff % order
> 
> Jason
> 
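
For reference, the adjustment described above can be modelled in userspace
roughly as below.  This is a sketch under assumptions, not what
thp_get_unmapped_area_vmflags() literally does: the helper name is made up,
base pages are assumed to be 4 KiB, and "order" in the formula is read as
the chunk size in bytes (PAGE_SIZE << order) with pgoff converted to a byte
offset.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumes 4 KiB base pages */

/*
 * Return the lowest address >= base whose offset within a
 * (PAGE_SIZE << order) chunk matches that of the file offset, i.e.
 *   va % chunk == (pgoff << PAGE_SHIFT) % chunk
 */
static uint64_t sketch_align_va(uint64_t base, uint64_t pgoff,
				unsigned int order)
{
	assert(order < 64 - SKETCH_PAGE_SHIFT);

	uint64_t chunk = UINT64_C(1) << (SKETCH_PAGE_SHIFT + order);
	uint64_t mask = chunk - 1;
	/* Offset of the file position inside its chunk. */
	uint64_t want = (pgoff << SKETCH_PAGE_SHIFT) & mask;
	/* Round base down to a chunk boundary, then add that offset. */
	uint64_t va = (base & ~mask) + want;

	/* If that landed below base, move up by one chunk. */
	if (va < base)
		va += chunk;
	return va;
}
```

For the MSI hole example, pgoff 513 (2M + 4k) with order 9 and a 2M-aligned
base yields base + 4k, so the first 2M boundary in the VA range lines up
with file offset 4M and everything from there on can use PMD mappings.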

-- 
Peter Xu



