linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Jason Gunthorpe <jgg@nvidia.com>
To: Peter Xu <peterx@redhat.com>
Cc: kvm@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Nico Pache <npache@redhat.com>,
	Zi Yan <ziy@nvidia.com>, Alex Mastro <amastro@fb.com>,
	David Hildenbrand <david@redhat.com>,
	Alex Williamson <alex@shazbot.org>, Zhi Wang <zhiw@nvidia.com>,
	David Laight <david.laight.linux@gmail.com>,
	Yi Liu <yi.l.liu@intel.com>, Ankit Agrawal <ankita@nvidia.com>,
	Kevin Tian <kevin.tian@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 4/4] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings
Date: Tue, 16 Dec 2025 10:42:24 -0400	[thread overview]
Message-ID: <20251216144224.GE6079@nvidia.com> (raw)
In-Reply-To: <aTnbf_dtwOo_gaVM@x1.local>

On Wed, Dec 10, 2025 at 03:43:43PM -0500, Peter Xu wrote:
> > This seems a bit weird, the vma length is already known, it is len,
> > why do we go to all this trouble to recalculate len in terms of phys?
> > 
> > If the length is wrong the mmap will fail, so there is no issue with
> > returning a larger order here.
> > 
> > I feel this should just return the order based on pci_resource_len()?
> 
> IIUC there's a trivial difference when partial of a huge bar is mapped.
> 
> Example: 1G bar, map range (pgoff=2M, size=1G-2M).
> 
> If we return bar size order, we'd say 1G, however then it means we'll do
> the alignment with 1G. __thp_get_unmapped_area() will think it's not
> proper, because:
> 
> 	loff_t off_end = off + len;
> 	loff_t off_align = round_up(off, size);
> 
> 	if (off_end <= off_align || (off_end - off_align) < size)
> 		return 0;
> 
> Here what we really want is to map (2M, 1G-2M) with 2M huge, not 1G, nor
> 4K.

This was the point of my prior email, the alignment calculation can't
just be 'align to a size'. The core code needs to choose a VA such
that:

   VA % (1 << order) == pg_off % (1 << order)

So if VFIO returns 1G then the VA should be aligned to 2M within a 1G
region. This allows opportunities to increase the alignment as the
mapping continues, eg if the arch supports a 32M 16x2M contiguous page
then we'd get 32M mappings with your example.

The core code should adjust the target order from the driver by:
   lowest of order or size rounded down to a power of two
 then
   highest arch supported leaf size below the above

None of this logic should be in drivers.

The way to think about this is that the driver is returning an order
which indicates the maximum case where:

   VA % (1 << order) == pg_off % (1 << order)

Could be true. Eg a PCI BAR returns an order that is the size of the
BAR because it is always true. Something that stores at most 1G pages
would return 1G pages, etc.

> Note that here checking CONFIG_ARCH_SUPPORTS_P*D_PFNMAP is a vfio behavior,
> pairing with the huge_fault() of vfio-pci driver.  It implies if vfio-pci's
> huge pfnmap is enabled or not.  If it's not enabled, we don't need to
> report larger orders here.

Honestly, I'd ignore this, and I'm not sure VFIO should be testing
those in the huge_fault either. Again the core code should deal with
it.

> Shall I keep it simple to leave it to drivers, until we have something more
> solid (I think we need HAVE_ARCH_HUGE_P*D_LEAVES here)?

Logic like this should not be in drivers.

> Even with that config ready, drivers should always still do proper check on
> its own (drivers need to support huge pfnmaps here first before reporting
> high orders).  

Drivers shouldn't implement this alignment function without also
implementing huge fault, it is pointless. Don't see a reason to add
extra complexity.

Jason


  reply	other threads:[~2025-12-16 14:42 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-04 15:09 [PATCH v2 0/4] mm/vfio: " Peter Xu
2025-12-04 15:10 ` [PATCH v2 1/4] mm/thp: Allow thp_get_unmapped_area_vmflags() to take alignment Peter Xu
2025-12-04 15:10 ` [PATCH v2 2/4] mm: Add file_operations.get_mapping_order() Peter Xu
2025-12-04 15:19   ` Peter Xu
2025-12-08  9:21     ` Matthew Wilcox
2025-12-10 20:24       ` Peter Xu
2025-12-07 16:21   ` Jason Gunthorpe
2025-12-10 20:23     ` Peter Xu
2025-12-16 14:44       ` Jason Gunthorpe
2025-12-16 15:42         ` Peter Xu
2025-12-16 17:19           ` Jason Gunthorpe
2025-12-16 17:36             ` Peter Xu
2025-12-16 18:58               ` Jason Gunthorpe
2025-12-16 19:44                 ` Peter Xu
2025-12-19 14:59                   ` Jason Gunthorpe
2025-12-19 15:13                     ` Peter Xu
2025-12-19 15:20                       ` Jason Gunthorpe
2025-12-19 15:53                         ` Peter Xu
2025-12-04 15:10 ` [PATCH v2 3/4] vfio: Introduce vfio_device_ops.get_mapping_order hook Peter Xu
2025-12-04 15:10 ` [PATCH v2 4/4] vfio-pci: Best-effort huge pfnmaps with !MAP_FIXED mappings Peter Xu
2025-12-05  4:33   ` kernel test robot
2025-12-05  7:45   ` kernel test robot
2025-12-07 16:26   ` Jason Gunthorpe
2025-12-10 20:43     ` Peter Xu
2025-12-16 14:42       ` Jason Gunthorpe [this message]
2025-12-16 16:01         ` Peter Xu
2025-12-16 19:01           ` Jason Gunthorpe
2025-12-16 19:58             ` Peter Xu
2025-12-08  3:11   ` Alex Mastro
2025-12-04 18:16 ` [PATCH v2 0/4] mm/vfio: " Cédric Le Goater
2025-12-07  9:13 ` Alex Mastro

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251216144224.GE6079@nvidia.com \
    --to=jgg@nvidia.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex@shazbot.org \
    --cc=amastro@fb.com \
    --cc=ankita@nvidia.com \
    --cc=david.laight.linux@gmail.com \
    --cc=david@redhat.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=npache@redhat.com \
    --cc=peterx@redhat.com \
    --cc=yi.l.liu@intel.com \
    --cc=zhiw@nvidia.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox