[RFC] Huge remap_pfn_range for vfio-pci


From: Axel Rasmussen <axelrasmussen@google.com>
To: Jason Gunthorpe <jgg@nvidia.com>, Peter Xu <peterx@redhat.com>,
	 David Hildenbrand <david@redhat.com>,
	Sean Christopherson <seanjc@google.com>
Cc: Linux MM <linux-mm@kvack.org>
Subject: [RFC] Huge remap_pfn_range for vfio-pci
Date: Fri, 24 May 2024 13:54:20 -0700
Message-ID: <CAJHvVcge-1JhHd4HQKXye9F-WrrMQZ66oU6XF2ie3PCKXaihKw@mail.gmail.com>

Hi,

I'm interested in extending remap_pfn_range to allow it to map the
range hugely (using PUDs or PMDs). The initial user I have in mind is
vfio-pci; I'm thinking when we're mapping large ranges for GPUs, we
can get both a performance and host overhead win by doing this hugely.
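
To make "mapping hugely" concrete, here is a minimal userspace sketch
(not kernel code) of how a mapping level could be chosen for one chunk
of a range. It assumes x86-64 geometry with 4 KiB base pages, 2 MiB
PMDs and 1 GiB PUDs; the names level_fits and pick_level are
hypothetical and only illustrate the alignment constraint.

```c
#include <stdint.h>

/* Assumed x86-64 geometry with 4 KiB base pages; other architectures differ. */
#define PAGE_SHIFT 12
#define PMD_SHIFT  21	/* 2 MiB */
#define PUD_SHIFT  30	/* 1 GiB */

enum map_level { MAP_PTE, MAP_PMD, MAP_PUD };

/*
 * A level fits when the virtual address and the physical address are both
 * aligned to the level's size, and at least that many bytes remain to map.
 */
static int level_fits(uint64_t addr, uint64_t pfn, uint64_t size,
		      unsigned int shift)
{
	uint64_t level_size = UINT64_C(1) << shift;
	uint64_t mask = level_size - 1;
	uint64_t phys = pfn << PAGE_SHIFT;

	return !(addr & mask) && !(phys & mask) && size >= level_size;
}

/* Hypothetical helper: pick the largest level usable at this position. */
static enum map_level pick_level(uint64_t addr, uint64_t pfn, uint64_t size)
{
	if (level_fits(addr, pfn, size, PUD_SHIFT))
		return MAP_PUD;
	if (level_fits(addr, pfn, size, PMD_SHIFT))
		return MAP_PMD;
	return MAP_PTE;
}
```

A range whose virtual and physical offsets disagree modulo 2 MiB can
never use anything but PTEs in this model, which is why the alignment
of the user mapping matters as much as the alignment of the BAR.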

Another thing I have in the back of my mind is adding something KVM
can re-use to simplify its whole host_pfn_mapping_level /
hva_to_pfn_remapped / get_user_page_fast_only thing.

I know Peter and David are working on some related things (hugetlbfs
unification and follow_pte et al improvements, respectively). Although
I have a hacky proof of concept that works, I thought it best to get
some consensus on the design before I post something, so I don't
conflict with this existing / upcoming work.

Changing remap_pfn_range to install PUDs or PMDs is straightforward.
The hairy part is the fault / follow side of things:

1. follow_pte clearly doesn't work for this, since the leaf might be a
PUD or PMD instead of a PTE. Most callers don't care about the PTE
itself; they care about the pgprot or flags it has set. So my idea is
to add a new interface which just yields those bits, instead of the
actual PTE.

Peter, I think hugetlbfs unification may run into similar issues, do
you have some plan already to deal with PUD/PMD/PTE being different
types?
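
As a rough illustration of the kind of interface I mean, here is a
userspace sketch under stated assumptions, not a proposed kernel API.
The struct leaf_info and leaf_pfn_for names are hypothetical, and the
field set is a guess at what callers would want; the point is only
that the caller sees the same type whatever level the leaf lives at.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12	/* assumes 4 KiB base pages */

/* Hypothetical level-agnostic description of a leaf mapping. */
struct leaf_info {
	uint64_t base_pfn;	/* first pfn covered by the leaf */
	unsigned int shift;	/* log2 of the leaf size: 12, 21 or 30 here */
	bool writable;		/* stand-in for the pgprot/flag bits */
};

/* Resolve the pfn backing addr, whatever level the leaf was found at. */
static uint64_t leaf_pfn_for(const struct leaf_info *leaf, uint64_t addr)
{
	uint64_t offset = addr & ((UINT64_C(1) << leaf->shift) - 1);

	return leaf->base_pfn + (offset >> PAGE_SHIFT);
}
```

With something shaped like this, a caller that only wants "the pfn and
whether it is writable" never has to handle pte_t, pmd_t and pud_t
separately.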

2. vfio-pci relies on vm_ops->fault. This is a problem because the
normal fault handler path doesn't call this until after it has walked
down to the PTE level, installing PUDs/PMDs along the way. I have only
gross ideas for how to deal with this:

- Add a VM_HUGEPFNMAP VMA flag indicating vm_ops->fault should be
called earlier in __handle_mm_fault
- Add a vm_ops->hugepfn_fault (name not important) which should be
called earlier in __handle_mm_fault
- Go ahead and let remap_pfn_range overwrite existing PUDs/PMDs

Which of these do folks find least offensive? Or is there a better
way I haven't thought of?
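
For the second option, the ordering I have in mind looks roughly like
the toy model below. This is a userspace sketch, not the real
__handle_mm_fault; struct toy_vm_ops, hugepfn_fault, FAULT_FALLBACK and
the demo handlers are all hypothetical names made up for illustration.

```c
#include <stddef.h>

enum level { LVL_PTE, LVL_PMD, LVL_PUD };

#define FAULT_FALLBACK (-1)	/* handler declined to map at this level */

struct toy_vm_ops {
	int (*fault)(unsigned long addr);
	/* Hypothetical hook, tried before any lower-level table exists. */
	int (*hugepfn_fault)(unsigned long addr, enum level lvl);
};

/* Try PUD, then PMD, and only then fall back to the PTE-level fault. */
static int toy_handle_fault(const struct toy_vm_ops *ops, unsigned long addr,
			    enum level *mapped)
{
	static const enum level order[] = { LVL_PUD, LVL_PMD };

	for (size_t i = 0; i < sizeof(order) / sizeof(order[0]); i++) {
		if (!ops->hugepfn_fault)
			break;
		int ret = ops->hugepfn_fault(addr, order[i]);
		if (ret != FAULT_FALLBACK) {
			*mapped = order[i];
			return ret;
		}
	}
	*mapped = LVL_PTE;
	return ops->fault(addr);
}

/* Demo driver: only [2 MiB, 4 MiB) can be mapped with a PMD. */
static int demo_fault(unsigned long addr)
{
	(void)addr;
	return 0;
}

static int demo_hugepfn_fault(unsigned long addr, enum level lvl)
{
	if (lvl == LVL_PMD && addr >= 0x200000UL && addr < 0x400000UL)
		return 0;
	return FAULT_FALLBACK;
}

static const struct toy_vm_ops demo_ops = {
	.fault = demo_fault,
	.hugepfn_fault = demo_hugepfn_fault,
};

static const struct toy_vm_ops small_only_ops = { .fault = demo_fault };

/* Convenience wrapper: report which level ended up handling the fault. */
static enum level toy_fault_level(const struct toy_vm_ops *ops,
				  unsigned long addr)
{
	enum level mapped;

	toy_handle_fault(ops, addr, &mapped);
	return mapped;
}
```

The property that matters is that the driver gets asked at each huge
level before the core code commits to a lower-level table, and drivers
without the hook see no change in behavior.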

3. The same fault-ordering problem applies to CoW faults, but I don't
know of any real use case for CoW huge pfn mappings, so I think we can
just keep the existing small mapping behavior for CoW VMAs. Any
objections?

Thanks!
