From: Jason Gunthorpe <jgg@nvidia.com>
To: Peter Xu <peterx@redhat.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Sean Christopherson <seanjc@google.com>,
	Oscar Salvador <osalvador@suse.de>,
	Axel Rasmussen <axelrasmussen@google.com>,
	linux-arm-kernel@lists.infradead.org, x86@kernel.org,
	Will Deacon <will@kernel.org>, Gavin Shan <gshan@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Zi Yan <ziy@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Ingo Molnar <mingo@redhat.com>,
	Alistair Popple <apopple@nvidia.com>,
	Borislav Petkov <bp@alien8.de>,
	David Hildenbrand <david@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	kvm@vger.kernel.org, Dave Hansen <dave.hansen@linux.intel.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Yan Zhao <yan.y.zhao@intel.com>
Subject: Re: [PATCH 00/19] mm: Support huge pfnmaps
Date: Wed, 14 Aug 2024 09:37:15 -0300
Message-ID: <20240814123715.GB2032816@nvidia.com>
In-Reply-To: <20240809160909.1023470-1-peterx@redhat.com>

On Fri, Aug 09, 2024 at 12:08:50PM -0400, Peter Xu wrote:
> Overview
> ========
> 
> This series is based on mm-unstable, commit 98808d08fc0f of Aug 7th latest,
> plus dax 1g fix [1].  Note that this series should also apply if without
> the dax 1g fix series, but when without it, mprotect() will trigger similar
> errors otherwise on PUD mappings.
> 
> This series implements huge pfnmaps support for mm in general.  Huge pfnmap
> allows e.g. VM_PFNMAP vmas to map in either PMD or PUD levels, similar to
> what we do with dax / thp / hugetlb so far to benefit from TLB hits.  Now
> we extend that idea to PFN mappings, e.g. PCI MMIO bars where it can grow
> as large as 8GB or even bigger.

FWIW, I've started to hear people talk about needing this in the VFIO
context with VMs.

vfio/iommufd will reassemble the contiguous range from the 4k PFNs to
set up the IOMMU, but KVM is not able to do it so reliably. There is a
notable performance gap with two-dimensional paging between 4k and 1G
entries in the KVM table. The platforms are being architected with the
assumption that 1G TLB entries will be used throughout the hypervisor
environment.
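
To illustrate what "reassemble the contiguous range" means, here is a
minimal userspace sketch that merges a list of per-4k-page PFNs into
physically contiguous runs. It is only an illustration of the idea, not
the vfio/iommufd code: struct pfn_range and coalesce_pfns() are
hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type for illustration only, not a kernel structure. */
struct pfn_range {
	uint64_t start_pfn;	/* first PFN of the contiguous run */
	uint64_t nr_pages;	/* number of 4k pages in the run */
};

/*
 * Merge per-page PFNs (in mapping order) into contiguous runs.
 * Writes at most max_out runs into out and returns how many were
 * written. Stops early if out is full.
 */
static size_t coalesce_pfns(const uint64_t *pfns, size_t n,
			    struct pfn_range *out, size_t max_out)
{
	size_t nr_out = 0;

	assert(pfns != NULL && out != NULL);

	for (size_t i = 0; i < n; i++) {
		if (nr_out > 0) {
			struct pfn_range *last = &out[nr_out - 1];

			/* Next PFN directly follows the current run: extend it. */
			if (last->start_pfn + last->nr_pages == pfns[i]) {
				last->nr_pages++;
				continue;
			}
		}
		if (nr_out == max_out)
			break;
		/* Discontinuity: start a new run. */
		out[nr_out].start_pfn = pfns[i];
		out[nr_out].nr_pages = 1;
		nr_out++;
	}
	return nr_out;
}
```

A run that is large and aligned enough can then be installed with a
single big IOMMU entry; a consumer that only ever sees individual 4k
entries has no comparably cheap way to recover that.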

> Currently, only x86_64 (1G+2M) and arm64 (2M) are supported.  

There is definitely interest here in extending ARM to support the 1G
size too, what is missing?

> The other trick is how to allow gup-fast working for such huge mappings
> even if there's no direct sign of knowing whether it's a normal page or
> MMIO mapping.  This series chose to keep the pte_special solution, so that
> it reuses similar idea on setting a special bit to pfnmap PMDs/PUDs so that
> gup-fast will be able to identify them and fail properly.

Makes sense.

> More architectures / More page sizes
> ------------------------------------
> 
> Currently only x86_64 (2M+1G) and arm64 (2M) are supported.
> 
> For example, if arm64 can start to support THP_PUD one day, the huge pfnmap
> on 1G will be automatically enabled.

Oh that sounds like a bigger step..
 
> VFIO is so far the only consumer for the huge pfnmaps after this series
> applied.  Besides above remap_pfn_range() generic optimization, device
> driver can also try to optimize its mmap() on a better VA alignment for
> either PMD/PUD sizes.  This may, iiuc, normally require userspace changes,
> as the driver doesn't normally decide the VA to map a bar.  But I don't
> think I know all the drivers to know the full picture.

How does alignment work? In most cases I'm aware of, userspace does
not use MAP_FIXED so the expectation would be for the kernel to
automatically select a high alignment. I suppose your cases are
working because qemu uses MAP_FIXED and naturally aligns the BAR
addresses?
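
For reference, the usual way userspace gets a naturally aligned VA
without kernel help is to over-reserve address space and trim it. A
minimal sketch, assuming Linux, a power-of-two align that is at least
the page size, and a page-multiple len. The helper names align_up() and
reserve_aligned() are made up for this example, and I am not claiming
this is what qemu does:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Round addr up to the next multiple of align (a power of two). */
static uintptr_t align_up(uintptr_t addr, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (addr + align - 1) & ~((uintptr_t)align - 1);
}

/*
 * Reserve len bytes of address space whose start is align-aligned.
 * Over-allocates by align with PROT_NONE, then unmaps the unaligned
 * head and the leftover tail. Returns MAP_FAILED on error.
 */
static void *reserve_aligned(size_t len, size_t align)
{
	size_t total = len + align;
	char *base = mmap(NULL, total, PROT_NONE,
			  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (base == MAP_FAILED)
		return MAP_FAILED;

	char *aligned = (char *)align_up((uintptr_t)base, align);
	size_t head = (size_t)(aligned - base);
	size_t tail = total - head - len;

	if (head)
		munmap(base, head);
	if (tail)
		munmap(aligned + len, tail);
	return aligned;
}
```

The caller would then mmap() the BAR over the reserved range with
MAP_FIXED, e.g. with len and align both 1G for a PUD-sized mapping.
That second step needs a device fd and BAR offset, which are left out
here. It works, but it means every userspace has to know to do it,
which is why having the kernel pick the alignment seems preferable.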

> - x86_64 + AMD GPU
>   - Needs Alex's modified QEMU to guarantee proper VA alignment to make
>     sure all pages to be mapped with PUDs

Oh :(

Jason

