From: Peter Xu <peterx@redhat.com>
To: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Sean Christopherson <seanjc@google.com>,
Oscar Salvador <osalvador@suse.de>,
Axel Rasmussen <axelrasmussen@google.com>,
linux-arm-kernel@lists.infradead.org, x86@kernel.org,
Will Deacon <will@kernel.org>, Gavin Shan <gshan@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>, Zi Yan <ziy@nvidia.com>,
Andrew Morton <akpm@linux-foundation.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Ingo Molnar <mingo@redhat.com>,
Alistair Popple <apopple@nvidia.com>,
Borislav Petkov <bp@alien8.de>,
David Hildenbrand <david@redhat.com>,
Thomas Gleixner <tglx@linutronix.de>,
kvm@vger.kernel.org, Dave Hansen <dave.hansen@linux.intel.com>,
Alex Williamson <alex.williamson@redhat.com>,
Yan Zhao <yan.y.zhao@intel.com>
Subject: Re: [PATCH 00/19] mm: Support huge pfnmaps
Date: Fri, 16 Aug 2024 10:33:04 -0400
Message-ID: <Zr9jIKp_vWyfCzQs@x1n>
In-Reply-To: <1147332f-790e-487f-8816-1860b8744ab2@huawei.com>
On Fri, Aug 16, 2024 at 11:05:33AM +0800, Kefeng Wang wrote:
>
>
> On 2024/8/16 3:20, Peter Xu wrote:
> > On Wed, Aug 14, 2024 at 09:37:15AM -0300, Jason Gunthorpe wrote:
> > > > Currently, only x86_64 (1G+2M) and arm64 (2M) are supported.
> > >
> > > There is definitely interest here in extending ARM to support the 1G
> > > size too, what is missing?
> >
> > Currently PUD pfnmap relies on THP_PUD config option:
> >
> > config ARCH_SUPPORTS_PUD_PFNMAP
> > def_bool y
> > depends on ARCH_SUPPORTS_HUGE_PFNMAP && HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> >
> > Unfortunately arm64 doesn't support 1G DAX yet, so this isn't applicable there yet.
> >
> > Ideally, pfnmap is so much simpler than real THPs that it shouldn't need
> > to depend on THP at all, but something like the following will need to
> > land first:
> >
> > https://lore.kernel.org/r/20240717220219.3743374-1-peterx@redhat.com
> >
> > I sent that a while ago but didn't collect enough input, so I decided to
> > decouple this series from it: x86_64 shouldn't be affected, and arm64 will
> > at least start to have 2M.
> >
> > >
> > > > The other trick is how to let gup-fast work with such huge mappings even
> > > > though there's no direct way of knowing whether an entry maps a normal
> > > > page or MMIO. This series chose to keep the pte_special solution and
> > > > reuse the same idea: set a special bit on pfnmap PMDs/PUDs so that
> > > > gup-fast can identify them and fail properly.
> > >
> > > Makes sense
> > >
> > > > More architectures / More page sizes
> > > > ------------------------------------
> > > >
> > > > Currently only x86_64 (2M+1G) and arm64 (2M) are supported.
> > > >
> > > > For example, if arm64 starts to support THP_PUD one day, 1G huge pfnmap
> > > > will be enabled automatically.
>
> Here is a draft patch to enable THP_PUD on arm64. It has only passed the
> DEBUG_VM_PGTABLE checks so far, but with it we may be able to test PUD
> pfnmaps on arm64.
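To make the special-bit idea quoted above concrete, here is a minimal userspace sketch of the decision gup-fast has to make for a huge PMD/PUD leaf. The bit positions and the names ENTRY_PRESENT, ENTRY_SPECIAL and gup_fast_can_grab() are hypothetical and only model the logic; the real check is in gup-fast in mm/gup.c (patch 05 of this series).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define ENTRY_PRESENT (UINT64_C(1) << 0)
#define ENTRY_SPECIAL (UINT64_C(1) << 1)	/* set when a pfnmap is installed */

/*
 * Model of the lockless gup-fast decision for a huge (PMD/PUD) leaf entry.
 * gup-fast does not look at the VMA, so the entry itself is the only place
 * it can learn that the pfn may have no struct page behind it (e.g. MMIO).
 * Returning false means "give up here"; the caller then falls back to the
 * slow path, which checks the VMA and refuses VM_PFNMAP mappings.
 */
static bool gup_fast_can_grab(uint64_t entry)
{
	if (!(entry & ENTRY_PRESENT))
		return false;
	if (entry & ENTRY_SPECIAL)
		return false;	/* pfnmap: never touch a struct page for it */
	return true;		/* normal page: safe to take a reference */
}
```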
Thanks, Kefeng. It would be great if it already works that simply.
It would be interesting to know whether it does, if you have a GPU with a
few GBs of BAR space on the system.
In theory, as long as HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD is selected, as in
the patch below, 1G pfnmap will be enabled automatically when you rebuild
the kernel.
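A PUD entry maps one naturally aligned 1G range, so a 1G pfnmap can only be installed when the faulting address and the backing pfn share that alignment and the whole 1G range fits inside the VMA (which also means the BAR has to be at least 1G). Otherwise the fault falls back to a PMD, and then to PTEs. The sketch below models that decision under stated assumptions: 4K base pages, and can_map_order() and pick_fault_order() are hypothetical names. It is not the vfio-pci or core-mm code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4K base pages, as on x86_64 and a 4K-granule arm64 kernel. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PMD_ORDER  (21 - SKETCH_PAGE_SHIFT)	/* 2M */
#define SKETCH_PUD_ORDER  (30 - SKETCH_PAGE_SHIFT)	/* 1G */

/*
 * Hypothetical helper: can the fault at 'addr', backed by 'pfn', be served
 * by one naturally aligned mapping of 2^order pages inside [vm_start, vm_end)?
 */
static bool can_map_order(uint64_t addr, uint64_t pfn, uint64_t vm_start,
			  uint64_t vm_end, unsigned int order)
{
	uint64_t size = UINT64_C(1) << (SKETCH_PAGE_SHIFT + order);
	uint64_t start = addr & ~(size - 1);	/* aligned VA covering addr */
	uint64_t pages_before = (addr - start) >> SKETCH_PAGE_SHIFT;
	uint64_t start_pfn;

	/* The whole aligned range must sit inside the VMA. */
	if (start < vm_start || start + size > vm_end)
		return false;
	if (pfn < pages_before)
		return false;
	/* The physical side must be aligned the same way as the virtual side. */
	start_pfn = pfn - pages_before;
	return (start_pfn & ((UINT64_C(1) << order) - 1)) == 0;
}

/* Try 1G first (only if PUD pfnmap is configured), then 2M, then 4K. */
static unsigned int pick_fault_order(uint64_t addr, uint64_t pfn,
				     uint64_t vm_start, uint64_t vm_end,
				     bool pud_pfnmap)
{
	if (pud_pfnmap &&
	    can_map_order(addr, pfn, vm_start, vm_end, SKETCH_PUD_ORDER))
		return SKETCH_PUD_ORDER;
	if (can_map_order(addr, pfn, vm_start, vm_end, SKETCH_PMD_ORDER))
		return SKETCH_PMD_ORDER;
	return 0;
}
```

With a 1G-aligned BAR and a 1G-aligned mmap() address, the first fault would get order 18 (1G) when PUD pfnmap is enabled and order 9 (2M) when it is not.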
You can double-check that by looking for this line in your kernel .config:
  CONFIG_ARCH_SUPPORTS_PUD_PFNMAP=y
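If that check needs to be scripted, a small C helper can scan the .config for an exact "CONFIG_...=y" line (grep does the same job from the shell). kconfig_is_enabled() is a hypothetical name used only for this sketch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if 'symbol' is set to y in the kernel config at 'path'.
 * Disabled options appear as "# CONFIG_FOO is not set" (or not at all),
 * so only an exact "CONFIG_FOO=y" line counts as enabled.
 */
static bool kconfig_is_enabled(const char *path, const char *symbol)
{
	char line[512];
	char want[256];
	bool found = false;
	FILE *f = fopen(path, "r");

	if (!f)
		return false;
	snprintf(want, sizeof(want), "%s=y", symbol);
	while (fgets(line, sizeof(line), f)) {
		line[strcspn(line, "\n")] = '\0';	/* drop the newline */
		if (strcmp(line, want) == 0) {
			found = true;
			break;
		}
	}
	fclose(f);
	return found;
}
```

Run from the kernel build tree, kconfig_is_enabled(".config", "CONFIG_ARCH_SUPPORTS_PUD_PFNMAP") would return true once the arm64 patch below is applied and the config is regenerated.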
You can then observe the mappings by enabling dynamic debug for
vfio_pci_mmap_huge_fault(), mapping the BAR with vfio-pci, and reading
something from it.
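The dynamic debug step can also be issued from C instead of echo. This sketch assumes CONFIG_DYNAMIC_DEBUG, root privileges and debugfs mounted at /sys/kernel/debug; dyndbg_enable_func() is a hypothetical helper, and the vfio-pci BAR mapping itself is not shown because it needs device setup.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Enable every pr_debug()/dev_dbg() call site in function 'func' by writing
 * a dynamic debug query such as "func vfio_pci_mmap_huge_fault +p" to
 * 'control_path' (normally /sys/kernel/debug/dynamic_debug/control).
 */
static bool dyndbg_enable_func(const char *control_path, const char *func)
{
	FILE *f;
	int written;

	/* An empty name or embedded whitespace would change the query. */
	if (func[0] == '\0' || strpbrk(func, " \t\n"))
		return false;
	f = fopen(control_path, "w");
	if (!f)
		return false;
	written = fprintf(f, "func %s +p\n", func);
	/* stdio buffers the query until fclose(), so check that result too. */
	if (fclose(f) != 0)
		return false;
	return written > 0;
}
```

A call such as dyndbg_enable_func("/sys/kernel/debug/dynamic_debug/control", "vfio_pci_mmap_huge_fault") before touching the mapped BAR should make the fault messages show up in dmesg.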
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a2f8ff354ca6..ff0d27c72020 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -184,6 +184,7 @@ config ARM64
> select HAVE_ARCH_THREAD_STRUCT_WHITELIST
> select HAVE_ARCH_TRACEHOOK
> select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> + select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if PGTABLE_LEVELS > 2
> select HAVE_ARCH_VMAP_STACK
> select HAVE_ARM_SMCCC
> select HAVE_ASM_MODVERSIONS
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7a4f5604be3f..e013fe458476 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -763,6 +763,25 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
> #define pud_valid(pud) pte_valid(pud_pte(pud))
> #define pud_user(pud) pte_user(pud_pte(pud))
> #define pud_user_exec(pud) pte_user_exec(pud_pte(pud))
> +#define pud_dirty(pud) pte_dirty(pud_pte(pud))
> +#define pud_devmap(pud) pte_devmap(pud_pte(pud))
> +#define pud_wrprotect(pud) pte_pud(pte_wrprotect(pud_pte(pud)))
> +#define pud_mkold(pud) pte_pud(pte_mkold(pud_pte(pud)))
> +#define pud_mkwrite(pud) pte_pud(pte_mkwrite_novma(pud_pte(pud)))
> +#define pud_mkclean(pud) pte_pud(pte_mkclean(pud_pte(pud)))
> +#define pud_mkdirty(pud) pte_pud(pte_mkdirty(pud_pte(pud)))
> +
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +static inline int pud_trans_huge(pud_t pud)
> +{
> + return pud_val(pud) && pud_present(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
> +}
> +
> +static inline pud_t pud_mkdevmap(pud_t pud)
> +{
> + return pte_pud(set_pte_bit(pud_pte(pud), __pgprot(PTE_DEVMAP)));
> +}
> +#endif
>
> static inline bool pgtable_l4_enabled(void);
>
> @@ -1137,10 +1156,20 @@ static inline int pmdp_set_access_flags(struct vm_area_struct *vma,
> pmd_pte(entry), dirty);
> }
>
> +static inline int pudp_set_access_flags(struct vm_area_struct *vma,
> + unsigned long address, pud_t *pudp,
> + pud_t entry, int dirty)
> +{
> + return __ptep_set_access_flags(vma, address, (pte_t *)pudp,
> + pud_pte(entry), dirty);
> +}
> +
> +#ifndef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> static inline int pud_devmap(pud_t pud)
> {
> return 0;
> }
> +#endif
>
> static inline int pgd_devmap(pgd_t pgd)
> {
> @@ -1213,6 +1242,13 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
> {
> return __ptep_test_and_clear_young(vma, address, (pte_t *)pmdp);
> }
> +
> +static inline int pudp_test_and_clear_young(struct vm_area_struct *vma,
> + unsigned long address,
> + pud_t *pudp)
> +{
> + return __ptep_test_and_clear_young(vma, address, (pte_t *)pudp);
> +}
> #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
> static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
> @@ -1433,6 +1469,7 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
> #define update_mmu_cache(vma, addr, ptep) \
> update_mmu_cache_range(NULL, vma, addr, ptep, 1)
> #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
> +#define update_mmu_cache_pud(vma, address, pud) do { } while (0)
>
> #ifdef CONFIG_ARM64_PA_BITS_52
> #define phys_to_ttbr(addr) (((addr) | ((addr) >> 46)) & TTBR_BADDR_MASK_52)
> --
> 2.27.0
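The quoted pud_trans_huge() treats a PUD as a huge leaf when it is non-empty, present, and has the table bit clear. The sketch below models that test in userspace with simplified arm64 descriptor bits (bit 0 valid; bit 1 selects a next-level table versus a block at level 1 with a 4K granule). The names are hypothetical, and reducing "present" to the valid bit is a simplification: the kernel's pud_present() also covers PROT_NONE entries whose valid bit is clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified arm64 stage-1 descriptor bits (level 1, 4K granule). */
#define DESC_VALID     (UINT64_C(1) << 0)
#define DESC_TABLE_BIT (UINT64_C(1) << 1)	/* 1: next-level table, 0: block */

/*
 * Mirror of the quoted pud_trans_huge(): a non-empty, valid entry whose
 * table bit is clear is a 1G block mapping rather than a pointer to a
 * PMD table. The non-zero check is redundant with the valid bit here but
 * is kept to match the patch.
 */
static bool sketch_pud_is_leaf(uint64_t pud)
{
	return pud && (pud & DESC_VALID) && !(pud & DESC_TABLE_BIT);
}
```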
--
Peter Xu