Date: Fri, 16 Aug 2024 10:33:04 -0400
From: Peter Xu
To: Kefeng Wang
Cc: Jason Gunthorpe, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 Sean Christopherson, Oscar Salvador, Axel Rasmussen,
 linux-arm-kernel@lists.infradead.org, x86@kernel.org, Will Deacon,
 Gavin Shan, Paolo Bonzini, Zi Yan, Andrew Morton, Catalin Marinas,
 Ingo Molnar, Alistair Popple, Borislav Petkov, David Hildenbrand,
 Thomas Gleixner, kvm@vger.kernel.org, Dave Hansen, Alex Williamson,
 Yan Zhao
Subject: Re: [PATCH 00/19] mm: Support huge pfnmaps
References: <20240809160909.1023470-1-peterx@redhat.com>
 <20240814123715.GB2032816@nvidia.com>
 <1147332f-790e-487f-8816-1860b8744ab2@huawei.com>
In-Reply-To: <1147332f-790e-487f-8816-1860b8744ab2@huawei.com>

On Fri, Aug 16, 2024 at 11:05:33AM +0800, Kefeng Wang wrote:
> 
> 
> On 2024/8/16 3:20, Peter Xu wrote:
> > On Wed, Aug 14, 2024 at 09:37:15AM -0300, Jason Gunthorpe wrote:
> > > > Currently, only x86_64 (1G+2M) and arm64 (2M) are supported.
> > > 
> > > There is definitely interest here in extending ARM to support the 1G
> > > size too, what is missing?
> > 
> > Currently PUD pfnmap relies on THP_PUD config option:
> > 
> > config ARCH_SUPPORTS_PUD_PFNMAP
> > 	def_bool y
> > 	depends on ARCH_SUPPORTS_HUGE_PFNMAP && HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> > 
> > Arm64 unfortunately doesn't yet support dax 1G, so not applicable yet.
> > 
> > Ideally, pfnmap is too simple comparing to real THPs and it shouldn't
> > require to depend on THP at all, but we'll need things like below to land
> > first:
> > 
> > https://lore.kernel.org/r/20240717220219.3743374-1-peterx@redhat.com
> > 
> > I sent that first a while ago, but I didn't collect enough inputs, and I
> > decided to unblock this series from that, so x86_64 shouldn't be affected,
> > and arm64 will at least start to have 2M.
> > 
> > > 
> > > > The other trick is how to allow gup-fast working for such huge mappings
> > > > even if there's no direct sign of knowing whether it's a normal page or
> > > > MMIO mapping.  This series chose to keep the pte_special solution, so that
> > > > it reuses similar idea on setting a special bit to pfnmap PMDs/PUDs so that
> > > > gup-fast will be able to identify them and fail properly.
> > > 
> > > Make sense
> > > 
> > > > More architectures / More page sizes
> > > > ------------------------------------
> > > > 
> > > > Currently only x86_64 (2M+1G) and arm64 (2M) are supported.
> > > > 
> > > > For example, if arm64 can start to support THP_PUD one day, the huge pfnmap
> > > > on 1G will be automatically enabled.
> 
> A draft patch to enable THP_PUD on arm64, only passed with DEBUG_VM_PGTABLE,
> we may test pud pfnmaps on arm64.

Thanks, Kefeng.  It would be great if it already works with something this
simple.

It would be interesting to know whether it works already, if you have a
few-GB GPU around on those systems.

Logically, as long as HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD is selected as in
the patch below, 1G pfnmap will be automatically enabled when you rebuild
the kernel.
You can double check that by looking for this:

  CONFIG_ARCH_SUPPORTS_PUD_PFNMAP=y

And you can try to observe the mappings by enabling dynamic debug for
vfio_pci_mmap_huge_fault(), then map the BAR with vfio-pci and read
something from it.

> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a2f8ff354ca6..ff0d27c72020 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -184,6 +184,7 @@ config ARM64
>  	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
>  	select HAVE_ARCH_TRACEHOOK
>  	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
> +	select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if PGTABLE_LEVELS > 2
>  	select HAVE_ARCH_VMAP_STACK
>  	select HAVE_ARM_SMCCC
>  	select HAVE_ASM_MODVERSIONS
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 7a4f5604be3f..e013fe458476 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -763,6 +763,25 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
>  #define pud_valid(pud)	pte_valid(pud_pte(pud))
>  #define pud_user(pud)	pte_user(pud_pte(pud))
>  #define pud_user_exec(pud)	pte_user_exec(pud_pte(pud))
> +#define pud_dirty(pud)	pte_dirty(pud_pte(pud))
> +#define pud_devmap(pud)	pte_devmap(pud_pte(pud))
> +#define pud_wrprotect(pud)	pte_pud(pte_wrprotect(pud_pte(pud)))
> +#define pud_mkold(pud)	pte_pud(pte_mkold(pud_pte(pud)))
> +#define pud_mkwrite(pud)	pte_pud(pte_mkwrite_novma(pud_pte(pud)))
> +#define pud_mkclean(pud)	pte_pud(pte_mkclean(pud_pte(pud)))
> +#define pud_mkdirty(pud)	pte_pud(pte_mkdirty(pud_pte(pud)))
> +
> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
> +static inline int pud_trans_huge(pud_t pud)
> +{
> +	return pud_val(pud) && pud_present(pud) && !(pud_val(pud) & PUD_TABLE_BIT);
> +}
> +
> +static inline pud_t pud_mkdevmap(pud_t pud)
> +{
> +	return pte_pud(set_pte_bit(pud_pte(pud), __pgprot(PTE_DEVMAP)));
> +}
> +#endif
>  
>  static inline bool pgtable_l4_enabled(void);
>  
> @@ -1137,10 +1156,20 @@ static inline int pmdp_set_access_flags(struct vm_area_struct *vma,
>  							pmd_pte(entry), dirty);
>  }
>  
> +static inline int pudp_set_access_flags(struct vm_area_struct *vma,
> +					unsigned long address, pud_t *pudp,
> +					pud_t entry, int dirty)
> +{
> +	return __ptep_set_access_flags(vma, address, (pte_t *)pudp,
> +				       pud_pte(entry), dirty);
> +}
> +
> +#ifndef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>  static inline int pud_devmap(pud_t pud)
>  {
>  	return 0;
>  }
> +#endif
>  
>  static inline int pgd_devmap(pgd_t pgd)
>  {
> @@ -1213,6 +1242,13 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
>  {
>  	return __ptep_test_and_clear_young(vma, address, (pte_t *)pmdp);
>  }
> +
> +static inline int pudp_test_and_clear_young(struct vm_area_struct *vma,
> +					    unsigned long address,
> +					    pud_t *pudp)
> +{
> +	return __ptep_test_and_clear_young(vma, address, (pte_t *)pudp);
> +}
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  
>  static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
> @@ -1433,6 +1469,7 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
>  #define update_mmu_cache(vma, addr, ptep) \
>  	update_mmu_cache_range(NULL, vma, addr, ptep, 1)
>  #define update_mmu_cache_pmd(vma, address, pmd) do { } while (0)
> +#define update_mmu_cache_pud(vma, address, pud) do { } while (0)
>  
>  #ifdef CONFIG_ARM64_PA_BITS_52
>  #define phys_to_ttbr(addr)	(((addr) | ((addr) >> 46)) & TTBR_BADDR_MASK_52)
> -- 
> 2.27.0

-- 
Peter Xu
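
P.S. For the config check mentioned above, here is a minimal sketch of how
one might script it after rebuilding.  It is only an illustration: the
helper names and the candidate config locations (/proc/config.gz, which
needs CONFIG_IKCONFIG_PROC, and /boot/config-<release>) are assumptions on
my side and not part of the series.

```python
import gzip
import os
import platform

# The symbol discussed in this thread.
OPTION = "CONFIG_ARCH_SUPPORTS_PUD_PFNMAP"


def option_enabled(config_text, option=OPTION):
    """Return True if `option` is set to 'y' in .config-style text."""
    for line in config_text.splitlines():
        # Disabled symbols appear as "# CONFIG_FOO is not set",
        # so an exact match on "CONFIG_FOO=y" is sufficient here.
        if line.strip() == f"{option}=y":
            return True
    return False


def read_running_config():
    """Best-effort read of the running kernel's config (hypothetical helper).

    Returns the config text, or None if no candidate file exists.
    """
    candidates = ["/proc/config.gz", f"/boot/config-{platform.release()}"]
    for path in candidates:
        if not os.path.exists(path):
            continue
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as f:
            return f.read()
    return None


if __name__ == "__main__":
    text = read_running_config()
    if text is None:
        print("kernel config not found in the assumed locations")
    else:
        print(f"{OPTION} enabled: {option_enabled(text)}")
```

If the option shows up as enabled, the dynamic debug step would typically
be done by writing "func vfio_pci_mmap_huge_fault +p" to the dynamic debug
control file, assuming CONFIG_DYNAMIC_DEBUG is set and debugfs is mounted.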