From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Gavin Shan <gshan@redhat.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	x86@kernel.org, Ingo Molnar <mingo@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Alistair Popple <apopple@nvidia.com>,
	kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	Sean Christopherson <seanjc@google.com>,
	Oscar Salvador <osalvador@suse.de>,
	Jason Gunthorpe <jgg@nvidia.com>, Borislav Petkov <bp@alien8.de>,
	Zi Yan <ziy@nvidia.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Yan Zhao <yan.y.zhao@intel.com>, Will Deacon <will@kernel.org>,
	Kefeng Wang <wangkefeng.wang@huawei.com>,
	Alex Williamson <alex.williamson@redhat.com>
Subject: Re: [PATCH v2 06/19] mm/pagewalk: Check pfnmap for folio_walk_start()
Date: Wed, 28 Aug 2024 09:44:04 +0200
Message-ID: <9f9d7e96-b135-4830-b528-37418ae7bbfd@redhat.com>
In-Reply-To: <20240826204353.2228736-7-peterx@redhat.com>

On 26.08.24 22:43, Peter Xu wrote:
> Teach folio_walk_start() to recognize special pmd/pud mappings, and fail
> them properly as it means there's no folio backing them.
> 
> Cc: David Hildenbrand <david@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>   mm/pagewalk.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index cd79fb3b89e5..12be5222d70e 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -753,7 +753,7 @@ struct folio *folio_walk_start(struct folio_walk *fw,
>   		fw->pudp = pudp;
>   		fw->pud = pud;
>   
> -		if (!pud_present(pud) || pud_devmap(pud)) {
> +		if (!pud_present(pud) || pud_devmap(pud) || pud_special(pud)) {
>   			spin_unlock(ptl);
>   			goto not_found;
>   		} else if (!pud_leaf(pud)) {
> @@ -783,7 +783,7 @@ struct folio *folio_walk_start(struct folio_walk *fw,
>   		fw->pmdp = pmdp;
>   		fw->pmd = pmd;
>   
> -		if (pmd_none(pmd)) {
> +		if (pmd_none(pmd) || pmd_special(pmd)) {
>   			spin_unlock(ptl);
>   			goto not_found;
>   		} else if (!pmd_leaf(pmd)) {

As raised, this is not the right way to do it. You should follow what
CONFIG_ARCH_HAS_PTE_SPECIAL and vm_normal_page() do.

It's even spelled out in vm_normal_page_pmd(): at the time it was
introduced there was no pmd_special(), so there was no way to handle that.
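
For reference, the pte-level handling in vm_normal_page() looks roughly
like this (quoting from memory, so treat it as approximate):

	if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
		if (likely(!pte_special(pte)))
			goto check_pfn;
		if (vma->vm_ops && vma->vm_ops->find_special_page)
			return vma->vm_ops->find_special_page(vma, addr);
		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
			return NULL;
		if (is_zero_pfn(pfn))
			return NULL;
		if (pte_devmap(pte))
			/* Legacy ZONE_DEVICE, no refcounted page to return. */
			return NULL;

		/* Anything else with the special bit set is a bug. */
		print_bad_pte(vma, addr, pte, NULL);
		return NULL;
	}

The pmd variant should mirror that structure, something like the
following (completely untested) sketch: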



diff --git a/mm/memory.c b/mm/memory.c
index f0cf5d02b4740..272445e9db147 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -672,15 +672,29 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
  {
         unsigned long pfn = pmd_pfn(pmd);
  
-       /*
-        * There is no pmd_special() but there may be special pmds, e.g.
-        * in a direct-access (dax) mapping, so let's just replicate the
-        * !CONFIG_ARCH_HAS_PTE_SPECIAL case from vm_normal_page() here.
-        */
+       if (IS_ENABLED(CONFIG_ARCH_HAS_PMD_SPECIAL)) {
+               if (likely(!pmd_special(pmd)))
+                       goto check_pfn;
+               if (vma->vm_ops && vma->vm_ops->find_special_page)
+                       return vma->vm_ops->find_special_page(vma, addr);
+               if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
+                       return NULL;
+               if (is_huge_zero_pmd(pmd))
+                       return NULL;
+               if (pmd_devmap(pmd))
+                       /* See vm_normal_page() */
+                       return NULL;
+               return NULL;
+       }
+
+       /* !CONFIG_ARCH_HAS_PMD_SPECIAL case follows: */
+
         if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
                 if (vma->vm_flags & VM_MIXEDMAP) {
                         if (!pfn_valid(pfn))
                                 return NULL;
+                       if (is_huge_zero_pmd(pmd))
+                               return NULL;
                         goto out;
                 } else {
                         unsigned long off;
@@ -692,6 +706,11 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
                 }
         }
  
+       /*
+        * For historical reasons, these might not have pmd_special() set,
+        * so we'll check them manually, in contrast to vm_normal_page().
+        */
+check_pfn:
         if (pmd_devmap(pmd))
                 return NULL;
         if (is_huge_zero_pmd(pmd))



We should then look into mapping huge zeropages with pmd_special() set as well.
pmd_devmap() we'll leave alone until it gets removed. But that's independent of your series.

I wonder if CONFIG_ARCH_HAS_PTE_SPECIAL is sufficient and we don't need an
additional CONFIG_ARCH_HAS_PMD_SPECIAL.

As I said, if you need someone to add vm_normal_page_pud(), I can handle that.
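
A rough, untested sketch of what that could look like, mirroring the pmd
variant above (assuming pud_special(), pud_pfn() and pud_devmap() are all
available on the architectures where huge pud pfnmaps are possible, per
your patch 01):

struct page *vm_normal_page_pud(struct vm_area_struct *vma, unsigned long addr,
				pud_t pud)
{
	unsigned long pfn = pud_pfn(pud);

	/*
	 * A special pud means "no struct page behind this", except for
	 * the find_special_page() hook.
	 */
	if (unlikely(pud_special(pud))) {
		if (vma->vm_ops && vma->vm_ops->find_special_page)
			return vma->vm_ops->find_special_page(vma, addr);
		return NULL;
	}

	if (unlikely(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))) {
		if (vma->vm_flags & VM_MIXEDMAP) {
			if (!pfn_valid(pfn))
				return NULL;
			goto out;
		} else {
			unsigned long off;

			/* Raw PFN remaps have no backing page unless COW moved one in. */
			off = (addr - vma->vm_start) >> PAGE_SHIFT;
			if (pfn == vma->vm_pgoff + off)
				return NULL;
			if (!is_cow_mapping(vma->vm_flags))
				return NULL;
		}
	}

	if (pud_devmap(pud))
		return NULL;
	if (unlikely(pfn > highest_memmap_pfn))
		return NULL;
out:
	return pfn_to_page(pfn);
}

The VM_PFNMAP/VM_MIXEDMAP fallback is likely not even reachable for puds
today, but keeping it symmetrical with the pmd variant makes it easier to
reason about.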

-- 
Cheers,

David / dhildenb



Thread overview: 69+ messages
2024-08-26 20:43 [PATCH v2 00/19] mm: Support huge pfnmaps Peter Xu
2024-08-26 20:43 ` [PATCH v2 01/19] mm: Introduce ARCH_SUPPORTS_HUGE_PFNMAP and special bits to pmd/pud Peter Xu
2024-08-26 20:43 ` [PATCH v2 02/19] mm: Drop is_huge_zero_pud() Peter Xu
2024-08-26 20:43 ` [PATCH v2 03/19] mm: Mark special bits for huge pfn mappings when inject Peter Xu
2024-08-28 15:31   ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 04/19] mm: Allow THP orders for PFNMAPs Peter Xu
2024-08-28 15:31   ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 05/19] mm/gup: Detect huge pfnmap entries in gup-fast Peter Xu
2024-08-26 20:43 ` [PATCH v2 06/19] mm/pagewalk: Check pfnmap for folio_walk_start() Peter Xu
2024-08-28  7:44   ` David Hildenbrand [this message]
2024-08-28 14:24     ` Peter Xu
2024-08-28 15:30       ` David Hildenbrand
2024-08-28 19:45         ` Peter Xu
2024-08-28 23:46           ` Jason Gunthorpe
2024-08-29  6:35             ` David Hildenbrand
2024-08-29 18:45               ` Peter Xu
2024-08-29 15:10           ` David Hildenbrand
2024-08-29 18:49             ` Peter Xu
2024-08-26 20:43 ` [PATCH v2 07/19] mm/fork: Accept huge pfnmap entries Peter Xu
2024-08-29 15:10   ` David Hildenbrand
2024-08-29 18:26     ` Peter Xu
2024-08-29 19:44       ` David Hildenbrand
2024-08-29 20:01         ` Peter Xu
2024-09-02  7:58   ` Yan Zhao
2024-09-03 21:23     ` Peter Xu
2024-09-09 22:25       ` Andrew Morton
2024-09-09 22:43         ` Peter Xu
2024-09-09 23:15           ` Andrew Morton
2024-09-10  0:08             ` Peter Xu
2024-09-10  2:52               ` Yan Zhao
2024-09-10 12:16                 ` Peter Xu
2024-09-11  2:16                   ` Yan Zhao
2024-09-11 14:34                     ` Peter Xu
2024-08-26 20:43 ` [PATCH v2 08/19] mm: Always define pxx_pgprot() Peter Xu
2024-08-26 20:43 ` [PATCH v2 09/19] mm: New follow_pfnmap API Peter Xu
2024-08-26 20:43 ` [PATCH v2 10/19] KVM: Use " Peter Xu
2024-08-26 20:43 ` [PATCH v2 11/19] s390/pci_mmio: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 12/19] mm/x86/pat: Use the new " Peter Xu
2024-08-26 20:43 ` [PATCH v2 13/19] vfio: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 14/19] acrn: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 15/19] mm/access_process_vm: " Peter Xu
2024-08-26 20:43 ` [PATCH v2 16/19] mm: Remove follow_pte() Peter Xu
2024-09-01  4:33   ` Yu Zhao
2024-09-01 13:39     ` David Hildenbrand
2024-08-26 20:43 ` [PATCH v2 17/19] mm/x86: Support large pfn mappings Peter Xu
2024-08-26 20:43 ` [PATCH v2 18/19] mm/arm64: " Peter Xu
2025-03-19 22:22   ` Keith Busch
2025-03-19 22:46     ` Peter Xu
2025-03-19 22:53       ` Keith Busch
2024-08-26 20:43 ` [PATCH v2 19/19] vfio/pci: Implement huge_fault support Peter Xu
2024-08-27 22:36 ` [PATCH v2 00/19] mm: Support huge pfnmaps Jiaqi Yan
2024-08-27 22:57   ` Peter Xu
2024-08-28  0:42     ` Jiaqi Yan
2024-08-28  0:46       ` Jiaqi Yan
2024-08-28 14:24       ` Jason Gunthorpe
2024-08-28 16:10         ` Jiaqi Yan
2024-08-28 23:49           ` Jason Gunthorpe
2024-08-29 19:21             ` Jiaqi Yan
2024-09-04 15:52               ` Jason Gunthorpe
2024-09-04 16:38                 ` Jiaqi Yan
2024-09-04 16:43                   ` Jason Gunthorpe
2024-09-04 16:58                     ` Jiaqi Yan
2024-09-04 17:00                       ` Jason Gunthorpe
2024-09-04 17:07                         ` Jiaqi Yan
2024-09-09  3:56                           ` Ankit Agrawal
2024-08-28 14:41       ` Peter Xu
2024-08-28 16:23         ` Jiaqi Yan
2024-09-09  4:03 ` Ankit Agrawal
2024-09-09 15:03   ` Peter Xu
