From: Matthew Wilcox <willy@infradead.org>
To: Yin Tirui <yintirui@huawei.com>
Cc: "Jürgen Groß" <jgross@suse.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org,
	linux-arm-kernel@lists.infradead.org, david@kernel.org,
	catalin.marinas@arm.com, will@kernel.org, tglx@kernel.org,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	hpa@zytor.com, luto@kernel.org, peterz@infradead.org,
	akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
	ziy@nvidia.com, baolin.wang@linux.alibaba.com,
	Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com,
	dev.jain@arm.com, baohua@kernel.org, lance.yang@linux.dev,
	vbabka@suse.cz, rppt@kernel.org, surenb@google.com,
	mhocko@suse.com, anshuman.khandual@arm.com,
	rmclure@linux.ibm.com, kevin.brodsky@arm.com, apopple@nvidia.com,
	ajd@linux.ibm.com, pasha.tatashin@soleen.com, bhe@redhat.com,
	thuth@redhat.com, coxu@redhat.com, dan.j.williams@intel.com,
	yu-cheng.yu@intel.com, baolu.lu@linux.intel.com,
	conor.dooley@microchip.com, Jonathan.Cameron@huawei.com,
	riel@surriel.com, "Kefeng Wang" <wangkefeng.wang@huawei.com>,
	chenjun102@huawei.com
Subject: Re: [PATCH RFC v3 2/4] mm/pgtable: Make pfn_pte() filter out huge page attributes
Date: Fri, 6 Mar 2026 04:25:40 +0000
Message-ID: <aapXRN4KjWtUUJ0g@casper.infradead.org>
In-Reply-To: <e8d5ba16-3071-4b4a-b6a1-d492cc46e8c2@huawei.com>

On Thu, Mar 05, 2026 at 05:38:46PM +0800, Yin Tirui wrote:
> On 3/4/2026 3:52 PM, Jürgen Groß wrote:
> > Today it can either be used for a large page (which should be a pmd,
> > of course), or - much worse - you'd strip the _PAGE_PAT bit, which is
> > at the same position in PTEs.
> > 
> > So basically you are removing the ability to use some cache modes.
> > 
> > NACK!
> > 
> > 
> > Juergen
> 
> Hi Willy and Jürgen,
> 
> Following up on the x86 _PAGE_PSE and _PAGE_PAT aliasing issue.
> 
> To achieve the goal of keeping pfn_pte() pure and completely eradicating the
> pte_clrhuge() anti-pattern, we need a way to ensure pfn_pte() never receives
> a pgprot with the huge bit set.
> 
> @Jürgen:
> Just to be absolutely certain: is there any safe way to filter out the huge
> page attributes directly inside x86's pfn_pte() without breaking PAT? Or
> does the hardware bit-aliasing make this strictly impossible at the
> pfn_pte() level?
> 
> @Willy @Jürgen:
> Assuming it is impossible to filter this safely inside pfn_pte() on x86, we
> must translate the pgprot before passing it down. To maintain strict
> type-safety and still drop pte_clrhuge(), I plan to introduce two
> arch-neutral wrappers:
> 
> x86:
> /* Translates large prot to 4K. Shifts PAT back to bit 7, inherently
> clearing _PAGE_PSE */
> #define pgprot_huge_to_pte(prot)    pgprot_large_2_4k(prot)
> /* Translates 4K prot to large. Shifts PAT to bit 12, strictly sets
> _PAGE_PSE */
> #define pgprot_pte_to_huge(prot) \
> 	__pgprot(pgprot_val(pgprot_4k_2_large(prot)) | _PAGE_PSE)

I don't think we should have pgprot_large_2_4k().  Or rather, I think it
should be embedded in pmd_pgprot() / pud_pgprot().  That is, we should
have an 'ideal' pgprot which, on x86, perhaps matches that used by the
4k level.  pfn_pmd() should convert from the ideal pgprot to the one
actually used by PMDs (and set _PAGE_PSE?).
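
To make the 'ideal' pgprot idea concrete, here is a minimal user-space
sketch.  It is not kernel code and not a proposed patch: every model_*
name is hypothetical, the flags mask is simplified (high bits such as NX
are ignored), and it assumes the ideal pgprot is exactly the 4k layout.
The only hardware facts it relies on are the x86 bit positions: bit 7 is
PAT in a PTE but PSE in a PMD, and PAT moves to bit 12 in a PMD.

```c
/* User-space model only.  All model_* names are hypothetical. */
#include <assert.h>
#include <stdint.h>

typedef uint64_t model_pgprotval_t;

#define MODEL_PAGE_SHIFT		12
#define MODEL_PAGE_BIT_PAT		7	/* PAT in a 4k PTE */
#define MODEL_PAGE_BIT_PAT_LARGE	12	/* PAT in a PMD/PUD */
#define MODEL_PAGE_PRESENT	(1ULL << 0)
#define MODEL_PAGE_RW		(1ULL << 1)
#define MODEL_PAGE_PAT		(1ULL << MODEL_PAGE_BIT_PAT)
#define MODEL_PAGE_PSE		(1ULL << 7)	/* same bit as PAT in a PTE */
#define MODEL_PAGE_PAT_LARGE	(1ULL << MODEL_PAGE_BIT_PAT_LARGE)
/* Simplified: only bits 0..12 are treated as PMD flags in this model. */
#define MODEL_PMD_FLAGS_MASK	0x1fffULL

/* Move PAT from bit 7 (4k layout) to bit 12 (large layout). */
static inline model_pgprotval_t model_4k_to_large(model_pgprotval_t val)
{
	return (val & ~(MODEL_PAGE_PAT | MODEL_PAGE_PAT_LARGE)) |
	       ((val & MODEL_PAGE_PAT) <<
		(MODEL_PAGE_BIT_PAT_LARGE - MODEL_PAGE_BIT_PAT));
}

/* Move PAT from bit 12 back to bit 7; clearing bit 7 first drops PSE. */
static inline model_pgprotval_t model_large_to_4k(model_pgprotval_t val)
{
	return (val & ~(MODEL_PAGE_PAT | MODEL_PAGE_PAT_LARGE)) |
	       ((val & MODEL_PAGE_PAT_LARGE) >>
		(MODEL_PAGE_BIT_PAT_LARGE - MODEL_PAGE_BIT_PAT));
}

/* pfn_pmd() analogue: takes the ideal (4k-layout) pgprot, emits a PMD. */
static inline uint64_t model_pfn_pmd(uint64_t pfn, model_pgprotval_t ideal)
{
	return (pfn << MODEL_PAGE_SHIFT) | model_4k_to_large(ideal) |
	       MODEL_PAGE_PSE;
}

/* pmd_pgprot() analogue: always hands back the ideal (4k-layout) pgprot. */
static inline model_pgprotval_t model_pmd_pgprot(uint64_t pmd)
{
	return model_large_to_4k(pmd & MODEL_PMD_FLAGS_MASK);
}

/* pfn_pte() analogue: stays pure, no filtering of the pgprot. */
static inline uint64_t model_pfn_pte(uint64_t pfn, model_pgprotval_t ideal)
{
	return (pfn << MODEL_PAGE_SHIFT) | ideal;
}

/* Round trip: the PAT cache-mode bit survives and PSE never leaks out. */
static inline void model_selfcheck(void)
{
	model_pgprotval_t prot =
		MODEL_PAGE_PRESENT | MODEL_PAGE_RW | MODEL_PAGE_PAT;
	uint64_t pmd = model_pfn_pmd(0x200, prot);	/* 2M-aligned pfn */

	assert(pmd & MODEL_PAGE_PSE);
	assert(model_pmd_pgprot(pmd) == prot);
}
```

With the conversion kept inside the pfn_pmd() / pmd_pgprot() analogues,
a split path in this model would pass model_pmd_pgprot(old_pmd) straight
to model_pfn_pte() without any explicit translation step at the call
site.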

> arm64:
> /*
>  * Drops Block marker, enforces Page marker.
>  * Strictly preserves the PTE_VALID bit to avoid validating PROT_NONE pages.
>  */
> #define pgprot_huge_to_pte(prot) \
>       __pgprot((pgprot_val(prot) & ~(PMD_TYPE_MASK & ~PTE_VALID)) | \
>              (PTE_TYPE_PAGE & ~PTE_VALID))
> /*
>  * Drops Page marker, sets Block marker.
>  * Strictly preserves the PTE_VALID bit.
>  */
> #define pgprot_pte_to_huge(prot) \
>       __pgprot((pgprot_val(prot) & ~(PTE_TYPE_MASK & ~PTE_VALID)) | \
>              (PMD_TYPE_SECT & ~PTE_VALID))
> 
> Usage:
> 1.  Creating a huge pfnmap (remap_try_huge_pmd)
> pgprot_t huge_prot = pgprot_pte_to_huge(prot);
> 
> /* No need for pmd_mkhuge() */
> pmd_t entry = pmd_mkspecial(pfn_pmd(pfn, huge_prot));
> set_pmd_at(mm, addr, pmd, entry);
> 
> 2. Splitting a huge pfnmap (__split_huge_pmd_locked)
> pgprot_t small_prot = pgprot_huge_to_pte(pmd_pgprot(old_pmd));
> 
> /* No need for pte_clrhuge() */
> pte_t entry = pfn_pte(pmd_pfn(old_pmd), small_prot);
> set_ptes(mm, haddr, pte, entry, HPAGE_PMD_NR);
> 
> 
> Willy, is there a better architectural approach to handle this and satisfy
> the type-safety requirement given the x86 hardware constraints?
> 
> -- 
> Thanks,
> Yin Tirui
> 
> 
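
For reference, the bit arithmetic in the quoted arm64 macros can be
checked in user space.  The sketch below is only a model with
hypothetical a64_* names; it assumes the arm64 descriptor encoding in
which bit 0 is the valid bit, type 0b11 is a page and type 0b01 is a
block/section, and it shows that both conversions touch only bit 1 and
leave the valid bit as they found it.

```c
/* User-space model only.  All a64_* names are hypothetical. */
#include <assert.h>
#include <stdint.h>

typedef uint64_t a64_pgprotval_t;

#define A64_PTE_VALID		(1ULL << 0)
#define A64_PTE_TYPE_MASK	(3ULL << 0)
#define A64_PTE_TYPE_PAGE	(3ULL << 0)
#define A64_PMD_TYPE_MASK	(3ULL << 0)
#define A64_PMD_TYPE_SECT	(1ULL << 0)

/* Block -> page: set the type bit above VALID, leave VALID untouched. */
static inline a64_pgprotval_t a64_huge_to_pte(a64_pgprotval_t prot)
{
	return (prot & ~(A64_PMD_TYPE_MASK & ~A64_PTE_VALID)) |
	       (A64_PTE_TYPE_PAGE & ~A64_PTE_VALID);
}

/* Page -> block: clear the type bit above VALID, leave VALID untouched. */
static inline a64_pgprotval_t a64_pte_to_huge(a64_pgprotval_t prot)
{
	return (prot & ~(A64_PTE_TYPE_MASK & ~A64_PTE_VALID)) |
	       (A64_PMD_TYPE_SECT & ~A64_PTE_VALID);
}

/* A PROT_NONE-style prot (VALID clear) must stay invalid both ways. */
static inline void a64_selfcheck(void)
{
	assert(!(a64_huge_to_pte(0x0) & A64_PTE_VALID));
	assert(!(a64_pte_to_huge(0x2) & A64_PTE_VALID));
}
```

In this model a valid block prot (type 0b01) becomes a valid page prot
(type 0b11) and back, while a prot with the valid bit clear never
becomes valid.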


Thread overview:
2026-02-28  7:09 [PATCH RFC v3 0/4] mm: add huge pfnmap support for remap_pfn_range() Yin Tirui
2026-02-28  7:09 ` [PATCH RFC v3 1/4] x86/mm: Use proper page table helpers for huge page generation Yin Tirui
2026-02-28  7:09 ` [PATCH RFC v3 2/4] mm/pgtable: Make pfn_pte() filter out huge page attributes Yin Tirui
2026-03-04  7:52   ` Jürgen Groß
2026-03-04 10:08     ` Yin Tirui
2026-03-05  9:38     ` Yin Tirui
2026-03-05 10:05       ` Jürgen Groß
2026-03-06  4:25       ` Matthew Wilcox [this message]
2026-02-28  7:09 ` [PATCH RFC v3 3/4] x86/mm: Remove pte_clrhuge() and clean up init_64.c Yin Tirui
2026-02-28  7:09 ` [PATCH RFC v3 4/4] mm: add PMD-level huge page support for remap_pfn_range() Yin Tirui
