From: Mike Rapoport <rppt@kernel.org>
To: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
Kevin Tian <kevin.tian@intel.com>,
Jason Gunthorpe <jgg@nvidia.com>, Jann Horn <jannh@google.com>,
Vasant Hegde <vasant.hegde@amd.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@intel.com>,
Alistair Popple <apopple@nvidia.com>,
Peter Zijlstra <peterz@infradead.org>,
Uladzislau Rezki <urezki@gmail.com>,
Jean-Philippe Brucker <jean-philippe@linaro.org>,
Andy Lutomirski <luto@kernel.org>, Yi Lai <yi1.lai@intel.com>,
David Hildenbrand <david@redhat.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>,
Michal Hocko <mhocko@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
Vinicius Costa Gomes <vinicius.gomes@intel.com>,
iommu@lists.linux.dev, security@kernel.org, x86@kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: [PATCH v7 2/8] mm: Add a ptdesc flag to mark kernel page tables
Date: Thu, 23 Oct 2025 10:07:57 +0300 [thread overview]
Message-ID: <aPnUTfMXD7qReWUl@kernel.org> (raw)
In-Reply-To: <20251022082635.2462433-3-baolu.lu@linux.intel.com>
On Wed, Oct 22, 2025 at 04:26:28PM +0800, Lu Baolu wrote:
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> The page tables used to map the kernel and userspace often have very
> different handling rules. There are frequently *_kernel() variants of
> functions just for kernel page tables. That's not great and has lead
> to code duplication.
>
> Instead of having completely separate call paths, allow a 'ptdesc' to
> be marked as being for kernel mappings. Introduce helpers to set and
> clear this status.
>
> Note: this uses the PG_referenced bit. Page flags are a great fit for
> this since it is truly a single bit of information. Use PG_referenced
> itself because it's a fairly benign flag (as opposed to things like
> PG_lock). It's also (according to Willy) unlikely to go away any time
> soon.
>
> PG_referenced is not in PAGE_FLAGS_CHECK_AT_FREE. It does not need to
> be cleared before freeing the page, and pages coming out of the
> allocator should have it cleared. Regardless, introduce an API to
> clear it anyway. Having symmetry in the API makes it easier to change
> the underlying implementation later, like if there was a need to move
> to a PAGE_FLAGS_CHECK_AT_FREE bit.
>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
> ---
> include/linux/mm.h | 41 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 41 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d16b33bacc32..354d7925bf77 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2940,6 +2940,7 @@ static inline pmd_t *pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long a
> #endif /* CONFIG_MMU */
>
> enum pt_flags {
> + PT_kernel = PG_referenced,
> PT_reserved = PG_reserved,
> /* High bits are used for zone/node/section */
> };
> @@ -2965,6 +2966,46 @@ static inline bool pagetable_is_reserved(struct ptdesc *pt)
> return test_bit(PT_reserved, &pt->pt_flags.f);
> }
>
> +/**
> + * ptdesc_set_kernel - Mark a ptdesc used to map the kernel
> + * @ptdesc: The ptdesc to be marked
> + *
> + * Kernel page tables often need special handling. Set a flag so that
> + * the handling code knows this ptdesc will not be used for userspace.
> + */
> +static inline void ptdesc_set_kernel(struct ptdesc *ptdesc)
> +{
> + set_bit(PT_kernel, &ptdesc->pt_flags.f);
> +}
> +
> +/**
> + * ptdesc_clear_kernel - Mark a ptdesc as no longer used to map the kernel
> + * @ptdesc: The ptdesc to be unmarked
> + *
> + * Use when the ptdesc is no longer used to map the kernel and no longer
> + * needs special handling.
> + */
> +static inline void ptdesc_clear_kernel(struct ptdesc *ptdesc)
> +{
> + /*
> + * Note: the 'PG_referenced' bit does not strictly need to be
> + * cleared before freeing the page. But this is nice for
> + * symmetry.
> + */
> + clear_bit(PT_kernel, &ptdesc->pt_flags.f);
> +}
> +
> +/**
> + * ptdesc_test_kernel - Check if a ptdesc is used to map the kernel
> + * @ptdesc: The ptdesc being tested
> + *
> + * Call to tell if the ptdesc used to map the kernel.
> + */
> +static inline bool ptdesc_test_kernel(const struct ptdesc *ptdesc)
> +{
> + return test_bit(PT_kernel, &ptdesc->pt_flags.f);
> +}
> +
> /**
> * pagetable_alloc - Allocate pagetables
> * @gfp: GFP flags
> --
> 2.43.0
>
--
Sincerely yours,
Mike.
next prev parent reply other threads:[~2025-10-23 7:08 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-22 8:26 [PATCH v7 0/8] Fix stale IOTLB entries for kernel address space Lu Baolu
2025-10-22 8:26 ` [PATCH v7 1/8] iommu: Disable SVA when CONFIG_X86 is set Lu Baolu
2025-10-22 19:50 ` Jason Gunthorpe
2025-10-22 8:26 ` [PATCH v7 2/8] mm: Add a ptdesc flag to mark kernel page tables Lu Baolu
2025-10-22 18:31 ` David Hildenbrand
2025-10-23 7:07 ` Mike Rapoport [this message]
2025-10-22 8:26 ` [PATCH v7 3/8] mm: Actually mark kernel page table pages Lu Baolu
2025-10-22 8:26 ` [PATCH v7 4/8] x86/mm: Use 'ptdesc' when freeing PMD pages Lu Baolu
2025-10-22 18:31 ` David Hildenbrand
2025-10-22 8:26 ` [PATCH v7 5/8] mm: Introduce pure page table freeing function Lu Baolu
2025-10-22 8:26 ` [PATCH v7 6/8] x86/mm: Use pagetable_free() Lu Baolu
2025-11-18 2:14 ` Vishal Moola (Oracle)
2025-11-20 10:35 ` Mike Rapoport
2025-10-22 8:26 ` [PATCH v7 7/8] mm: Introduce deferred freeing for kernel page tables Lu Baolu
2025-10-22 18:34 ` David Hildenbrand
2025-10-22 19:12 ` Dave Hansen
2025-10-22 19:52 ` Jason Gunthorpe
2025-10-23 7:10 ` Mike Rapoport
2025-10-22 8:26 ` [PATCH v7 8/8] iommu/sva: Invalidate stale IOTLB entries for kernel address space Lu Baolu
2025-10-22 19:01 ` [PATCH v7 0/8] Fix " Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aPnUTfMXD7qReWUl@kernel.org \
--to=rppt@kernel.org \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=apopple@nvidia.com \
--cc=baolu.lu@linux.intel.com \
--cc=bp@alien8.de \
--cc=dave.hansen@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=iommu@lists.linux.dev \
--cc=jannh@google.com \
--cc=jean-philippe@linaro.org \
--cc=jgg@nvidia.com \
--cc=joro@8bytes.org \
--cc=kevin.tian@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lorenzo.stoakes@oracle.com \
--cc=luto@kernel.org \
--cc=mhocko@kernel.org \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=robin.murphy@arm.com \
--cc=security@kernel.org \
--cc=tglx@linutronix.de \
--cc=urezki@gmail.com \
--cc=vasant.hegde@amd.com \
--cc=vbabka@suse.cz \
--cc=vinicius.gomes@intel.com \
--cc=will@kernel.org \
--cc=willy@infradead.org \
--cc=x86@kernel.org \
--cc=yi1.lai@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox