From: David Hildenbrand <david@redhat.com>
To: Lu Baolu <baolu.lu@linux.intel.com>,
Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>,
Robin Murphy <robin.murphy@arm.com>,
Kevin Tian <kevin.tian@intel.com>,
Jason Gunthorpe <jgg@nvidia.com>, Jann Horn <jannh@google.com>,
Vasant Hegde <vasant.hegde@amd.com>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@intel.com>,
Alistair Popple <apopple@nvidia.com>,
Peter Zijlstra <peterz@infradead.org>,
Uladzislau Rezki <urezki@gmail.com>,
Jean-Philippe Brucker <jean-philippe@linaro.org>,
Andy Lutomirski <luto@kernel.org>, Yi Lai <yi1.lai@intel.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R . Howlett" <Liam.Howlett@oracle.com>,
Andrew Morton <akpm@linux-foundation.org>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
Michal Hocko <mhocko@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
Vinicius Costa Gomes <vinicius.gomes@intel.com>
Cc: iommu@lists.linux.dev, security@kernel.org, x86@kernel.org,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: [PATCH v7 7/8] mm: Introduce deferred freeing for kernel page tables
Date: Wed, 22 Oct 2025 20:34:53 +0200 [thread overview]
Message-ID: <dabf557c-d83b-4edb-8cf3-1ab8581e5406@redhat.com> (raw)
In-Reply-To: <20251022082635.2462433-8-baolu.lu@linux.intel.com>
On 22.10.25 10:26, Lu Baolu wrote:
> From: Dave Hansen <dave.hansen@linux.intel.com>
>
> This introduces a conditional asynchronous mechanism, enabled by
> CONFIG_ASYNC_KERNEL_PGTABLE_FREE. When enabled, this mechanism defers the
> freeing of pages that are used as page tables for kernel address mappings.
> These pages are now queued to a work struct instead of being freed
> immediately.
>
> This deferred freeing allows for batch-freeing of page tables, providing
> a safe context for performing a single expensive operation (TLB flush)
> for a batch of kernel page tables instead of performing that expensive
> operation for each page table.
>
> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
> Reviewed-by: Kevin Tian <kevin.tian@intel.com>
> ---
> mm/Kconfig | 3 +++
> include/linux/mm.h | 16 +++++++++++++---
> mm/pgtable-generic.c | 37 +++++++++++++++++++++++++++++++++++++
> 3 files changed, 53 insertions(+), 3 deletions(-)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 0e26f4fc8717..a83df9934acd 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -908,6 +908,9 @@ config PAGE_MAPCOUNT
> config PGTABLE_HAS_HUGE_LEAVES
> def_bool TRANSPARENT_HUGEPAGE || HUGETLB_PAGE
>
> +config ASYNC_KERNEL_PGTABLE_FREE
> + def_bool n
> +
> # TODO: Allow to be enabled without THP
> config ARCH_SUPPORTS_HUGE_PFNMAP
> def_bool n
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 52ae551d0eb4..d521abd33164 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3031,6 +3031,14 @@ static inline void __pagetable_free(struct ptdesc *pt)
> __free_pages(page, compound_order(page));
> }
>
> +#ifdef CONFIG_ASYNC_KERNEL_PGTABLE_FREE
> +void pagetable_free_kernel(struct ptdesc *pt);
> +#else
> +static inline void pagetable_free_kernel(struct ptdesc *pt)
> +{
> + __pagetable_free(pt);
> +}
> +#endif
> /**
> * pagetable_free - Free pagetables
> * @pt: The page table descriptor
> @@ -3040,10 +3048,12 @@ static inline void __pagetable_free(struct ptdesc *pt)
> */
> static inline void pagetable_free(struct ptdesc *pt)
> {
> - if (ptdesc_test_kernel(pt))
> + if (ptdesc_test_kernel(pt)) {
> ptdesc_clear_kernel(pt);
> -
> - __pagetable_free(pt);
> + pagetable_free_kernel(pt);
> + } else {
> + __pagetable_free(pt);
> + }
> }
>
> #if defined(CONFIG_SPLIT_PTE_PTLOCKS)
> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
> index 567e2d084071..1c7caa8ef164 100644
> --- a/mm/pgtable-generic.c
> +++ b/mm/pgtable-generic.c
> @@ -406,3 +406,40 @@ pte_t *__pte_offset_map_lock(struct mm_struct *mm, pmd_t *pmd,
> pte_unmap_unlock(pte, ptl);
> goto again;
> }
> +
> +#ifdef CONFIG_ASYNC_KERNEL_PGTABLE_FREE
> +static void kernel_pgtable_work_func(struct work_struct *work);
> +
> +static struct {
> + struct list_head list;
> + /* protect above ptdesc lists */
> + spinlock_t lock;
> + struct work_struct work;
> +} kernel_pgtable_work = {
> + .list = LIST_HEAD_INIT(kernel_pgtable_work.list),
> + .lock = __SPIN_LOCK_UNLOCKED(kernel_pgtable_work.lock),
> + .work = __WORK_INITIALIZER(kernel_pgtable_work.work, kernel_pgtable_work_func),
> +};
> +
> +static void kernel_pgtable_work_func(struct work_struct *work)
> +{
> + struct ptdesc *pt, *next;
> + LIST_HEAD(page_list);
> +
> + spin_lock(&kernel_pgtable_work.lock);
> + list_splice_tail_init(&kernel_pgtable_work.list, &page_list);
> + spin_unlock(&kernel_pgtable_work.lock);
> +
> + list_for_each_entry_safe(pt, next, &page_list, pt_list)
> + __pagetable_free(pt);
> +}
> +
> +void pagetable_free_kernel(struct ptdesc *pt)
> +{
> + spin_lock(&kernel_pgtable_work.lock);
> + list_add(&pt->pt_list, &kernel_pgtable_work.list);
> + spin_unlock(&kernel_pgtable_work.lock);
> +
> + schedule_work(&kernel_pgtable_work.work);
> +}
> +#endif
Acked-by: David Hildenbrand <david@redhat.com>
I was briefly wondering whether the pages can get stuck in there long
enough that we would want to wire up a shrinker to say "OOM, hold your
horses, we can still free something here".
But I'd assume the queued work will get scheduled in a reasonable
timeframe either way, so this is not a concern?
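
For the sake of illustration, here is a minimal sketch of what wiring up
such a shrinker could look like on top of this patch. The pending
counter, the shrinker name and draining directly from reclaim are my
assumptions, not something the series proposes, and a real version would
also have to perform the batched flush that the deferred work is meant
to provide before freeing anything:

/*
 * Hypothetical sketch only, sitting next to kernel_pgtable_work in
 * mm/pgtable-generic.c: let reclaim drain the deferred list on demand.
 * kernel_pgtable_pending is an assumed counter that
 * pagetable_free_kernel() would increment when queueing a ptdesc.
 */
#include <linux/shrinker.h>

static atomic_long_t kernel_pgtable_pending;

static unsigned long kernel_pgtable_shrink_count(struct shrinker *shrinker,
						 struct shrink_control *sc)
{
	return atomic_long_read(&kernel_pgtable_pending);
}

static unsigned long kernel_pgtable_shrink_scan(struct shrinker *shrinker,
						struct shrink_control *sc)
{
	struct ptdesc *pt, *next;
	unsigned long freed = 0;
	LIST_HEAD(page_list);

	/* Steal the whole pending list, just like the work function. */
	spin_lock(&kernel_pgtable_work.lock);
	list_splice_tail_init(&kernel_pgtable_work.list, &page_list);
	spin_unlock(&kernel_pgtable_work.lock);

	list_for_each_entry_safe(pt, next, &page_list, pt_list) {
		__pagetable_free(pt);
		atomic_long_dec(&kernel_pgtable_pending);
		freed++;
	}

	return freed ? freed : SHRINK_STOP;
}

static int __init kernel_pgtable_shrinker_init(void)
{
	struct shrinker *shrinker = shrinker_alloc(0, "kernel-pgtable-deferred");

	if (!shrinker)
		return -ENOMEM;

	shrinker->count_objects = kernel_pgtable_shrink_count;
	shrinker->scan_objects = kernel_pgtable_shrink_scan;
	shrinker_register(shrinker);
	return 0;
}
late_initcall(kernel_pgtable_shrinker_init);

But that is probably over-engineering for a list that the scheduled work
will drain shortly anyway.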
--
Cheers
David / dhildenb