linux-mm.kvack.org archive mirror
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Harry Yoo <harry.yoo@oracle.com>
Cc: Dennis Zhou <dennis@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrey Ryabinin <ryabinin.a.a@gmail.com>,
	x86@kernel.org, Borislav Petkov <bp@alien8.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Andy Lutomirski <luto@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Tejun Heo <tj@kernel.org>,
	Uladzislau Rezki <urezki@gmail.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Christoph Lameter <cl@gentwo.org>,
	David Hildenbrand <david@redhat.com>,
	Andrey Konovalov <andreyknvl@gmail.com>,
	Vincenzo Frascino <vincenzo.frascino@arm.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	kasan-dev@googlegroups.com, Mike Rapoport <rppt@kernel.org>,
	Ard Biesheuvel <ardb@kernel.org>,
	linux-kernel@vger.kernel.org, Dmitry Vyukov <dvyukov@google.com>,
	Alexander Potapenko <glider@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Suren Baghdasaryan <surenb@google.com>,
	Thomas Huth <thuth@redhat.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Michal Hocko <mhocko@suse.com>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	linux-mm@kvack.org, "Kirill A. Shutemov" <kas@kernel.org>,
	Oscar Salvador <osalvador@suse.de>,
	Jane Chu <jane.chu@oracle.com>,
	Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>,
	"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
	Joerg Roedel <joro@8bytes.org>,
	Alistair Popple <apopple@nvidia.com>,
	Joao Martins <joao.m.martins@oracle.com>,
	linux-arch@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH V4 mm-hotfixes 2/3] mm: introduce and use {pgd,p4d}_populate_kernel()
Date: Mon, 11 Aug 2025 13:18:12 +0100
Message-ID: <c3ec3012-4ba0-4b7b-bf0a-88f39ef029d8@lucifer.local>
In-Reply-To: <aJneGJSJcltEIT41@hyeyoo>

On Mon, Aug 11, 2025 at 09:12:08PM +0900, Harry Yoo wrote:
> On Mon, Aug 11, 2025 at 12:38:37PM +0100, Lorenzo Stoakes wrote:
> > On Mon, Aug 11, 2025 at 02:34:19PM +0900, Harry Yoo wrote:
> > > Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
> > > populating PGD and P4D entries for the kernel address space.
> > > These helpers ensure proper synchronization of page tables when
> > > updating the kernel portion of top-level page tables.
> > >
> > > Until now, the kernel has relied on each architecture to handle
> > > synchronization of top-level page tables in an ad-hoc manner.
> > > For example, see commit 9b861528a801 ("x86-64, mem: Update all PGDs for
> > > direct mapping and vmemmap mapping changes").
> > >
> > > However, this approach has proven fragile for the following reasons:
> > >
> > >   1) It is easy to forget to perform the necessary page table
> > >      synchronization when introducing new changes.
> > >      For instance, commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory
> > >      savings for compound devmaps") overlooked the need to synchronize
> > >      page tables for the vmemmap area.
> > >
> > >   2) It is also easy to overlook that the vmemmap and direct mapping areas
> > >      must not be accessed before explicit page table synchronization.
> > >      For example, commit 8d400913c231 ("x86/vmemmap: handle unpopulated
> > >      sub-pmd ranges") caused crashes by accessing the vmemmap area
> > >      before calling sync_global_pgds().
> > >
> > > To address this, as suggested by Dave Hansen, introduce _kernel() variants
> > > of the page table population helpers, which invoke architecture-specific
> > > hooks to properly synchronize page tables. These are introduced in a new
> > > header file, include/linux/pgalloc.h, so they can be called from common code.
> > >
> > > They reuse existing infrastructure for vmalloc and ioremap.
> > > Synchronization requirements are determined by ARCH_PAGE_TABLE_SYNC_MASK,
> > > and the actual synchronization is performed by arch_sync_kernel_mappings().
> > >
> > > This change currently targets only x86_64, so only PGD and P4D level
>
> Hi Lorenzo, thanks for looking at this!
>
> > Well, arm defines ARCH_PAGE_TABLE_SYNC_MASK in arch/arm/include/asm/page.h. But
> > it aliases this to PGTBL_PMD_MODIFIED so will remain unaffected :)
>
> Oh, here I just intended to explain why I didn't implement
> {pud,pmd}_populate_kernel().

I'd add that arm only handles PGTBL_PMD_MODIFIED, and therefore remains
unaffected, just to be super clear.
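
To make the arm case concrete, here is a minimal userspace sketch of the
gating, not kernel code. The demo_* names and the bit values are made up for
illustration; only the shape of the check (test the arch mask against the
level's bit, then sync) comes from the patch.

```c
#include <assert.h>

/* Illustrative bit values only; the real ones live in the kernel headers. */
enum {
	DEMO_PGD_MODIFIED = 1 << 0,
	DEMO_P4D_MODIFIED = 1 << 1,
	DEMO_PUD_MODIFIED = 1 << 2,
	DEMO_PMD_MODIFIED = 1 << 3,
};

/* Hypothetical counter standing in for arch_sync_kernel_mappings(). */
static int demo_sync_calls;

static void demo_sync(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	demo_sync_calls++;
}

/*
 * Models the shape of {pgd,p4d}_populate_kernel(): the real helper would
 * populate the entry first, then sync only if the arch mask contains the
 * bit for that level. Returns how many syncs this call triggered.
 */
static int demo_populate(unsigned int arch_mask, unsigned int level_bit,
			 unsigned long addr)
{
	int before = demo_sync_calls;

	if (arch_mask & level_bit)
		demo_sync(addr, addr);
	return demo_sync_calls - before;
}

/* An arch whose mask is PMD-only never syncs at the PGD or P4D level. */
static void demo_pmd_only_arch(void)
{
	assert(demo_populate(DEMO_PMD_MODIFIED, DEMO_PGD_MODIFIED, 0x1000UL) == 0);
	assert(demo_populate(DEMO_PMD_MODIFIED, DEMO_P4D_MODIFIED, 0x1000UL) == 0);
}
```

With a PMD-only mask both calls in demo_pmd_only_arch() skip the sync, which
is why the new PGD/P4D helpers change nothing for such an architecture.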

>
> > > helpers are introduced. In theory, PUD and PMD level helpers can be added
> > > later if needed by other architectures.
> > >
> > > Currently this is a no-op, since no architecture sets
> > > PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.
> > >
> > > Cc: <stable@vger.kernel.org>
> > > Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
> > > Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> > > Signed-off-by: Harry Yoo <harry.yoo@oracle.com>

Given that I missed that you fixed the vmalloc.h thing, this LGTM, so:

Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>

> > > ---
> > >  include/linux/pgalloc.h | 24 ++++++++++++++++++++++++
> > >  include/linux/pgtable.h |  4 ++--
> > >  mm/kasan/init.c         | 12 ++++++------
> > >  mm/percpu.c             |  6 +++---
> > >  mm/sparse-vmemmap.c     |  6 +++---
> > >  5 files changed, 38 insertions(+), 14 deletions(-)
> > >  create mode 100644 include/linux/pgalloc.h
> > >
> > > diff --git a/include/linux/pgalloc.h b/include/linux/pgalloc.h
> > > new file mode 100644
> > > index 000000000000..290ab864320f
> > > --- /dev/null
> > > +++ b/include/linux/pgalloc.h
> > > @@ -0,0 +1,24 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +#ifndef _LINUX_PGALLOC_H
> > > +#define _LINUX_PGALLOC_H
> > > +
> > > +#include <linux/pgtable.h>
> > > +#include <asm/pgalloc.h>
> > > +
> > > +static inline void pgd_populate_kernel(unsigned long addr, pgd_t *pgd,
> > > +				       p4d_t *p4d)
> > > +{
> > > +	pgd_populate(&init_mm, pgd, p4d);
> > > +	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)
> >
> > Hm, ARCH_PAGE_TABLE_SYNC_MASK is only defined for x86 with 2- and 3-level page
> > tables and arm. I see:
> >
> > #ifndef ARCH_PAGE_TABLE_SYNC_MASK
> > #define ARCH_PAGE_TABLE_SYNC_MASK 0
> > #endif
> >
> > In linux/vmalloc.h, but you're not importing that?
>
> Patch 1 moves it from linux/vmalloc.h to linux/pgtable.h,
> and linux/pgalloc.h includes linux/pgtable.h.
>
> > It sucks that that is there, but maybe you need to #include
> > <linux/vmalloc.h> for this, otherwise this could be broken on other arches?
> >
> > You may be getting lucky with nested header includes that cause this to be
> > picked up somewhere for you, or having it only declared for arches that define
> > it, but we should probably make this explicit.
>
> ...so I don't think I'm missing necessary header includes even on
> other architectures?
>
> > Also arch_sync_kernel_mappings() is defined in linux/vmalloc.h so seems
> > sensible.
>
> Also moved to linux/pgtable.h.

Ah yeah damn, I missed that you do that there, ok well that's fine then :)

>
> > > +		arch_sync_kernel_mappings(addr, addr);
> > > +}
> > > +
> > > +static inline void p4d_populate_kernel(unsigned long addr, p4d_t *p4d,
> > > +				       pud_t *pud)
> > > +{
> > > +	p4d_populate(&init_mm, p4d, pud);
> > > +	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_P4D_MODIFIED)
> > > +		arch_sync_kernel_mappings(addr, addr);
> >
> > It's kind of weird we don't have this defined as a function for many arches,
>
> That's really a mystery :)
>
> I have no idea why other architectures don't handle this.
>
> (At least on 64 bit arches) In theory I think only a few architectures
> (like arm64 where a kernel page table is shared between tasks) don't have
> to implement this.
>
> Probably because it's a bit niche bug to hit?
> (vmemmap, direct mapping, vmalloc/vmap area can span multiple PGD ranges)
> AND (populating some PGD entries is done after boot process (e.g. memory
> hot-plug or vmalloc())).

No, my comment is more about why we don't just do a standard:

#ifndef xxx
#define xxx (0)
#endif

Or something. Just odd.
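
Roughly, that pattern could look like the sketch below. This is a userspace
illustration with hypothetical demo_* names and an illustrative bit value, not
a proposal for the actual header: as the comment quoted further down says, the
kernel currently has no default arch_sync_kernel_mappings() and relies on the
compiler dropping the call when the mask is 0.

```c
#include <assert.h>

/* An arch header would define these before this point to override them. */
#ifndef DEMO_PAGE_TABLE_SYNC_MASK
#define DEMO_PAGE_TABLE_SYNC_MASK (0)
#endif

#ifndef demo_sync_kernel_mappings
/* Hypothetical no-op default, so callers always have a definition. */
static inline void demo_sync_kernel_mappings(unsigned long start,
					     unsigned long end)
{
	(void)start;
	(void)end;
}
#define demo_sync_kernel_mappings demo_sync_kernel_mappings
#endif

#define DEMO_PGD_MODIFIED (1U << 0)	/* illustrative bit value */

/* Returns 1 if a sync was requested for this level, 0 otherwise. */
static inline int demo_pgd_populate_kernel(unsigned long addr)
{
	/* The real helper would call pgd_populate() before this check. */
	if (DEMO_PAGE_TABLE_SYNC_MASK & DEMO_PGD_MODIFIED) {
		demo_sync_kernel_mappings(addr, addr);
		return 1;
	}
	return 0;
}
```

With the default mask of 0 the condition is a compile-time constant, so the
sync branch is dead code either way; the only difference from the current
approach is that an explicit default exists.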

>
> > (weird as well that we declare it in... vmalloc.h but I guess one for follow up
> > cleanups that).
> >
> > But I see from the comment:
> >
> > /*
> >  * There is no default implementation for arch_sync_kernel_mappings(). It is
> >  * relied upon the compiler to optimize calls out if ARCH_PAGE_TABLE_SYNC_MASK
> >  * is 0.
> >  */
> >
> > So this seems intended... :)
>
> > The rest of this seems sensible, nice cleanup!
>
> Thanks for looking at this!
>
> --
> Cheers,
> Harry / Hyeonggon
>

