From: Kalesh Singh <kaleshsingh@google.com>
To: "David Hildenbrand (Arm)" <david@kernel.org>
Cc: Kiryl Shutsemau <kas@kernel.org>,
	lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
	 x86@kernel.org, linux-kernel@vger.kernel.org,
	 Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	 Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
	 "Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Mike Rapoport <rppt@kernel.org>,
	 Matthew Wilcox <willy@infradead.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Usama Arif <usama.arif@linux.dev>
Subject: Re: [LSF/MM/BPF TOPIC] 64k (or 16k) base page size on x86
Date: Fri, 20 Feb 2026 11:33:16 -0800	[thread overview]
Message-ID: <CAC_TJvd=wKKnj=d3phZsAaarXUdUbZDGauxWNWkQtsFV-MTYEg@mail.gmail.com> (raw)
In-Reply-To: <d7c7ef63-e40c-40c5-8ce5-a4ca411da832@kernel.org>

On Fri, Feb 20, 2026 at 8:30 AM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 2/20/26 13:07, Kiryl Shutsemau wrote:
> > On Fri, Feb 20, 2026 at 11:24:37AM +0100, David Hildenbrand (Arm) wrote:
> >>>
> >>> Just to clarify, do you want it to be enforced in the userspace ABI?
> >>> Like, all mappings are 64k aligned?
> >>
> >> Right, see the proposal from Dev on the list.
> >>
> >>  From user-space POV, the pagesize would be 64K for these emulated processes.
> >> That is, VMAs must be suitably aligned, etc.
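[A quick sketch of that alignment constraint with an emulated 64K granule; the helper names below are illustrative, not from any proposed API:]

```python
EMULATED_PAGE = 64 * 1024  # per-process emulated page size (64K)

def align_up(x, granule=EMULATED_PAGE):
    """Round x up to the next multiple of the granule."""
    return -(-x // granule) * granule

def vma_is_valid(start, length, granule=EMULATED_PAGE):
    """Under a 64K emulated page size, both the start address and
    the length of a mapping must be multiples of the granule."""
    return start % granule == 0 and length % granule == 0
```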
> >
> > Well, it will drastically limit the adoption. We have too much legacy
> > stuff on x86.
>
> I'd assume that many applications nowadays can deal with differing page
> sizes (thanks to some other architectures paving the way).
>
> But yes, some real legacy stuff, or stuff that only ever cared about
> Intel, still hardcodes pagesize=4k.

I think most issues will stem from linkers setting the default ELF
segment alignment (max-page-size) for x86 to 4096. Those ELFs will load
incorrectly, or not at all, at a larger emulated granularity.
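[That default can be checked straight from a binary's program headers. A sketch for 64-bit little-endian ELFs, using the standard ELF64 header offsets: a loader emulating a 64K page can only map PT_LOAD segments directly if p_align is at least 64K, while x86-64 toolchains default to 0x1000 unless built with -z max-page-size=0x10000.]

```python
import struct

PT_LOAD = 1

def elf_load_alignment(path):
    """Return the maximum p_align across PT_LOAD segments of a
    64-bit little-endian ELF file."""
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"\x7fELF" or data[4] != 2:  # magic + ELFCLASS64
        raise ValueError("not a 64-bit ELF")
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    aligns = [0]
    for i in range(e_phnum):
        off = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", data, off)
        if p_type == PT_LOAD:
            (p_align,) = struct.unpack_from("<Q", data, off + 0x30)
            aligns.append(p_align)
    return max(aligns)
```

[Binaries linked with -Wl,-z,max-page-size=0x10000 report 0x10000 here and would remain loadable at the larger granularity.]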

-- Kalesh

>
> In Meta's fleet, it would be quite interesting to see how much
> conversion would have to be done.
>
> For legacy apps, you could still run them as 4k pagesize on the same
> system, of course.
>
> >
> >>>
> >>> Waste of memory for page tables is solvable and pretty straightforward.
> >>> Most such cases can be solved mechanically by switching to slab.
> >>
> >> Well, yes, like Willy says, there are already similar custom solutions for
> >> s390x and ppc.
> >>
> >> Pasha talked recently about the memory waste of 16k kernel stacks and how we
> >> would want to reduce that to 4k. In your proposal, it would be 64k, unless
> >> you somehow manage to allocate multiple kernel stacks from the same 64k
> >> page. My head hurts thinking about whether that could work, maybe it could
> >> (no idea about guard pages in there, though).
> >
> > Kernel stack is allocated from vmalloc. I think mapping them with
> > sub-page granularity should be doable.
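[The arithmetic behind that stack waste is simple; a back-of-envelope sketch with illustrative numbers, not measurements:]

```python
def stack_slack(stack_size, page_size):
    """Bytes wasted per kernel stack when each stack must occupy
    whole base pages (no packing of stacks within one page)."""
    pages = -(-stack_size // page_size)  # ceiling division
    return pages * page_size - stack_size

# A 16K stack is exactly four 4K pages (no slack), but burns 48K of
# slack per stack on a 64K base page unless stacks can be packed.
```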
>
> I still have to wrap my head around the sub-page mapping here as well.
> It's scary.
>
> Re mapcount: I think if any part of the page is mapped, it would be
> considered mapped -> mapcount += 1.
>
> >
> > BTW, do you see any reason why a slab-allocated stack wouldn't work for
> > large base page sizes? There's no requirement for it to be aligned to a
> > page or PTE, right?
>
> I'd assume that would work. Devil is in the detail with these things
> before we have memdescs.
>
> E.g., page tables have a dedicated type (PGTY_table) and store separate
> metadata in the ptdesc. For kernel stacks there was once a proposal to
> have a type, but it is not upstream.
>
> >
> >> Let's take a look at the history of page size usage on Arm (people can feel
> >> free to correct me):
> >>
> >> (1) Most distros were using 64k on Arm.
> >>
> >> (2) People realized that 64k was suboptimal for many use cases (memory
> >>      waste for stacks, pagecache, etc.) and started to switch to 4k. I
> >>      remember that mostly HPC-centric users stuck with 64k, but there was
> >>      also demand from others to be able to stay on 64k.
> >>
> >> (3) Arm improved performance on a 4k kernel by adding cont-pte support,
> >>      trying to get closer to 64k native performance.
> >>
> >> (4) Achieving 64k native performance is hard, which is why per-process
> >>      page sizes are being explored to get the best out of both worlds
> >>      (use 64k page size only where it really matters for performance).
> >>
> >> Arm clearly has the added benefit of actually benefiting from hardware
> >> support for 64k.
> >>
> >> IIUC, what you are proposing feels a bit like traveling back in time when it
> >> comes to the memory waste problem that Arm users encountered.
> >>
> >> Where do you see the big difference to 64k on Arm in your proposal? Would
> >> you currently also be running 64k Arm in production and the memory waste etc
> >> is acceptable?
> >
> > That's the point. I don't see a big difference to 64k Arm. I want to
> > bring this option to x86: at some machine size it makes sense to trade
> > memory consumption for scalability. I am targeting machines with
> > over 2TiB of RAM.
> >
> > BTW, we do run 64k Arm in our fleet. There are some growing pains, but
> > it looks good in general. We have no plans to switch to 4k (or 16k) at
> > the moment. 512M THPs also look good on some workloads.
>
> Okay, that's valuable information, thanks!
>
> Being able to remove the sub-page mapping part (or being able to just
> hide it somewhere deep down in arch code) would make this a lot easier
> to digest.
>
> --
> Cheers,
>
> David
>


Thread overview: 33+ messages
2026-02-19 15:08 Kiryl Shutsemau
2026-02-19 15:17 ` Peter Zijlstra
2026-02-19 15:20   ` Peter Zijlstra
2026-02-19 15:27     ` Kiryl Shutsemau
2026-02-19 15:33 ` Pedro Falcato
2026-02-19 15:50   ` Kiryl Shutsemau
2026-02-19 15:53     ` David Hildenbrand (Arm)
2026-02-19 19:31       ` Pedro Falcato
2026-02-19 15:39 ` David Hildenbrand (Arm)
2026-02-19 15:54   ` Kiryl Shutsemau
2026-02-19 16:09     ` David Hildenbrand (Arm)
2026-02-20  2:55       ` Zi Yan
2026-02-19 17:09   ` Kiryl Shutsemau
2026-02-20 10:24     ` David Hildenbrand (Arm)
2026-02-20 12:07       ` Kiryl Shutsemau
2026-02-20 16:30         ` David Hildenbrand (Arm)
2026-02-20 19:33           ` Kalesh Singh [this message]
2026-02-19 23:24   ` Kalesh Singh
2026-02-20 12:10     ` Kiryl Shutsemau
2026-02-20 19:21       ` Kalesh Singh
2026-02-19 17:08 ` Dave Hansen
2026-02-19 22:05   ` Kiryl Shutsemau
2026-02-20  3:28     ` Liam R. Howlett
2026-02-20 12:33       ` Kiryl Shutsemau
2026-02-20 15:17         ` Liam R. Howlett
2026-02-20 15:50           ` Kiryl Shutsemau
2026-02-19 17:30 ` Dave Hansen
2026-02-19 22:14   ` Kiryl Shutsemau
2026-02-19 22:21     ` Dave Hansen
2026-02-19 17:47 ` Matthew Wilcox
2026-02-19 22:26   ` Kiryl Shutsemau
2026-02-20  9:04 ` David Laight
2026-02-20 12:12   ` Kiryl Shutsemau
