From: "David Hildenbrand (Arm)" <david@kernel.org>
To: Kiryl Shutsemau <kas@kernel.org>
Cc: lsf-pc@lists.linux-foundation.org, linux-mm@kvack.org,
x86@kernel.org, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Thomas Gleixner <tglx@linutronix.de>,
Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Mike Rapoport <rppt@kernel.org>,
Matthew Wilcox <willy@infradead.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Usama Arif <usama.arif@linux.dev>
Subject: Re: [LSF/MM/BPF TOPIC] 64k (or 16k) base page size on x86
Date: Fri, 20 Feb 2026 17:30:26 +0100 [thread overview]
Message-ID: <d7c7ef63-e40c-40c5-8ce5-a4ca411da832@kernel.org> (raw)
In-Reply-To: <aZhErt9DZcWI24_v@thinkstation>
On 2/20/26 13:07, Kiryl Shutsemau wrote:
> On Fri, Feb 20, 2026 at 11:24:37AM +0100, David Hildenbrand (Arm) wrote:
>>>
>>> Just to clarify, do you want it to be enforced on userspace ABI.
>>> Like, all mappings are 64k aligned?
>>
>> Right, see the proposal from Dev on the list.
>>
>> From user-space POV, the pagesize would be 64K for these emulated processes.
>> That is, VMAs must be suitably aligned, etc.
>
> Well, it will drastically limit the adoption. We have too much legacy
> stuff on x86.
I'd assume that many applications nowadays can deal with differing page
sizes (thanks to some other architectures paving the way).
But yes, some real legacy stuff, or stuff that only ever cared about
Intel, still hardcodes pagesize=4k.
For Meta's fleet, I'd be quite interested in how much conversion work
would have to be done.
For legacy apps, you could still run them as 4k pagesize on the same
system, of course.
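As an aside: for anything that hardcodes 4096, the conversion is
usually mechanical, because the portable interfaces already report
whatever page size the kernel gives the process. A minimal user-space
sketch, using only standard POSIX/glibc calls:

  #include <stdio.h>
  #include <unistd.h>     /* sysconf() */
  #include <sys/auxv.h>   /* getauxval() */

  int main(void)
  {
          /* Both report the page size for this process; under the
           * proposal, an emulated process would see 64k here, not 4k. */
          long psize = sysconf(_SC_PAGESIZE);
          unsigned long aux = getauxval(AT_PAGESZ);

          printf("sysconf: %ld, auxv: %lu\n", psize, aux);
          return 0;
  }

Code that derives mmap() lengths and alignments from that value instead
of a literal 4096 should keep working unmodified.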
>
>>>
>>> Memory waste for page tables is solvable and pretty straightforward.
>>> Most such cases can be solved mechanically by switching to slab.
>>
>> Well, yes, like Willy says, there are already similar custom solutions for
>> s390x and ppc.
>>
>> Pasha talked recently about the memory waste of 16k kernel stacks and how we
>> would want to reduce that to 4k. In your proposal, it would be 64k, unless
>> you somehow manage to allocate multiple kernel stacks from the same 64k
>> page. My head hurts thinking about whether that could work, maybe it could
>> (no idea about guard pages in there, though).
>
> Kernel stack is allocated from vmalloc. I think mapping them with
> sub-page granularity should be doable.
I still have to wrap my head around the sub-page mapping here as well.
It's scary.
Re mapcount: I think if any part of the page is mapped, the whole page
would be considered mapped -> mapcount += 1.
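A purely hypothetical sketch of that rule (none of these names exist
anywhere, it just illustrates the counting):

  /* Made-up per-64k-page metadata, for illustration only. */
  struct hyp_page_meta {
          int subranges_mapped;   /* mapped sub-page ranges in this page */
          int mapcount;           /* what rmap would observe */
  };

  static void hyp_map_subrange(struct hyp_page_meta *meta)
  {
          /* 0 -> 1: the first mapped sub-range makes the whole
           * page count as mapped. */
          if (meta->subranges_mapped++ == 0)
                  meta->mapcount++;
  }

  static void hyp_unmap_subrange(struct hyp_page_meta *meta)
  {
          /* 1 -> 0: the page stops being mapped only when the
           * last sub-range goes away. */
          if (--meta->subranges_mapped == 0)
                  meta->mapcount--;
  }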
>
> BTW, do you see any reason why slab-allocated stack wouldn't work for
> large base page sizes? There's no requirement for it to be aligned to a
> page or PTE, right?
I'd assume that would work. The devil is in the details with these
things until we have memdescs.
E.g., page tables have a dedicated type (PGTY_table) and store separate
metadata in the ptdesc. For kernel stacks there was once a proposal to
add a similar type, but it is not upstream.
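FWIW, the allocation side itself looks simple. A rough sketch of what
I'd imagine, using the existing kmem_cache API (the cache name, init
hook and exact flags are my assumptions, not a proposal):

  static struct kmem_cache *stack_cache;

  static int __init stack_cache_init(void)
  {
          /* Align to THREAD_SIZE so stack-pointer masking keeps
           * working; no dependency on the base page size. */
          stack_cache = kmem_cache_create("kernel_stack", THREAD_SIZE,
                                          THREAD_SIZE, SLAB_PANIC, NULL);
          return 0;
  }

  static void *alloc_thread_stack(void)
  {
          return kmem_cache_alloc(stack_cache, GFP_KERNEL);
  }

The obvious trade-off is losing the guard pages that CONFIG_VMAP_STACK
gives us today, so stack-overflow detection would need another answer.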
>
>> Let's take a look at the history of page size usage on Arm (people can feel
>> free to correct me):
>>
>> (1) Most distros were using 64k on Arm.
>>
>> (2) People realized that 64k was suboptimal for many use cases (memory
>> waste for stacks, pagecache, etc.) and started to switch to 4k. I
>> remember that mostly HPC-centric users stuck to 64k, but there was
>> also demand from others to be able to stay on 64k.
>>
>> (3) Arm improved performance on a 4k kernel by adding cont-pte support,
>> trying to get closer to 64k native performance.
>>
>> (4) Achieving 64k native performance is hard, which is why per-process
>> page sizes are being explored to get the best out of both worlds
>> (use 64k page size only where it really matters for performance).
>>
>> Arm clearly has the added benefit of actually benefiting from hardware
>> support for 64k.
>>
>> IIUC, what you are proposing feels a bit like traveling back in time when it
>> comes to the memory waste problem that Arm users encountered.
>>
>> Where do you see the big difference to 64k on Arm in your proposal? Would
>> you currently also be running 64k Arm in production and the memory waste etc
>> is acceptable?
>
> That's the point. I don't see a big difference to 64k Arm. I want to
> bring this option to x86: at some machine size it makes sense to trade
> memory consumption for scalability. I am targeting machines with
> over 2TiB of RAM.
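The struct-page math alone supports that. Back-of-the-envelope,
assuming the usual 64-byte struct page:

  4k pages:  2TiB / 4KiB  * 64B = 32GiB of memmap
  64k pages: 2TiB / 64KiB * 64B =  2GiB of memmap

That's 30GiB back on a 2TiB box before we even talk about page tables
or LRU/locking scalability.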
>
> BTW, we do run 64k Arm in our fleet. There are some growing pains, but it
> looks good in general. We have no plans to switch to 4k (or 16k) at the
> moment. 512M THPs also look good on some workloads.
Okay, that's valuable information, thanks!
Being able to remove the sub-page mapping part (or being able to just
hide it somewhere deep down in arch code) would make this a lot easier
to digest.
--
Cheers,
David