From: Pedro Falcato <pfalcato@suse.de>
To: Dev Jain <dev.jain@arm.com>
Cc: lsf-pc@lists.linux-foundation.org, ryan.roberts@arm.com,
catalin.marinas@arm.com, will@kernel.org, ardb@kernel.org,
willy@infradead.org, hughd@google.com,
baolin.wang@linux.alibaba.com, akpm@linux-foundation.org,
david@kernel.org, lorenzo.stoakes@oracle.com,
Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org,
surenb@google.com, mhocko@suse.com, linux-mm@kvack.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Per-process page size
Date: Mon, 23 Feb 2026 12:49:18 +0000
Message-ID: <fqsd4x5oqavouhwawmsmpanaszvr6xsboc2oeqzs3fafrtovpk@gfnqznoqlabk>
In-Reply-To: <a778d249-afea-4488-b045-beff3492a48a@arm.com>

On Mon, Feb 23, 2026 at 10:37:55AM +0530, Dev Jain wrote:
> >>
> >> 3. Translation from Linux pagetable to native pagetable
> >> -------------------------------------------------------
> >> Assume the case of a kernel pagesize of 4K and app pagesize of 64K.
> >> Now that enlightenment is done, it is guaranteed that every single mapping
> >> in the 4K pagetable (which we call the Linux pagetable) is of granularity
> >> at least 64K. In the arm64 MM code, we maintain a "native" pagetable per
> >> mm_struct, which is based off a 64K geometry. Because of the guarantee
> >> aforementioned, any pagetable operation on the Linux pagetable
> >> (set_ptes, clear_flush_ptes, modify_prot_start_ptes, etc) is going to happen
> >> at a granularity of at least 16 PTEs - therefore we can translate this
> >> operation to modify a single PTE entry in the native pagetable.
> >> Given that enlightenment may miss corner cases, we insert a warning in the
> >> architecture code - on being presented with an operation not translatable
> >> into a native operation, we fallback to the Linux pagetable, thus losing
> >> the benefits borne out of the pagetable geometry but keeping
> >> the emulation intact.
> > I don't understand. What exactly are you trying to do here? Maintain 2
> > different paging structures, one for core mm and the other for the arch? As
> > done in architectures with no radix tree paging structures?
>
> The mm->pgd will be the software pagetable. So suppose that do_anonymous_page is
> doing set_ptes on the PTE table belonging to the software pagetable. We will
> hook a "native_set_ptes" into set_ptes, which will set the ptes on a different
> pagetable maintained by arm64 code (probably mm_context_t->native_pgd).

Traditionally, you do this kind of funky manipulation in update_mmu_cache().
But this is still an extremely complex and invasive change (one I assume most
people would not like to see) with dubious benefit.

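For reference, the translation step being proposed boils down to something
like the following. This is a minimal userspace sketch, not kernel code:
range_is_translatable() and native_nr_ptes() are made-up names, and the
4K/64K geometry is just the example from the proposal. It only checks virtual
alignment and PTE count; a real hook would presumably also have to check that
the 16 PTEs map a physically contiguous, 64K-aligned block with identical
permissions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry from the thread: 4K kernel pages, 64K app pages. */
#define LINUX_PAGE_SHIFT   12
#define NATIVE_PAGE_SHIFT  16
#define PTES_PER_NATIVE    (1u << (NATIVE_PAGE_SHIFT - LINUX_PAGE_SHIFT)) /* 16 */

/* Hypothetical helper: can a set_ptes()-style range be mirrored natively? */
static bool range_is_translatable(uint64_t addr, unsigned int nr_ptes)
{
	uint64_t native_mask = (1ull << NATIVE_PAGE_SHIFT) - 1;

	return nr_ptes != 0 &&
	       (addr & native_mask) == 0 &&          /* starts on a 64K boundary */
	       (nr_ptes % PTES_PER_NATIVE) == 0;     /* covers whole 64K blocks */
}

/* Number of native PTEs touched, or 0 when the caller must fall back. */
static unsigned int native_nr_ptes(uint64_t addr, unsigned int nr_ptes)
{
	if (!range_is_translatable(addr, nr_ptes))
		return 0;
	return nr_ptes / PTES_PER_NATIVE;
}
```

Every 0 returned here is a case where you are stuck with the Linux pagetable
and get none of the benefit.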
>
> >
> > If so, that's wildly inefficient, unless you're willing to go into reclaimable
> > page tables on the arm64 side. And that brings extra problems and extra fun :)
>
> I didn't understand the reclaimable reference, but yes we need to make this efficient.

I'm not talking about CPU runtime efficiency, but memory efficiency. Doing
this essentially makes you duplicate page tables - not exactly ideal. This is
a Known Problem in classic UNIX systems, which do something similar
(but not the same): anonymous memory pointers are stored in some intermediary
structure (SunOS and UVM call it "amap"), and paging structures are entirely
redundant there. They can freely tear down a page table because they can always
rebuild it from the amap and file mappings (what they call vm_object and
we call address_space).

Anyway, I'm boring you with these funny historical details so you can see
the similarities: the Linux page table format generally matches hardware, and
we store anonymous memory "state" there, so you can't ever tear down a pgtable
without losing the state of whatever was mapped there before. However, if you go
down the "arm64 now has a separate pgtable structure" route, the roles switch:
arm64's internal page table format makes for the real page tables, and Linux's
pgtable structure is nothing more than an "amap". So you could (and perhaps
should) freely reclaim arm64 MMU page tables once memory pressure hits, because
they are freely discardable.

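To make the "discardable" point concrete, here is a toy userspace model of
the idea. Everything in it (struct toy_mm, native_discard(), native_rebuild(),
native_lookup()) is hypothetical and only illustrates the role switch; it
assumes the 4K/64K example geometry and that the first of each 16 Linux PTEs
can stand in for the whole 64K block.

```c
#include <stdint.h>
#include <stdlib.h>

#define PTES_PER_NATIVE 16u   /* 64K / 4K, as in the proposal */
#define NR_LINUX_PTES   512u  /* one 4K-geometry PTE table */
#define NR_NATIVE_PTES  (NR_LINUX_PTES / PTES_PER_NATIVE)

/* Hypothetical model: the software table holds the state ("amap" role). */
struct toy_mm {
	uint64_t linux_pte[NR_LINUX_PTES]; /* authoritative, never discarded */
	uint64_t *native_pte;              /* hardware view, NULL when reclaimed */
};

/* Drop the native table under memory pressure; no state is lost. */
static void native_discard(struct toy_mm *mm)
{
	free(mm->native_pte);
	mm->native_pte = NULL;
}

/* Rebuild the native table from the software table; 0 on success. */
static int native_rebuild(struct toy_mm *mm)
{
	uint64_t *tbl = calloc(NR_NATIVE_PTES, sizeof(*tbl));

	if (!tbl)
		return -1;
	/* Assumption: the first of each 16 Linux PTEs describes the 64K block. */
	for (unsigned int i = 0; i < NR_NATIVE_PTES; i++)
		tbl[i] = mm->linux_pte[i * PTES_PER_NATIVE];
	free(mm->native_pte);
	mm->native_pte = tbl;
	return 0;
}

/* Fault-time lookup: refill on demand if the table was reclaimed. */
static uint64_t native_lookup(struct toy_mm *mm, unsigned int native_idx)
{
	if (native_idx >= NR_NATIVE_PTES)
		return 0;
	if (!mm->native_pte && native_rebuild(mm))
		return 0;
	return mm->native_pte[native_idx];
}
```

The lookup after a discard returns the same entry as before, which is the
whole point: the native table carries no state of its own.
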
Does this make sense?
--
Pedro