From: Brendan Jackman <jackmanb@google.com>
To: Brendan Jackman <jackmanb@google.com>,
Borislav Petkov <bp@alien8.de>,
Dave Hansen <dave.hansen@linux.intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Vlastimil Babka <vbabka@kernel.org>, Wei Xu <weixugc@google.com>,
Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
<x86@kernel.org>, <rppt@kernel.org>,
Sumit Garg <sumit.garg@oss.qualcomm.com>, <derkling@google.com>,
<reijiw@google.com>, Will Deacon <will@kernel.org>,
<rientjes@google.com>,
"Kalyazin, Nikita" <kalyazin@amazon.co.uk>,
<patrick.roy@linux.dev>,
"Itazuri, Takahiro" <itazur@amazon.co.uk>,
Andy Lutomirski <luto@kernel.org>,
David Kaplan <david.kaplan@amd.com>,
Thomas Gleixner <tglx@kernel.org>,
Yosry Ahmed <yosry.ahmed@linux.dev>
Subject: Re: [PATCH RFC 04/19] x86/mm: introduce the mermap
Date: Fri, 27 Feb 2026 10:47:45 +0000
Message-ID: <DGPONXAKU7ZG.11C48FZY9OCPT@google.com>
In-Reply-To: <20260225-page_alloc-unmapped-v1-4-e8808a03cd66@google.com>
Relaying some code review from an AI that I wasn't able to run before
sending...
(This isn't the AI's verbatim output; I'm filtering and rephrasing it.)
On Wed Feb 25, 2026 at 4:34 PM UTC, Brendan Jackman wrote:
> +
> +/* Call with migration disabled. */
> +static inline struct mermap_alloc *mermap_alloc(struct mm_struct *mm,
> + unsigned long size, bool use_reserve)
> +{
> + int cpu = raw_smp_processor_id();
> + struct mermap_cpu *mc = this_cpu_ptr(mm->mermap.cpu);
> + unsigned long cpu_end = mermap_cpu_end(cpu);
> + struct mermap_alloc *alloc = NULL;
> +
> + /*
> + * This is an extremely stupid allocator, there can only ever be a small
> + * number of allocations so everything just works on linear search.
> + *
> + * Allocations are "in order", i.e. if the whole region is free it
> + * allocates from the beginning. If there are any existing allocations
> + * it allocates from right after the last (highest address) one. Any
> + * free space before that goes unused.
> + *
> + * Once an allocation has been freed, the space it occupied must be flushed
> + * from the TLB before it can be reused.
> + *
> + * Visual example of how this is suppose to behave (A for allocated, T for
> + * TLB-flush-pending):
> + *
> + * _______________ Start with everything free.
> + * AaaA___________ Allocate something.
> + * TttT___________ Free it. (Region needs a TLB flush now).
> + * TttTAaaaaaaaA__ Allocate something else.
> + * TttTAaaaaaaaAAA Allocate the remaining space.
> + * TttTTtttttttTAA Free the allocation before last.
> + * ^^^^^^^^^^^^^ This could all be reused now but for simplicity it
> + * isn't. Another allocation at this point will fail.
> + * TttTTtttttttTTT Free the last allocation.
> + * _______________ Next time we allocate, first flush the TLB
> + * AA_____________ Now we're back at the beginning.
> + */
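To sanity-check the semantics described in that comment, here's a minimal
userspace model of the bump-and-flush scheme (all names here are hypothetical
stand-ins, and the use_reserve address-space holdback is omitted). The key
property it models: freeing never rolls next_addr back; freed space only
becomes usable again after a "TLB flush" resets next_addr to just past the
highest live allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define REGION_SIZE 16
#define MAX_ALLOCS  4

struct model_alloc { bool in_use; size_t base, end; };

static struct model_alloc allocs[MAX_ALLOCS];
static size_t next_addr;
static int tlb_flushes;

/* Returns the base address, or (size_t)-1 on failure. */
static size_t model_alloc_range(size_t size)
{
	if (next_addr + size > REGION_SIZE) {
		/*
		 * Out of already-available space. Would a flush help?
		 * Recompute next_addr as the end of the highest live
		 * allocation (region base if there are none).
		 */
		size_t new_next = 0;

		for (int i = 0; i < MAX_ALLOCS; i++)
			if (allocs[i].in_use && allocs[i].end > new_next)
				new_next = allocs[i].end;
		if (size > REGION_SIZE - new_next)
			return (size_t)-1;
		tlb_flushes++;		/* stands in for mermap_flush_tlb() */
		next_addr = new_next;
	}
	for (int i = 0; i < MAX_ALLOCS; i++) {
		if (!allocs[i].in_use) {
			allocs[i].in_use = true;
			allocs[i].base = next_addr;
			allocs[i].end = next_addr + size;
			next_addr += size;
			return allocs[i].base;
		}
	}
	return (size_t)-1;		/* no tracking slot free */
}

static void model_free(size_t base)
{
	for (int i = 0; i < MAX_ALLOCS; i++)
		if (allocs[i].in_use && allocs[i].base == base)
			allocs[i].in_use = false;
	/*
	 * next_addr is NOT rolled back: the space stays unusable until
	 * the next flush, matching the "T" state in the diagram above.
	 */
}
```

Replaying the visual example against this model (alloc 4, free it, alloc 9,
alloc 3, free the middle one, fail a 1-byte alloc, free the last, then
allocate again from the base after exactly one flush) behaves as the
comment describes.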
> +
> + if (use_reserve) {
> + if (WARN_ON_ONCE(size != PAGE_SIZE))
> + return NULL;
> + lockdep_assert_preemption_disabled();
> + } else {
> + cpu_end -= PAGE_SIZE;
> + }
> +
> + if (WARN_ON_ONCE(!in_task()))
> + return NULL;
> + guard(preempt)();
> +
> + /* Out of already-available space? */
> + if (mc->next_addr + size > cpu_end) {
> + unsigned long new_next = mermap_cpu_base(cpu);
> +
> + /* Would we have space after a TLB flush? */
> + for (int i = 0; i < ARRAY_SIZE(mc->allocs); i++) {
> + struct mermap_alloc *alloc = &mc->allocs[i];
> +
> + /*
> + * The space between the uppermost allocated alloc->end
> + * (or the base of the CPU's region if there are no
> + * current allocations) and mc->next_addr has been
> + * unmapped in the pagetables, but not flushed from the
> + * TLB. Set new_next to point to the beginning of that
> + * space.
> + */
> + if (READ_ONCE(alloc->in_use))
> + new_next = max(new_next, alloc->end);
> + }
> + if (size > cpu_end - new_next)
> + return NULL;
> +
> + mermap_flush_tlb(cpu, mc);
> + mc->next_addr = new_next;
> + }
> +
> + /* Find an alloc-tracking structure to use */
> + for (int i = 0; i < ARRAY_SIZE(mc->allocs); i++) {
> + if (!READ_ONCE(mc->allocs[i].in_use)) {
> + alloc = &mc->allocs[i];
> + break;
> + }
> + }
> + if (!alloc)
> + return NULL;
Oops, I forgot to account for @use_reserve here. The alloc-tracking
structures should have a reservation, like the virtual address space does;
otherwise reserve allocations can fail where they aren't supposed to.
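One way to fix that would be to hold back the final tracking slot for
use_reserve callers, mirroring how the address-space path holds back
PAGE_SIZE. A userspace sketch (hypothetical helper, not the patch's code):

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ALLOCS 4

static bool slot_in_use[NR_ALLOCS];

/*
 * Non-reserve callers may only search slots [0, NR_ALLOCS - 1); the
 * final slot is kept free so a reserve caller can always find one.
 * Returns the slot index, or -1 if none is available.
 */
static int find_slot(bool use_reserve)
{
	int limit = use_reserve ? NR_ALLOCS : NR_ALLOCS - 1;

	for (int i = 0; i < limit; i++) {
		if (!slot_in_use[i]) {
			slot_in_use[i] = true;
			return i;
		}
	}
	return -1;
}
```

With this, filling the table with non-reserve allocations leaves the last
slot free, so a subsequent reserve allocation still succeeds.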
> + alloc->in_use = true;
> + alloc->base = mc->next_addr;
> + alloc->end = alloc->base + size;
> + mc->next_addr += size;
> +
> + return alloc;
> +}
> +
> +struct set_pte_ctx {
> + pgprot_t prot;
> + unsigned long next_pfn;
> +};
> +
> +static inline int do_set_pte(pte_t *pte, unsigned long addr, void *data)
> +{
> + struct set_pte_ctx *ctx = data;
> +
> + set_pte(pte, pfn_pte(ctx->next_pfn, ctx->prot));
> + ctx->next_pfn++;
> +
> + return 0;
> +}
> +
> +static struct mermap_alloc *
> +__mermap_get(struct mm_struct *mm, struct page *page,
> + unsigned long size, pgprot_t prot, bool use_reserve)
> +{
> + struct mermap_alloc *alloc = NULL;
> + struct set_pte_ctx ctx;
> + int err;
> +
> + if (size > MERMAP_CPU_REGION_SIZE || WARN_ON_ONCE(!mm || !mm->mermap.cpu))
> + return NULL;
> + if (WARN_ON_ONCE(!arch_mermap_pgprot_allowed(prot)))
> + return NULL;
> +
> + size = PAGE_ALIGN(size);
> +
> + migrate_disable();
> +
> + alloc = mermap_alloc(mm, size, use_reserve);
> + if (!alloc) {
> + migrate_enable();
> + return NULL;
> + }
> +
> + /* This probably wants to be optimised. */
> + ctx.prot = prot;
> + ctx.next_pfn = page_to_pfn(page);
> + err = apply_to_existing_page_range(mm, alloc->base, size, do_set_pte, &ctx);
This takes a PTE lock, and we may have preemption off here, so this may be
broken on PREEMPT_RT? Haven't checked. (Maybe I can just test this by
running it with PREEMPT_RT and lockdep enabled?)
If that's indeed broken, it's yet another point for the discussion about
requirements for a pagetable library [0]. The lock is not needed at all
here - we need a way to modify pagetables that lets callers take advantage
of higher-level synchronization.
[0] https://lore.kernel.org/all/20260219175113.618562-1-jackmanb@google.com/
> + if (err) {
> + WRITE_ONCE(alloc->in_use, false);
> + return NULL;
Forgot migrate_enable().
(Is there a way to prevent this with the lovely new
__attribute__((cleanup)) magic?)
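For reference, a userspace sketch of the __attribute__((cleanup)) pattern
that the kernel's cleanup.h guard() machinery is built on (the migrate_*
helpers below are userspace stand-ins, not the kernel functions):

```c
#include <assert.h>

static int migrate_count;

static void migrate_disable(void) { migrate_count++; }
static void migrate_enable(void)  { migrate_count--; }

/* Runs when the annotated variable goes out of scope, on any path. */
static void migrate_cleanup(int *unused)
{
	(void)unused;
	migrate_enable();
}

static int do_work(int fail_early)
{
	migrate_disable();
	int guard __attribute__((cleanup(migrate_cleanup))) = 0;
	(void)guard;

	if (fail_early)
		return -1;	/* cleanup still runs migrate_enable() here */
	return 0;
}
```

Every return path, including the early-error one, re-enables migration, so
a forgotten migrate_enable() like the one above becomes impossible by
construction.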
> + }
> +
> + return alloc;
> +}
> +
> +/*
> + * Allocate a region of virtual memory, and map the page into it. This tries
> + * pretty hard to be fast but doesn't try very hard at all to actually succeed.
> + *
> + * The returned region is physically local to the current mm. It is _logically_
> + * local to the current CPU but this is not enforced by hardware so it can be
> + * exploited to mitigate CPU vulns. This means the caller must not map memory
> + * here that doesn't belong to the current process. The caller must also perform
> + * a full TLB flush of the region before freeing the pages that have been mapped
> + * here.
> + *
> + * This may only be called from process context, and the caller must arrange to
> + * first call mermap_mm_prepare(). (It would be possible to support this in IRQ,
> + * but it seems unlikely there's a valid usecase given the TLB flushing
> + * requirements). If it succeeds, it disables migration until you call
> + * mermap_put().
> + *
> + * This is guaranteed not to allocate.
This one isn't from AI, but I just realised that's a pretty confusing
thing for an allocator to say. It should say it's guaranteed not to call
the page allocator. (This is important because I want to use it from the
page allocator).
> + *
> + * Use mermap_addr() to get the actual address of the mapped region.
> + */
> +struct mermap_alloc *mermap_get(struct page *page, unsigned long size, pgprot_t prot)
> +{
> + return __mermap_get(current->mm, page, size, prot, false);
> +}
> +EXPORT_SYMBOL(mermap_get);
> +
> +/*
> + * Allocate a single PAGE_SIZE page via mermap_get(), requiring preemption to be
> + * off until it is freed. This always succeeds.
> + */
> +void *mermap_get_reserved(struct page *page, pgprot_t prot)
Oops, that should return mermap_alloc *.
> +/* Clean up mermap stuff on mm teardown. */
> +void mermap_mm_teardown(struct mm_struct *mm)
> +{
> + int cpu;
> +
> + if (!mm->mermap.cpu)
> + return;
> +
> + for_each_possible_cpu(cpu) {
> + struct mermap_cpu *mc = this_cpu_ptr(mm->mermap.cpu);
Oops, this should be per_cpu_ptr(mm->mermap.cpu, cpu).