Re: [LSF/MM/BPF TOPIC] Page allocation for ASI

From: Brendan Jackman <jackmanb@google.com>
To: lsf-pc@lists.linux-foundation.org
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	David Hildenbrand <david@redhat.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Michal Hocko <mhocko@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>
Subject: Re: [LSF/MM/BPF TOPIC] Page allocation for ASI
Date: Wed, 29 Jan 2025 17:35:29 +0100
Message-ID: <CA+i-1C1oTMZQ51DLPzyC_SD6Z22r1K9LXLcmBaMkPXQJe55jDA@mail.gmail.com>
In-Reply-To: <20250129124034.2612562-1-jackmanb@google.com>

On Wed, 29 Jan 2025 at 13:40, Brendan Jackman <jackmanb@google.com> wrote:
>
> At last year’s LSF/MM/BPF I presented Address Space Isolation (ASI) [0]. ASI is
> a mitigation for a broad class of CPU vulnerabilities that works by creating a
> second “restricted” kernel address space which has “sensitive” data unmapped. If
> you’re unfamiliar with ASI, the first 10-15 minutes of that talk provide a broad
> overview of the whole system. The v1 of my RFC [2] also has some explanatory
> discussion in the cover letter.
>
> Last year my talk was pretty high-level, taking the temperature of the MM
> community about how to integrate this into the broader kernel and whether there
> are any major roadblocks.
>
> Since then, I’ve posted a new RFC [1] and Google’s internal implementation has
> continued to expand its footprint in production - it’s now a cornerstone of our
> CPU security strategy. Nonetheless, as noted in the RFCv2 cover-letter there are
> a few hurdles to overcome, at least in a proof-of-concept, before I’ll be making
> actual requests to merge ASI upstream.
>
> The one I’d like to talk about at this session is how to best integrate ASI into
> the page allocator. “Sensitivity” of memory in ASI is currently all decided at
> the allocation site. This means when allocating pages we need to alter the
> pagetables for the restricted address space. This is a little tricky from the
> page allocator:
>
> 1. In the most general case, adding pages to the restricted address space requires
>    allocating pagetables. Allocating while you allocate requires some thought to
>    avoid spaghetti code/deadlock risk.
>
> 2. Removing them requires a TLB flush, which can’t be done from all
>    page-freeing/allocating contexts.
>
> In the RFCs, we’ve simply kept all free pages unmapped from the restricted
> address space. The allocator itself is largely unchanged; at the very end of
> allocation we map pages (if appropriate), allocating pagetables via totally
> separate allocation calls. When ASI-mapped pages are freed, they go onto a queue
> that is then freed asynchronously from a context that’s able to batch up the TLB
> flushes before making them available for re-allocation. Reclaim is then made
> aware of this asynchronous process so that __GFP_DIRECT_RECLAIM allocations can
> block on it where necessary.
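
To make the deferred-free scheme described above concrete, here is a minimal
userspace model of it. This is only a sketch under stated assumptions: every
name in it (page_model, model_free_page(), drain_pending(), BATCH_MAX and so
on) is hypothetical and does not come from the RFCs, and the "TLB flush" is
just a counter. It shows the shape of the idea: pages that were mapped into
the restricted address space are queued on free, and only become allocatable
again after a single flush that covers the whole batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* All names are hypothetical; this is a userspace model, not kernel code. */
#define NR_PAGES  16
#define BATCH_MAX 8

struct page_model {
	bool mapped_restricted;	/* present in the restricted address space? */
	bool free;		/* available for re-allocation? */
};

static struct page_model pages[NR_PAGES];
static size_t pending[BATCH_MAX];	/* freed while mapped, awaiting a flush */
static size_t nr_pending;
static unsigned long nr_flushes;	/* counts simulated TLB shootdowns */

static void model_init(void)
{
	for (size_t i = 0; i < NR_PAGES; i++)
		pages[i] = (struct page_model){ .free = true };
	nr_pending = 0;
	nr_flushes = 0;
}

/* Stand-in for a real TLB shootdown; here it only bumps a counter. */
static void model_tlb_flush(void)
{
	nr_flushes++;
}

/* Worker side: unmap the whole batch, flush once, then release the pages. */
static void drain_pending(void)
{
	if (nr_pending == 0)
		return;
	for (size_t i = 0; i < nr_pending; i++)
		pages[pending[i]].mapped_restricted = false;
	model_tlb_flush();	/* one flush covers the whole batch */
	for (size_t i = 0; i < nr_pending; i++)
		pages[pending[i]].free = true;
	nr_pending = 0;
}

/* Free path: unmapped pages are free at once, mapped pages are queued. */
static void model_free_page(size_t pfn)
{
	if (!pages[pfn].mapped_restricted) {
		pages[pfn].free = true;
		return;
	}
	if (nr_pending == BATCH_MAX)
		drain_pending();	/* queue full: drain before enqueueing */
	pending[nr_pending++] = pfn;
}

/* Allocate any free page and map it at the very end if it is nonsensitive. */
static long model_alloc_page(bool nonsensitive)
{
	for (size_t i = 0; i < NR_PAGES; i++) {
		if (!pages[i].free)
			continue;
		pages[i].free = false;
		pages[i].mapped_restricted = nonsensitive;
		return (long)i;
	}
	return -1;	/* nothing free; a real allocator might wait on the drain */
}
```

In this model a caller that gets -1 from model_alloc_page() while nr_pending
is non-zero corresponds to the __GFP_DIRECT_RECLAIM case mentioned above: it
would call drain_pending() (or wait for the worker to do so) and retry.
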
>
> Although we’ve been able to hammer this approach into a viable shape for the
> Google workloads we’ve been concerned with so far, it’s not a general solution.
> Some concrete reasons include:
>
> a. It leads to pointless TLB shootdowns; there must be pathological cases where
>    lots of pages get un-mapped only to get immediately re-allocated and mapped
>    again.
>
> b. The asynchronous worker creates CPU jitter.
>
> c. It provides no ability to prioritise re-allocating pages with the same
>    sensitivity as prior allocations. As well as TLB issues this creates page
>    zeroing costs as pages that were formerly sensitive need to be zeroed before
>    they can be mapped into the restricted address space.
>
> d. This all creates unnecessary allocation latency and extra work to free pages.
>
> At last year’s session I touched on the idea of instead using something akin to
> migratetypes to track sensitivity (more accurately: presence in ASI’s restricted
> pagetables) of free pages/pageblocks. The feedback on that idea was basically
> “dunno, we would need more details”. I’m now working on a design based on this
> approach and I’d like to use this session to go over such details. I don’t have
> a prototype yet, but by March I hope to have shared some illustrative code.
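
As a rough illustration of the migratetype-like idea above, the sketch below
keeps one free list per sensitivity class and only pays a conversion cost when
an allocation has to fall back to the other class. It is a userspace model, not
the design itself: the names (asi_class, free_area_model, alloc_page_model(),
push_free()) are hypothetical, and the TLB flush and page zeroing are reduced
to counters. The two conversion costs are the ones named earlier in this mail:
unmapping needs a TLB flush, and a formerly sensitive page needs zeroing before
it can be mapped into the restricted address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; a userspace model, not kernel code. */
enum asi_class { ASI_SENSITIVE, ASI_NONSENSITIVE, ASI_NR_CLASSES };

#define MAX_FREE 32

struct free_area_model {
	size_t pfn[ASI_NR_CLASSES][MAX_FREE];	/* one free stack per class */
	size_t nr[ASI_NR_CLASSES];
	unsigned long nr_flushes;		/* simulated TLB shootdowns */
	unsigned long nr_zeroed;		/* simulated page zeroing */
};

/* Free path: the page goes back on the list matching its current mapping. */
static bool push_free(struct free_area_model *fa, enum asi_class c, size_t pfn)
{
	if (fa->nr[c] == MAX_FREE)
		return false;
	fa->pfn[c][fa->nr[c]++] = pfn;
	return true;
}

/* Allocate a page of class `want`; returns false if no free page exists. */
static bool alloc_page_model(struct free_area_model *fa, enum asi_class want,
			     size_t *pfn)
{
	enum asi_class other =
		(want == ASI_SENSITIVE) ? ASI_NONSENSITIVE : ASI_SENSITIVE;

	if (fa->nr[want] > 0) {		/* fast path: no conversion needed */
		*pfn = fa->pfn[want][--fa->nr[want]];
		return true;
	}
	if (fa->nr[other] == 0)
		return false;

	/* Fallback: take a page from the other class and convert it. */
	*pfn = fa->pfn[other][--fa->nr[other]];
	if (want == ASI_SENSITIVE)
		fa->nr_flushes++;	/* page was mapped: unmap and flush */
	else
		fa->nr_zeroed++;	/* page was sensitive: zero, then map */
	return true;
}
```

Compared with the queue-based approach, freeing here never triggers a flush on
its own; the cost is deferred until (and unless) a page actually has to change
class.
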
>
> Some questions I’m currently investigating that I’d like to discuss details of
> (hopefully, with proposed answers by the time of the conference!):
>
> - Can we totally avoid the need to allocate pagetables during allocation, by
>   keeping ASI’s restricted copy of the physmap in-sync with the unrestricted one,
>   different only in _PAGE_PRESENT?
>
> - If not, what’s the best way to allocate while we allocate?
>
> - When a TLB shootdown would let us satisfy an allocation that is getting into the
>   deeper end of the slowpath, how is that prioritised and structured wrt. direct
>   compact/reclaim/other fallbacks etc?
>
> - How do we maintain a balance of sensitivities among free pages, and what does
>   that desired balance look like?
>
>   - (Note: if no page-table-allocation is needed to map nonsensitive pages, the
>     second question goes away: since mapping is cheap but unmapping is
>     expensive, we would mostly just want to minimize the number of free pages
>     mapped into the restricted address space).
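
The first question in the list above (keeping the restricted physmap in sync
with the unrestricted one, differing only in _PAGE_PRESENT) can be modelled in
a few lines. This is a sketch only, assuming a flat array of fake PTEs; the
names (MODEL_PAGE_PRESENT, model_asi_map(), model_asi_unmap(),
model_in_sync()) are hypothetical and the entry layout is invented for the
example. The point it illustrates is that, if the restricted tables are fully
pre-populated, mapping is a bit flip with no allocation, while unmapping is
the side that leaves the caller owing a TLB flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: bit 0 plays the role of _PAGE_PRESENT. */
#define MODEL_PAGE_PRESENT (UINT64_C(1) << 0)
#define NR_PTES 8

static uint64_t unrestricted[NR_PTES];	/* model of the normal physmap */
static uint64_t restricted[NR_PTES];	/* model of ASI's restricted copy */

/* Build both tables up front; the restricted copy starts fully unmapped. */
static void model_physmap_init(void)
{
	for (size_t i = 0; i < NR_PTES; i++) {
		unrestricted[i] = ((uint64_t)i << 12) | MODEL_PAGE_PRESENT;
		restricted[i] = unrestricted[i] & ~MODEL_PAGE_PRESENT;
	}
}

/* Mapping is a bit flip in an existing entry: nothing is allocated. */
static void model_asi_map(size_t i)
{
	restricted[i] |= MODEL_PAGE_PRESENT;
}

/* Unmapping clears the bit; returns true if the caller now owes a flush. */
static bool model_asi_unmap(size_t i)
{
	bool was_present = (restricted[i] & MODEL_PAGE_PRESENT) != 0;

	restricted[i] &= ~MODEL_PAGE_PRESENT;
	return was_present;
}

/* Invariant: the two tables agree on every bit except the present bit. */
static bool model_in_sync(void)
{
	for (size_t i = 0; i < NR_PTES; i++) {
		if ((unrestricted[i] ^ restricted[i]) & ~MODEL_PAGE_PRESENT)
			return false;
	}
	return true;
}
```
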
>
> [0] https://lwn.net/Articles/974390/
>     https://www.youtube.com/watch?v=DxaN6X_fdlI
> [1] https://lore.kernel.org/linux-mm/20250110-asi-rfc-v2-v2-0-8419288bc805@google.com/
> [2] https://lore.kernel.org/linux-mm/20240712-asi-rfc-24-v1-0-144b319a40d8@google.com/

Hmm, I did not CC anyone except the list. Adding some people in case
it prompts a discussion.
