Re: [PATCH RFC v2 03/29] mm: asi: Introduce ASI core API

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Brendan Jackman <jackmanb@google.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Junaid Shahid <junaids@google.com>,
	akpm@linux-foundation.org, dave.hansen@linux.intel.com,
	yosryahmed@google.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	peterz@infradead.org, seanjc@google.com, tglx@linutronix.de,
	x86@kernel.org
Subject: Re: [PATCH RFC v2 03/29] mm: asi: Introduce ASI core API
Date: Mon, 17 Mar 2025 11:40:29 +0000	[thread overview]
Message-ID: <Z9gKLdNm9p6qGACS@google.com> (raw)
In-Reply-To: <20250315123621.GCZ9V0RWGFapbQNL1w@fat_crate.local>

On Sat, Mar 15, 2025 at 01:36:21PM +0100, Borislav Petkov wrote:
> On Fri, Mar 14, 2025 at 06:34:32PM -0700, Junaid Shahid wrote:
> > The reason this isn't a problem with the current asi_enter() is because
> > there the equivalent of asi_start_critical() happens _before_ the address
> > space switch. That ensures that even if an NMI arrives in the middle of
> > asi_enter(), the NMI epilog will switch to the restricted address space and
> > there is no window where an NMI (or any other interrupt/exception for that
> > matter) would result in going into vmenter with an unrestricted address
> > space.
> 
> Aha.
> 
> > So
> > 	asi_enter();
> > 	asi_start_critical();
> > 	vmenter();
> > 	asi_end_critical();
> > 
> > is broken as there is a problematic window between asi_enter() and
> > asi_start_critical() as Brendan pointed out.
> > 
> > However,
> > 	asi_start_critical();
> > 	asi_enter();
> > 	vmenter();
> > 	asi_end_critical();
> > 
> > would work perfectly fine.
> > 
> > Perhaps that might be the way to refactor the API?
> 
> Ok, let's see if I understand the API better now. And I'm using function names
> which say what they do:
> 
> I guess the flow and needed ingredients should be:
> 
> 	1. asi_lock() or asi_start() or whatnot which does that atomic switch
> 	   of asi target. That tells other code like the NMI glue where in the
> 	   asi context the CPU is so that glue code can know what to do on
> 	   return
> 
> 	2. asi_switch_address_space() - the expensive pagetables build and CR3
> 	   switch

(Not relevant to the present discussion but just taking a chance to
 add some colour: there are no pagetables to build here, they are ready
 to go - e.g. see my separate RFC from last week[0] on how the physmap
 could be managed.

 Also, it's perhaps worth noting that the CR3 write itself isn't
 expected to be the really expensive thing, in the general case we
 expect that to be dwarfed by the cost of the actual security
 mitigations - IBPB etc)

[0] https://lore.kernel.org/all/20250313-asi-page-alloc-v1-0-04972e046cea@google.com/

> 	3. asi_enter_critical_region() - this could be NOP but basically marks
> 	   the beginning of the CPU executing "unsafe" code
> 
> 		<... executes unsafe code... >
> 
> 	4. asi_exit_critical_region() - sets ASI target to NULL, i.e., what
> 	   asi_relax) does now. This also atomic and tells other code, we're
> 	   done with executing unsafe code but we're still running in the
> 	   restricted address space.
> 
> 	   This here can go back to 3 as often as needed.
> 
> 	5. asi_switch_address_space() - this goes back to the unrestricted
> 	   adddress space
> 
> 	6. asi_unlock()
> 
> Close?

I don't understand having both asi_[un]lock() _and_
asi_{start,enter}_critical_region(). The only reason we need the
critical section concept is for the purposes of the NMI glue code you
mentioned in part 1, and that setup must happen before the switch into
the restricted address space.

Also, I don't think we want part 5 inside the asi_lock()->asi_unlock()
region. That seems like the region betwen part 5 and 6, we are in the
unrestricted address space, but the NMI entry code is still set up to
return to the restricted address space on exception return. I think
that would actually be harmless, but it doesn't achieve anything.

The more I talk about it, the more convinced I am that the proper API
should only have two elements, one that says "I'm about to run
untrusted code" and one that says "I've finished running untrusted
code". But...

> 1. you can do empty calls to keep the interface balanced and easy to use
>
> 2. once you can remove asi_exit(), you should be able to replace all in-tree
>    users in one atomic change so that they're all switched to the new,
>    simplified interface

Then what about if we did this:

/* 
 * Begin a region where ASI restricted address spaces _may_ be used.
 *
 * Preemption must be off throughout this region.
 */
static inline void asi_start(void)
{
	/* 
	 * Cannot currently context switch in the restricted adddress
	 * space.
	 */
	lockdep_assert_preemption_disabled();

	/* 
	 * (Actually, this doesn't do anything besides assert, it's
	 * just to help the API make sense).
	 */
}

/*
 * End a region begun by asi_start(). After this, the CPU cannot be in
 * the restricted address space until the next asi_start(). 
 */
static inline void asi_end(void)
{
	/* Leave the restricted address space if we're in it. */
	...
}

/*
 * About to run untrusted code, begin a region that _must_ run in the
 * restricted address space.
 */
void asi_start_critical(void);

/* End a region begun by asi_start_critical(). */
void asi_end_critical(void);

ioctl(KVM_RUN) {
    enter_from_user_mode()
    asi_start()
    while !need_userspace_handling()
        asi_start_critical();
        vmenter();
        asi_end_critical();
    }
    asi_end()
    exit_to_user_mode()
}

Then the API is balanced, and we have a clear migration path towards
the two-element API, i.e. we need to just remove asi_start() and
asi_end(). It also better captures the point about the temporary
simplification: basically the reason why the API is currently
overcomplicated is: if totally arbitrary parts of the kernel can find
themselves in the restricted address space, we have more work to do.
(It's totally possible, but we don't wanna block initial submission on
that work). The simplification is about demarcating what code is and
isn't affected by ASI, so having this "region" kinda helps with that.
Although, because NMIs can also be affected it's a bit of a fuzzy
demarcation...

next prev parent reply	other threads:[~2025-03-17 11:40 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-10 18:40 [PATCH RFC v2 00/29] Address Space Isolation (ASI) Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 01/29] mm: asi: Make some utility functions noinstr compatible Brendan Jackman
2025-01-16  0:18   ` Borislav Petkov
2025-01-16 10:27     ` Borislav Petkov
2025-01-16 13:22       ` Brendan Jackman
2025-01-16 14:02         ` Borislav Petkov
2025-01-10 18:40 ` [PATCH RFC v2 02/29] x86: Create CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION Brendan Jackman
2025-01-16 16:43   ` Borislav Petkov
2025-03-01  7:23   ` Mike Rapoport
2025-03-05 13:12     ` Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 03/29] mm: asi: Introduce ASI core API Brendan Jackman
2025-02-19 10:55   ` Borislav Petkov
2025-02-19 13:50     ` Brendan Jackman
2025-02-19 13:53     ` Brendan Jackman
2025-02-27 12:06       ` Borislav Petkov
2025-02-28  8:43         ` Brendan Jackman
2025-03-14 13:14           ` Borislav Petkov
2025-03-15  1:34             ` Junaid Shahid
2025-03-15 12:36               ` Borislav Petkov
2025-03-17 11:40                 ` Brendan Jackman [this message]
2025-03-18  0:50                   ` Junaid Shahid
2025-03-18 13:03                     ` Brendan Jackman
2025-03-18 22:48                       ` Junaid Shahid
2025-03-19 15:23                         ` Borislav Petkov
2025-01-10 18:40 ` [PATCH RFC v2 04/29] mm: asi: Add infrastructure for boot-time enablement Brendan Jackman
2025-03-19 17:29   ` Borislav Petkov
2025-03-19 18:47     ` Yosry Ahmed
2025-03-20 10:44       ` Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 05/29] mm: asi: ASI support in interrupts/exceptions Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 06/29] mm: asi: Use separate PCIDs for restricted address spaces Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 07/29] mm: asi: Make __get_current_cr3_fast() ASI-aware Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 08/29] mm: asi: Avoid warning from NMI userspace accesses in ASI context Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 09/29] mm: asi: ASI page table allocation functions Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 10/29] mm: asi: asi_exit() on PF, skip handling if address is accessible Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 11/29] mm: asi: Functions to map/unmap a memory range into ASI page tables Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 12/29] mm: asi: Add basic infrastructure for global non-sensitive mappings Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 13/29] mm: Add __PAGEFLAG_FALSE Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 14/29] mm: asi: Map non-user buddy allocations as nonsensitive Brendan Jackman
2025-01-10 18:40 ` [PATCH TEMP WORKAROUND RFC v2 15/29] mm: asi: Workaround missing partial-unmap support Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 16/29] mm: asi: Map kernel text and static data as nonsensitive Brendan Jackman
2025-01-17 11:23   ` Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 17/29] mm: asi: Map vmalloc/vmap " Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 18/29] mm: asi: Map dynamic percpu memory " Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 19/29] mm: asi: Stabilize CR3 in switch_mm_irqs_off() Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 20/29] mm: asi: Make TLB flushing correct under ASI Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 21/29] KVM: x86: asi: Restricted address space for VM execution Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 22/29] mm: asi: exit ASI before accessing CR3 from C code where appropriate Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 23/29] mm: asi: exit ASI before suspend-like operations Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 24/29] mm: asi: Add infrastructure for mapping userspace addresses Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 25/29] mm: asi: Restricted execution fore bare-metal processes Brendan Jackman
2025-02-28 15:32   ` Yosry Ahmed
2025-03-20 15:55   ` Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 26/29] x86: Create library for flushing L1D for L1TF Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 27/29] mm: asi: Add some mitigations on address space transitions Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 28/29] x86/pti: Disable PTI when ASI is on Brendan Jackman
2025-01-10 18:40 ` [PATCH RFC v2 29/29] mm: asi: Stop ignoring asi=on cmdline flag Brendan Jackman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Z9gKLdNm9p6qGACS@google.com \
    --to=jackmanb@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=junaids@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=peterz@infradead.org \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox