From: Brijesh Singh <brijesh.singh@amd.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	David Rientjes <rientjes@google.com>
Cc: brijesh.singh@amd.com, Borislav Petkov <bp@alien8.de>,
	Andy Lutomirski <luto@kernel.org>,
	Sean Christopherson <seanjc@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andi Kleen <ak@linux.intel.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Jon Grimm <jon.grimm@amd.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Christoph Hellwig <hch@lst.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Ingo Molnar <mingo@redhat.com>, Joerg Roedel <jroedel@suse.de>,
	x86@kernel.org, linux-mm@kvack.org
Subject: Re: AMD SEV-SNP/Intel TDX: validation of memory pages
Date: Tue, 2 Feb 2021 18:16:41 -0600
Message-ID: <961a2736-9bc9-43e1-1e75-6d373fe9590b@amd.com>
In-Reply-To: <20210202160205.3wfchtibq2sd7pe5@black.fi.intel.com>


On 2/2/21 10:02 AM, Kirill A. Shutemov wrote:
> On Mon, Feb 01, 2021 at 05:51:09PM -0800, David Rientjes wrote:
>> Hi everybody,
>>
>> I'd like to kick-start the discussion on lazy validation of guest memory
>> for the purposes of AMD SEV-SNP and Intel TDX.
>>
>> Both AMD SEV-SNP and Intel TDX require validation of guest memory before
>> it may be used by the guest.  This is needed for integrity protection from
>> a potentially malicious hypervisor or other host components.
>>
>> For AMD SEV-SNP, the hypervisor assigns a page to the guest using the new
>> RMPUPDATE instruction.  The guest then transitions the page to a usable
>> state with the new PVALIDATE instruction[1].  This sets the Validated flag in the
>> Reverse Map Table (RMP) for a guest addressable page, which opts into
>> hardware and firmware integrity protection.  This may only be done by the
>> guest itself and until that time, the guest cannot access the page.
>>
>> The guest can only PVALIDATE memory for a gPA once; the RMP then
>> guarantees for each hPA that there is only a single gPA mapping.  This
>> validation can either be done all up front at the time the guest is booted
>> or it can be done lazily at runtime on fault if the guest keeps track of
>> Valid vs Invalid pages.  Because doing PVALIDATE for all guest memory at
>> boot would be extremely lengthy, I'd like to discuss the options for doing
>> it lazily.
>>
>> Similarly, for Intel TDX, the hypervisor unmaps the gPA from the shared
>> EPT and invalidates the tlb and all caches for the TD's vcpus; it then
>> adds a page to the gPA address space for a TD by using the new
>> TDH.MEM.PAGE.AUG call.  The TDG.MEM.PAGE.ACCEPT TDCALL[2] then allows a
>> guest to accept a guest page for a gPA and initialize it using the private
>> key for that TD.  This may only be done by the TD itself and until that
>> time, the gPA cannot be used within the TD.
>>
>> Both AMD SEV-SNP and Intel TDX support hugepages.  SEV-SNP supports 2MB
>> whereas TDX has accept TDCALL support for 2MB and 1GB.
>>
>> I believe the UEFI ECR[3] for the unaccepted memory type to
>> EFI_MEMORY_TYPE was accepted in December.  This should enable the guest to
>> learn what memory has not yet been validated (or accepted) by the firmware
>> if all guest memory is not done completely up front.
>>
>> This likely requires a pre-validation of all memory that can be accessed
>> when handling a #VC (or #VE for TDX) such as IST stacks, including memory
>> in the x86 boot sequence that must be validated before the core mm
>> subsystem is up and running to handle the lazy validation.  I believe
>> lazy validation can be done by the core mm after that, perhaps by
>> maintaining a new "validated" bit in struct page flags.
>>
>> Has anybody looked into this or, even better, is anybody currently working
>> on this?
> It's likely I'm going to do this on Intel side, but I have not looked
> deeply into it.
>
>> I think quite invasive changes are needed for the guest to support lazy
>> validation/acceptance to core areas that lots of people on the recipient
>> list have strong opinions about.  Some things that come to mind:
>>
>>  - Annotations for pages that must be pre-validated in the x86 boot
>>    sequence, including IST stacks
>>
>>  - Proliferation of these annotations throughout any kernel code that can
>>    access memory for #VC or #VE
>>
>>  - Handling lazy validation of guest memory through the core mm layer,
>>    most likely involving a bit in struct page flags to track their status
>>
>>  - Any need for validating memory that is not backed by struct page that
>>    needs to be special-cased
>>
>>  - Any concerns about this for the DMA layer
>>
>> One possibility for minimal disruption to the boot entry code is to
>> require the guest BIOS to validate 4GB and below, and then leave 4GB and
>> above to be done lazily (the true amount of memory will actually be less
>> due to the MMIO hole).
> [ As I haven't looked into the actual code, I may say total garbage below... ]
>
> Pre-validating 4GB would indeed be the easiest way to go, but it's going
> to be too slow.
>
> A more realistic option is for the BIOS to pre-validate the memory where
> the kernel and initrd are placed, plus a few dozen megs for runtime. It
> means the decompression code would need to be aware of the validation.


I was thinking that having the BIOS validate the lower 4GB would simplify
the changes to the kernel entry code path as well as provide a clean
approach to supporting kexec.

My initial thought is:

- The BIOS or VMM validates the lower 4GB of memory.

- The BIOS marks memory above 4GB as unaccepted in the e820/EFI memmap.

- Kernel early boot can be achieved with minimal (or no) changes.

- If an unaccepted type is discovered, allocate a bitmap that can be used
to keep track of information (e.g., which pages are validated). We can
also explore whether removing the unaccepted flag from the memmap range
will work.

- On #VC/#VE, look at the bitmap to see if we need to validate the
pages. To speed things up, we can validate more than one page per #VC/#VE.

- If we get kexec'd, rebuild the e820/memmap based on the bitmap so
that we don't double validate.
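
To make the bitmap idea above a bit more concrete, here is a rough
userspace sketch in C, not kernel code. Everything in it is an assumption
for illustration: hw_validate_page() is a hypothetical stand-in for
PVALIDATE (or TDG.MEM.PAGE.ACCEPT), the toy guest has 64 lazily validated
pages with pfn 0 standing for the first page above 4GB, and the batch size
of 8 pages per fault is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_PAGES    64u  /* toy guest: 64 pages left unaccepted by the BIOS */
#define BATCH_PAGES 8u   /* pages validated per #VC/#VE (arbitrary choice) */

/* One bit per page; a set bit means the page has already been validated. */
static uint8_t validated[NR_PAGES / 8];

/* Hypothetical stand-in for PVALIDATE / TDG.MEM.PAGE.ACCEPT; counts calls. */
static unsigned int validate_calls;

static void hw_validate_page(uint64_t pfn)
{
	(void)pfn;
	validate_calls++;
}

static bool page_is_validated(uint64_t pfn)
{
	return validated[pfn / 8] & (1u << (pfn % 8));
}

/*
 * Fault path: validate the aligned batch that contains the faulting pfn,
 * skipping pages whose bit is already set so nothing is validated twice.
 * Returns the number of pages newly validated.
 */
static unsigned int handle_validation_fault(uint64_t pfn)
{
	uint64_t start = pfn - (pfn % BATCH_PAGES);
	unsigned int done = 0;

	assert(pfn < NR_PAGES);
	for (uint64_t p = start; p < start + BATCH_PAGES && p < NR_PAGES; p++) {
		if (page_is_validated(p))
			continue;
		hw_validate_page(p);
		validated[p / 8] |= (uint8_t)(1u << (p % 8));
		done++;
	}
	return done;
}

/* A memmap-like range of pages, [start_pfn, end_pfn). */
struct mem_range {
	uint64_t start_pfn;
	uint64_t end_pfn;
	bool accepted;
};

/*
 * kexec path: walk the bitmap and emit runs of accepted/unaccepted pages,
 * so the next kernel knows what must not be validated again.
 * Returns the number of ranges written (at most max).
 */
static size_t rebuild_ranges(struct mem_range *out, size_t max)
{
	size_t n = 0;

	for (uint64_t p = 0; p < NR_PAGES; p++) {
		bool acc = page_is_validated(p);

		if (n && out[n - 1].accepted == acc) {
			out[n - 1].end_pfn = p + 1;  /* extend the current run */
			continue;
		}
		if (n == max)
			break;
		out[n++] = (struct mem_range){ p, p + 1, acc };
	}
	return n;
}
```

In this sketch a fault on pfn 10 validates pfns 8-15, a later fault on
pfn 12 validates nothing, and rebuild_ranges() then reports three runs:
unaccepted, accepted, unaccepted. A real implementation would also need
locking and hugepage-sized validation, which are left out here.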


>
> The critical thing is that once memory is validated we must not validate
> it again. It's a possible VMM->guest attack vector. We must track precisely
> what memory has been validated and stop the guest on detecting an
> unexpected second validation request.
>
> It also means that we have to keep the information when control gets
> passed from the decompression code to the real kernel. A page flag is no
> good for this.
>
> My initial thought is that we can use the e820/efi memmap to keep track of
> this information -- remove the unaccepted memory flag from the range that
> got accepted.
>
> The decompression code validates the memory that it needs for
> decompression, modifies the memmap accordingly and passes control to the
> main kernel.
>
> The main kernel may accept memory via #VE/#VC, but ideally it needs to
> stay within the memory accepted by the decompression code for initial boot.
>
> I think the bulk of memory validation can be done via existing machinery:
> we already have deferred struct page initialization code in the kernel and
> I believe we can hook into it for this purpose.
>
> Any comments?
>
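
The memmap-based tracking quoted above (clear the unaccepted flag on the
range that got accepted) could look roughly like the C sketch below. It is
only an illustration under assumptions: struct mem_entry, struct memmap and
memmap_accept() are made-up names, not the kernel's e820 or EFI memmap
code, and the table is a fixed-size array. A request that is not fully
inside a single unaccepted entry, which includes a second acceptance of the
same range, is refused so the caller can stop the guest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ENTRIES 16

enum mem_type { MEM_USABLE, MEM_UNACCEPTED };

/* One memmap entry covering the physical range [start, end). */
struct mem_entry {
	uint64_t start;
	uint64_t end;
	enum mem_type type;
};

struct memmap {
	struct mem_entry e[MAX_ENTRIES];
	size_t nr;
};

/* Insert ent at index idx, shifting later entries up by one. */
static bool memmap_insert(struct memmap *m, size_t idx, struct mem_entry ent)
{
	if (m->nr == MAX_ENTRIES)
		return false;
	for (size_t i = m->nr; i > idx; i--)
		m->e[i] = m->e[i - 1];
	m->e[idx] = ent;
	m->nr++;
	return true;
}

/*
 * Mark [start, end) as accepted by splitting the unaccepted entry that
 * contains it. Returns 0 on success, -1 if the range is empty or is not
 * fully inside one unaccepted entry (e.g. it was already accepted), and
 * -2 if the table has no room for the split.
 */
static int memmap_accept(struct memmap *m, uint64_t start, uint64_t end)
{
	if (start >= end)
		return -1;

	for (size_t i = 0; i < m->nr; i++) {
		struct mem_entry *cur = &m->e[i];

		if (cur->type != MEM_UNACCEPTED ||
		    start < cur->start || end > cur->end)
			continue;

		/* A split needs one extra slot per leftover piece. */
		size_t extra = (start > cur->start) + (end < cur->end);
		if (m->nr + extra > MAX_ENTRIES)
			return -2;

		uint64_t old_start = cur->start, old_end = cur->end;

		cur->start = start;
		cur->end = end;
		cur->type = MEM_USABLE;

		/* Insert the tail first so index i still names the new entry. */
		if (end < old_end)
			memmap_insert(m, i + 1,
				      (struct mem_entry){ end, old_end, MEM_UNACCEPTED });
		if (start > old_start)
			memmap_insert(m, i,
				      (struct mem_entry){ old_start, start, MEM_UNACCEPTED });
		return 0;
	}
	return -1;
}
```

The sketch does not merge adjacent usable entries, so the table grows with
every split; that is one reason a bitmap may scale better for fine-grained
lazy validation, while the memmap form is convenient for handing state
from the decompression code to the main kernel.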

