From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, HTML_MESSAGE,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52394C07E95 for ; Tue, 20 Jul 2021 01:51:36 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C9CF0610D2 for ; Tue, 20 Jul 2021 01:51:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C9CF0610D2 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 00B118D0002; Mon, 19 Jul 2021 21:51:36 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EFD788D0001; Mon, 19 Jul 2021 21:51:35 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D77378D0002; Mon, 19 Jul 2021 21:51:35 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0088.hostedemail.com [216.40.44.88]) by kanga.kvack.org (Postfix) with ESMTP id B0B6B8D0001 for ; Mon, 19 Jul 2021 21:51:35 -0400 (EDT) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 3900C184C0D75 for ; Tue, 20 Jul 2021 01:51:34 +0000 (UTC) X-FDA: 78381289308.27.AC50026 Received: from mail-pj1-f49.google.com (mail-pj1-f49.google.com [209.85.216.49]) by imf08.hostedemail.com (Postfix) with ESMTP id E41DD30000BC for ; Tue, 20 Jul 2021 01:51:33 +0000 (UTC) 
Received: by mail-pj1-f49.google.com with SMTP id gp5-20020a17090adf05b0290175c085e7a5so1029101pjb.0 for ; Mon, 19 Jul 2021 18:51:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=a60zUi4JatDe+1IUK7YZbb7gPbrBkf8bKInwnOKBQgo=; b=T/zuk8l7Nv80r0u0DLfK2NlYYPgiUBchI6Gj9itD+nLtf+iBKO2bMtVoZ7Kzt2qvlb mmexpgjsa488CsTFdb1CasqNuq1zX5n+YdGkxhEypRmKWfk2lupRef3WsSeaxaTvQCPs pNy4FWu1crL9j0DHpV1Yy8Oat4pSacNvm/DPBzsNK2875Nb2uk50BUo21/y1VmaJeSzV 9g5OErt65bFhXR167fMmOxPM8TLkRZ5odWIEz+/RgaBdQ6P0YoAzmz6l27ZPEU0PrK4F uNlS1Ea/ik+ci8iqXsH2GLBUfPfWMjgLxJANm2UywzK5nW+1VPpHG1GiqvimYbzoKP2Y hi9w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=a60zUi4JatDe+1IUK7YZbb7gPbrBkf8bKInwnOKBQgo=; b=ICGfEn7b3pP/G7K+avNrjW5A7houx8nru0hN3qoW3aVtpLhwSKqk9HoRe1AWpKZ+eO dOf/Wl1LW0nss6IO8ZGI3VOpQA5Ecf5R3P+thYPvevdOHCOjn1LGFC8CGgUvl0GiHKnZ BIz0chdrMgdE0wnTeBXNk89hmqoSTUz0GKYPh4BoWTox+sNzg0Ajl3K8xYZJT58AGfsA aisHOQAUQtVHfVPE/nvhAGJJ4Cz0ckkLVCJVGafrJYOqEvvYyrPZGYWGo76K4QTiau/6 5lfE+9BJYiEHsiUAyftXr0ARllu7tabcGMkboYGmb7yZ0TY7KW8vLpYDRqbOQ1CMYJ6E QA5g== X-Gm-Message-State: AOAM533XqMMWqqN8lmG/WQn8rhUgZPFkVRkoDyHk/PA646MmRIO8NKWk LOx5HCnmDht6O+dD3obNdnx8VLPfog3siEFVL7JVXw== X-Google-Smtp-Source: ABdhPJxhveUE6aWD6zm3fLHl2MB5a04h0BqCyfZjv0guqC75x5N334N+7HaR7FOjX11gcjboWkLH+hNdgaNmypIt86o= X-Received: by 2002:a17:90a:43c3:: with SMTP id r61mr33578484pjg.11.1626745892595; Mon, 19 Jul 2021 18:51:32 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Erdem Aktas Date: Mon, 19 Jul 2021 18:51:21 -0700 Message-ID: Subject: Re: Runtime Memory Validation in Intel-TDX and AMD-SNP To: Andy Lutomirski Cc: Joerg Roedel , David Rientjes , Borislav Petkov , Sean Christopherson , Andrew Morton , Vlastimil Babka , "Kirill A. 
Shutemov" , Andi Kleen , Brijesh Singh , Tom Lendacky , Jon Grimm , Thomas Gleixner , Peter Zijlstra , Paolo Bonzini , Ingo Molnar , "Kaplan, David" , Varad Gautam , Dario Faggioli , x86 , linux-mm@kvack.org, linux-coco@lists.linux.dev
Content-Type: multipart/alternative; boundary="000000000000125d1405c7844be8"
X-Rspamd-Server: rspam06
X-Rspamd-Queue-Id: E41DD30000BC
X-Stat-Signature: hsnyu971pgur19zgte8bt65h4s6g7qb6
Authentication-Results: imf08.hostedemail.com;
 dkim=pass header.d=google.com header.s=20161025 header.b="T/zuk8l7";
 spf=pass (imf08.hostedemail.com: domain of erdemaktas@google.com designates 209.85.216.49 as permitted sender) smtp.mailfrom=erdemaktas@google.com;
 dmarc=pass (policy=reject) header.from=google.com
X-HE-Tag: 1626745893-603081
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

--000000000000125d1405c7844be8
Content-Type: text/plain; charset="UTF-8"

With the new UEFI memory type, option 2 seems like the better option to me.

I was thinking that, while support for the new UEFI memory type is still
lacking, option 3 can be implemented as a temporary solution. IMO, this is
crucial for reasonable boot performance.

> There's one exception to this, which is the previous memory view in
> crash kernels. But that's a relatively obscure case and there might be
> other solutions for this.

I think this is an important angle. It might cause reliability issues: if
the kexec kernel does not know which pages are shared and which are private,
it can use a previously shared page as a code page, which will not work. It
is also a security concern. Hosts can always cause crashes, which force
guests to kexec for a crash dump. If the kexec kernel does not know which
pages were validated before, it might be compromised by page replay attacks.

Also, kexec is not only for crash dumps. For warm resets, the kexec kernel
needs to know the valid page map.

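As a rough illustration of the kexec point, and assuming the previous kernel
hands over a per-page validation bitmap, the check could look like the sketch
below. All names here (struct validation_bitmap, page_is_validated,
mark_page_validated, range_usable_as_private) are hypothetical and are not
taken from any posted patch or existing kernel API.

```c
/* Hypothetical sketch: none of these names exist in the Linux kernel. */
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* One bit per 4 KiB page frame: 1 = validated (private), 0 = not validated. */
struct validation_bitmap {
	unsigned long *bits;   /* backing storage, indexed by page frame number */
	uint64_t nr_pages;     /* number of page frames covered by the bitmap */
};

/* Look up the state of one page; pages outside the bitmap count as not validated. */
static bool page_is_validated(const struct validation_bitmap *vb, uint64_t pfn)
{
	if (pfn >= vb->nr_pages)
		return false;
	return (vb->bits[pfn / BITS_PER_WORD] >> (pfn % BITS_PER_WORD)) & 1UL;
}

/* Record that a page has been validated; out-of-range PFNs are ignored. */
static void mark_page_validated(struct validation_bitmap *vb, uint64_t pfn)
{
	if (pfn < vb->nr_pages)
		vb->bits[pfn / BITS_PER_WORD] |= 1UL << (pfn % BITS_PER_WORD);
}

/*
 * A kexec'd kernel would call something like this before placing code or
 * data in a range: every page must have been recorded as validated.
 */
static bool range_usable_as_private(const struct validation_bitmap *vb,
				    uint64_t start_pfn, uint64_t nr)
{
	for (uint64_t i = 0; i < nr; i++)
		if (!page_is_validated(vb, start_pfn + i))
			return false;
	return true;
}
```

Without such a handed-over map, the new kernel has no way to tell a
previously shared page from a private one, which is the reliability and
replay concern described above.
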
>> Also in general I don't think it will really happen, at least initially.
>> All the shared buffers we use are allocated and never freed. So such a
>> problem could be deferred.

Does it not depend on kernel configs? Currently, there is a valid control
path in dma_alloc_coherent which might allocate and free shared pages.

>> At the risk of asking a potentially silly question, would it be
>> reasonable to treat non-validated memory as not-present for kernel
>> purposes and hot-add it in a thread as it gets validated?

My concern with this is that it assumes all present memory is private.
UEFI might have some pages which are shared and therefore are also present.

-Erdem

On Mon, Jul 19, 2021 at 5:26 PM Andy Lutomirski wrote:

> On 7/19/21 5:58 AM, Joerg Roedel wrote:
>
> > Memory Validation through the Boot Process and in the Running System
> > --------------------------------------------------------------------
> >
> > The memory is validated throughout the boot process as described below.
> > These steps assume a firmware is present, but this proposal does not
> > strictly require a firmware. The tasks done by the firmware can also be
> > done by the hypervisor before starting the guest. The steps are:
> >
> >       1. The firmware validates all memory which will not be owned by
> >          the boot loader or the OS.
> >
> >       2. The firmware also validates the first X MB of memory, just
> >          enough to run a boot loader and to load the compressed Linux
> >          kernel image. X is not expected to be very large, 64 or 128
> >          MB should be enough. This pre-validation should not cause
> >          significant delays in the boot process.
> >
> >       3. The validated memory is marked E820-Usable in struct
> >          boot_params for the Linux decompressor. The rest of the
> >          memory is also passed to Linux via new special E820 entries
> >          which mark the memory as Usable-but-Invalid.
> >
> >       4. When the Linux decompressor takes over control, it evaluates
> >          the E820 table and calculates the total amount of memory
> >          available to Linux (valid and invalid memory).
> >
> >          The decompressor allocates a physically contiguous data
> >          structure at a random memory location which is big enough to
> >          hold the validation states of all 4kb pages available to
> >          the guest. This data structure will be called the Validation
> >          Bitmap through the rest of this document. The Validation
> >          Bitmap is indexed by page frame numbers.
>
> At the risk of asking a potentially silly question, would it be
> reasonable to treat non-validated memory as not-present for kernel
> purposes and hot-add it in a thread as it gets validated? Or would this
> result in poor system behavior before enough memory is validated?
> Perhaps we should block instead of failing allocations if we want more
> memory than is currently validated?
>
> --Andy
>
>
--000000000000125d1405c7844be8
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000125d1405c7844be8--