Date: Mon, 1 Feb 2021 17:51:09 -0800 (PST)
From: David Rientjes
To: Borislav Petkov, Andy Lutomirski, Sean Christopherson,
    Andrew Morton, "Kirill A. Shutemov", Andi Kleen, Brijesh Singh,
    Tom Lendacky, Jon Grimm, Thomas Gleixner, Christoph Hellwig,
    Peter Zijlstra, Paolo Bonzini, Ingo Molnar, Joerg Roedel
cc: x86@kernel.org, linux-mm@kvack.org
Subject: AMD SEV-SNP/Intel TDX: validation of memory pages
Message-ID: <7515a81a-19e-b063-2081-3f5e79f0f7a8@google.com>

Hi everybody,

I'd like to kick-start the discussion on lazy validation of guest memory
for the purposes of AMD SEV-SNP and Intel TDX.

Both AMD SEV-SNP and Intel TDX require validation of guest memory before
it may be used by the guest.  This is needed for integrity protection
from a potentially malicious hypervisor or other host components.

For AMD SEV-SNP, the hypervisor assigns a page to the guest using the
new RMPUPDATE instruction.  The guest then transitions the page to a
usable state with the new PVALIDATE instruction[1].  This sets the
Validated flag in the Reverse Map Table (RMP) for a guest-addressable
page, which opts into hardware and firmware integrity protection.  This
may only be done by the guest itself, and until then the guest cannot
access the page.

The guest can only PVALIDATE memory for a gPA once; the RMP then
guarantees for each hPA that there is only a single gPA mapping.  This
validation can either be done all up front at the time the guest is
booted, or it can be done lazily at runtime on fault if the guest keeps
track of Valid vs Invalid pages.
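As a rough illustration of the SEV-SNP flow described above, here is a
toy C model of the assign/validate/access sequence with lazy validation
on first touch.  Everything in it is hypothetical: the toy_* names, the
array standing in for the RMP, and the choice to treat a second
validation as an error are assumptions made for the sketch, not the
hardware interface or any existing kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 8

/* Toy stand-in for an RMP entry; the real RMP is owned by hardware. */
struct toy_rmp_entry {
	bool assigned;   /* set by the hypervisor (models RMPUPDATE) */
	bool validated;  /* set by the guest (models PVALIDATE) */
};

struct toy_rmp_entry toy_rmp[TOY_NR_PAGES];

/* The guest's own record of which pages it has already validated. */
bool toy_guest_valid[TOY_NR_PAGES];

/* Hypervisor side: assign a page to the guest, initially unvalidated. */
void toy_rmpupdate(size_t pfn)
{
	assert(pfn < TOY_NR_PAGES);
	toy_rmp[pfn].assigned = true;
	toy_rmp[pfn].validated = false;
}

/*
 * Guest side: validate a page exactly once.
 * Returns 0 on success, -1 if the page is unassigned or already validated.
 */
int toy_pvalidate(size_t pfn)
{
	assert(pfn < TOY_NR_PAGES);
	if (!toy_rmp[pfn].assigned || toy_rmp[pfn].validated)
		return -1;
	toy_rmp[pfn].validated = true;
	return 0;
}

/*
 * Guest access path with lazy validation.
 * Returns the number of validations issued (0 or 1), or -1 if the page
 * cannot be made usable.
 */
int toy_touch(size_t pfn)
{
	assert(pfn < TOY_NR_PAGES);
	if (toy_guest_valid[pfn])
		return 0;            /* already validated, nothing to do */
	if (toy_pvalidate(pfn) != 0)
		return -1;           /* not assigned by the hypervisor */
	toy_guest_valid[pfn] = true;
	return 1;                    /* validated on first touch */
}
```

In this model the cost of validation is paid only for pages the guest
actually touches, which is the property the lazy approach is after; the
guest-side array is the "Valid vs Invalid" bookkeeping mentioned above.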
Because doing PVALIDATE for all guest memory at boot would be extremely
lengthy, I'd like to discuss the options for doing it lazily.

Similarly, for Intel TDX, the hypervisor unmaps the gPA from the shared
EPT and invalidates the TLB and all caches for the TD's vCPUs; it then
adds a page to the gPA address space for a TD by using the new
TDH.MEM.PAGE.AUG call.  The TDG.MEM.PAGE.ACCEPT TDCALL[2] then allows a
guest to accept a guest page for a gPA and initialize it using the
private key for that TD.  This may only be done by the TD itself, and
until then the gPA cannot be used within the TD.

Both AMD SEV-SNP and Intel TDX support hugepages.  SEV-SNP supports 2MB
whereas TDX has accept TDCALL support for 2MB and 1GB.

I believe the UEFI ECR[3] to add an unaccepted memory type to
EFI_MEMORY_TYPE was accepted in December.  This should enable the guest
to learn from the firmware which memory has not yet been validated (or
accepted) when not all guest memory is validated up front.

This likely requires pre-validation of all memory that can be accessed
when handling a #VC (or #VE for TDX), such as IST stacks, as well as
memory used in the x86 boot sequence, which must be validated before the
core mm subsystem is up and running to handle the lazy validation.

I believe lazy validation can be done by the core mm after that, perhaps
by maintaining a new "validated" bit in struct page flags.

Has anybody looked into this or, even better, is anybody currently
working on this?

I think supporting lazy validation/acceptance in the guest requires
quite invasive changes to core areas that lots of people on the
recipient list have strong opinions about.
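To give a feel for why hugepage support matters for the cost of
validation, here is a small C sketch that counts how many
validate/accept operations a physical range needs when each operation
may cover 4KB, 2MB or 1GB.  The toy_* names and the greedy "largest
aligned size that fits" policy are assumptions for illustration only;
neither the SEV-SNP nor the TDX specification is being quoted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SZ_4K (UINT64_C(1) << 12)
#define TOY_SZ_2M (UINT64_C(1) << 21)
#define TOY_SZ_1G (UINT64_C(1) << 30)

/*
 * Pick the largest page size that does not exceed max_size, is aligned
 * at addr, and fits in [addr, end).  Returns 0 if no size fits.
 */
uint64_t toy_pick_size(uint64_t addr, uint64_t end, uint64_t max_size)
{
	static const uint64_t sizes[] = { TOY_SZ_1G, TOY_SZ_2M, TOY_SZ_4K };
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		uint64_t sz = sizes[i];

		if (sz <= max_size && addr % sz == 0 && end - addr >= sz)
			return sz;
	}
	return 0;
}

/*
 * Count the operations needed to cover [start, end), which must be
 * 4KB-aligned.  max_size models the largest size the platform accepts:
 * TOY_SZ_2M for SEV-SNP, TOY_SZ_1G for TDX, per the text above.
 */
uint64_t toy_count_accept_ops(uint64_t start, uint64_t end, uint64_t max_size)
{
	uint64_t ops = 0;

	assert(start % TOY_SZ_4K == 0 && end % TOY_SZ_4K == 0);
	assert(start <= end && max_size >= TOY_SZ_4K);

	while (start < end) {
		uint64_t sz = toy_pick_size(start, end, max_size);

		if (sz == 0)
			break;   /* unreachable when the asserts hold */
		start += sz;
		ops++;
	}
	return ops;
}
```

For a fully aligned 4GB range this gives 1048576 operations at 4KB, 2048
at 2MB and 4 at 1GB.  Real guest memory is fragmented by holes and
unaligned region boundaries, so actual counts would fall somewhere in
between.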

Some things that come to mind:

 - Annotations for pages that must be pre-validated in the x86 boot
   sequence, including IST stacks

 - Proliferation of these annotations throughout any kernel code that
   can access memory for #VC or #VE

 - Handling lazy validation of guest memory through the core mm layer,
   most likely involving a bit in struct page flags to track their
   status

 - Any need for validating memory that is not backed by struct page
   that needs to be special-cased

 - Any concerns about this for the DMA layer

One possibility for minimal disruption to the boot entry code is to
require the guest BIOS to validate 4GB and below, and then leave 4GB and
above to be done lazily (the true amount of memory will actually be less
due to the MMIO hole).

Thanks!

 [1] https://www.amd.com/system/files/TechDocs/56860.pdf
 [2] https://software.intel.com/content/dam/develop/external/us/en/documents/intel-tdx-module-1eas.pdf
 [3] https://github.com/microsoft/mu_basecore/blob/aa16ee6518b521a7d8101c34d2884ae09ef78bce/Unaccepted%20Memory%20UEFI%20ECR.md