From: Brijesh Singh
To: "Kirill A. Shutemov", David Rientjes
Cc: brijesh.singh@amd.com, Borislav Petkov, Andy Lutomirski,
    Sean Christopherson, Andrew Morton, Andi Kleen, Tom Lendacky,
    Jon Grimm, Thomas Gleixner, Christoph Hellwig, Peter Zijlstra,
    Paolo Bonzini, Ingo Molnar, Joerg Roedel, x86@kernel.org,
    linux-mm@kvack.org
Subject: Re: AMD SEV-SNP/Intel TDX: validation of memory pages
Date: Tue, 2 Feb 2021 18:16:41 -0600
Message-ID: <961a2736-9bc9-43e1-1e75-6d373fe9590b@amd.com>
In-Reply-To: <20210202160205.3wfchtibq2sd7pe5@black.fi.intel.com>
References: <7515a81a-19e-b063-2081-3f5e79f0f7a8@google.com>
    <20210202160205.3wfchtibq2sd7pe5@black.fi.intel.com>

On 2/2/21 10:02 AM, Kirill A. Shutemov wrote:
> On Mon, Feb 01, 2021 at 05:51:09PM -0800, David Rientjes wrote:
>> Hi everybody,
>>
>> I'd like to kick-start the discussion on lazy validation of guest memory
>> for the purposes of AMD SEV-SNP and Intel TDX.
>>
>> Both AMD SEV-SNP and Intel TDX require validation of guest memory before
>> it may be used by the guest.  This is needed for integrity protection from
>> a potentially malicious hypervisor or other host components.
>>
>> For AMD SEV-SNP, the hypervisor assigns a page to the guest using the new
>> RMPUPDATE instruction.  The guest then transitions the page to a usable
>> state with the new PVALIDATE instruction[1].  This sets the Validated flag
>> in the Reverse Map Table (RMP) for a guest addressable page, which opts
>> into hardware and firmware integrity protection.  This may only be done by
>> the guest itself and until that time, the guest cannot access the page.
>>
>> The guest can only PVALIDATE memory for a gPA once; the RMP then
>> guarantees for each hPA that there is only a single gPA mapping.  This
>> validation can either be done all up front at the time the guest is booted
>> or it can be done lazily at runtime on fault if the guest keeps track of
>> Valid vs Invalid pages.  Because doing PVALIDATE for all guest memory at
>> boot would be extremely lengthy, I'd like to discuss the options for doing
>> it lazily.
>>
>> Similarly, for Intel TDX, the hypervisor unmaps the gPA from the shared
>> EPT and invalidates the TLB and all caches for the TD's vcpus; it then
>> adds a page to the gPA address space for a TD by using the new
>> TDH.MEM.PAGE.AUG call.  The TDG.MEM.PAGE.ACCEPT TDCALL[2] then allows a
>> guest to accept a guest page for a gPA and initialize it using the private
>> key for that TD.
>> This may only be done by the TD itself and until that
>> time, the gPA cannot be used within the TD.
>>
>> Both AMD SEV-SNP and Intel TDX support hugepages.  SEV-SNP supports 2MB
>> whereas TDX has accept TDCALL support for 2MB and 1GB.
>>
>> I believe the UEFI ECR[3] for the unaccepted memory type to
>> EFI_MEMORY_TYPE was accepted in December.  This should enable the guest to
>> learn what memory has not yet been validated (or accepted) by the firmware
>> if all guest memory is not done completely up front.
>>
>> This likely requires a pre-validation of all memory that can be accessed
>> when handling a #VC (or #VE for TDX) such as IST stacks, including memory
>> in the x86 boot sequence that must be validated before the core mm
>> subsystem is up and running to handle the lazy validation.  I believe
>> lazy validation can be done by the core mm after that, perhaps by
>> maintaining a new "validated" bit in struct page flags.
>>
>> Has anybody looked into this or, even better, is anybody currently working
>> on this?
> It's likely I'm going to do this on Intel side, but I have not looked
> deeply into it.
>
>> I think quite invasive changes are needed for the guest to support lazy
>> validation/acceptance to core areas that lots of people on the recipient
>> list have strong opinions about.
>> Some things that come to mind:
>>
>>  - Annotations for pages that must be pre-validated in the x86 boot
>>    sequence, including IST stacks
>>
>>  - Proliferation of these annotations throughout any kernel code that can
>>    access memory for #VC or #VE
>>
>>  - Handling lazy validation of guest memory through the core mm layer,
>>    most likely involving a bit in struct page flags to track their status
>>
>>  - Any need for validating memory that is not backed by struct page that
>>    needs to be special-cased
>>
>>  - Any concerns about this for the DMA layer
>>
>> One possibility for minimal disruption to the boot entry code is to
>> require the guest BIOS to validate 4GB and below, and then leave 4GB and
>> above to be done lazily (the true amount of memory will actually be less
>> due to the MMIO hole).
> [ As I haven't looked into the actual code, I may say total garbage below... ]
>
> Pre-validating 4GB would indeed be the easiest way to go, but it's going
> to be too slow.
>
> The more realistic option is for the BIOS to pre-validate the memory where
> the kernel and initrd are placed, plus a few dozen megs for runtime. It
> means the decompression code would need to be aware of the validation.

I was thinking that having the BIOS validate the lower 4GB will simplify
the changes to the kernel entry code path as well as provide a clean
approach to supporting kexec. My initial thought is:

- BIOS or VMM validates the lower 4GB of memory.
- BIOS marks memory above 4GB as unaccepted in the e820/EFI memmap.
- Kernel early boot can be achieved with minimal (or no) changes.
- If an unaccepted type is discovered, allocate a bitmap that can be
  used to keep track of information (e.g. which pages are validated).
  We can also explore whether removing the unaccepted flag from the
  memmap range will work.
- On #VC/#VE, look at the bitmap to see if we need to validate the
  pages. To speed things up, we can validate more than one page per
  #VC/#VE.
- If we get kexec'd, rebuild the e820/memmap based on the bitmap so
  that we don't double validate.

>
> The critical thing is that once memory is validated we must not validate
> it again. It's a possible VMM->guest attack vector. We must track
> precisely what memory has been validated and stop the guest on detecting
> an unexpected second validation request.
>
> It also means that we have to keep the information when control gets
> passed from the decompression code to the real kernel. A page flag is no
> good for this.
>
> My initial thought is that we can use the e820/EFI memmap to keep track of
> the information -- remove the unaccepted memory flag from the range that
> got accepted.
>
> The decompression code validates the memory that it needs for
> decompression, modifies the memmap accordingly and passes control to the
> main kernel.
>
> The main kernel may accept the memory via #VE/#VC, but ideally it needs to
> stay within the memory accepted by the decompression code for initial boot.
>
> I think the bulk of memory validation can be done via existing machinery:
> we already have deferred struct page initialization code in the kernel and
> I believe we can hook into it for the purpose.
>
> Any comments?
>
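
For illustration only, the bitmap tracking proposed in the reply above
could be sketched in userspace C roughly as follows. This is not kernel
code and not an existing API: struct valid_map, validate_on_fault(),
next_unvalidated_run() and hw_validate_page() are all hypothetical names,
and hw_validate_page() merely stands in for PVALIDATE or
TDG.MEM.PAGE.ACCEPT. The sketch assumes one bit per 4K page, batches
several pages per fault, refuses a second request for an already
validated page, and can enumerate the still-unvalidated runs that a
kexec path might write back into the memmap.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical tracker: one bit per 4K page, set once it is validated. */
struct valid_map {
	uint64_t base_pfn;	/* first page frame covered by the map */
	uint64_t nr_pages;	/* number of page frames covered */
	unsigned long *bits;
};

static int valid_map_init(struct valid_map *m, uint64_t base_pfn,
			  uint64_t nr_pages)
{
	size_t words = (nr_pages + BITS_PER_WORD - 1) / BITS_PER_WORD;

	m->base_pfn = base_pfn;
	m->nr_pages = nr_pages;
	m->bits = calloc(words ? words : 1, sizeof(unsigned long));
	return m->bits ? 0 : -1;
}

static bool valid_map_test(const struct valid_map *m, uint64_t pfn)
{
	uint64_t i = pfn - m->base_pfn;

	assert(pfn >= m->base_pfn && i < m->nr_pages);
	return m->bits[i / BITS_PER_WORD] & (1UL << (i % BITS_PER_WORD));
}

/* Stand-in for PVALIDATE / TDG.MEM.PAGE.ACCEPT; only counts calls. */
static unsigned long hw_validate_calls;

static void hw_validate_page(uint64_t pfn)
{
	(void)pfn;
	hw_validate_calls++;
}

/*
 * Fault-path helper: validate up to 'batch' pages starting at the
 * faulting pfn, skipping pages the bitmap already marks as validated.
 * Returns the number of newly validated pages, or -1 if the faulting
 * page is outside the map or was already validated (an unexpected
 * second request, on which the caller would stop the guest).
 */
static long validate_on_fault(struct valid_map *m, uint64_t pfn,
			      uint64_t batch)
{
	uint64_t end = m->base_pfn + m->nr_pages;
	long done = 0;

	if (pfn < m->base_pfn || pfn >= end || valid_map_test(m, pfn))
		return -1;

	for (; batch && pfn < end; pfn++, batch--) {
		uint64_t i = pfn - m->base_pfn;

		if (valid_map_test(m, pfn))
			continue;
		hw_validate_page(pfn);
		m->bits[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
		done++;
	}
	return done;
}

/*
 * Find the next run of unvalidated pages at or after 'from'. Calling
 * this in a loop yields the ranges that would still be reported as
 * unaccepted when rebuilding the memmap for kexec.
 */
static bool next_unvalidated_run(const struct valid_map *m, uint64_t from,
				 uint64_t *start, uint64_t *len)
{
	uint64_t end = m->base_pfn + m->nr_pages;
	uint64_t pfn = from < m->base_pfn ? m->base_pfn : from;

	while (pfn < end && valid_map_test(m, pfn))
		pfn++;
	if (pfn >= end)
		return false;
	*start = pfn;
	while (pfn < end && !valid_map_test(m, pfn))
		pfn++;
	*len = pfn - *start;
	return true;
}
```

A caller would initialise one map per unaccepted memmap range (for
example starting at pfn 0x100000, the 4GB boundary), call
validate_on_fault() from the #VC/#VE handler, and walk
next_unvalidated_run() before kexec. Whether a bitmap, the memmap
itself, or a struct page flag is the right place for this state is
exactly the open question in this thread; the sketch only shows that the
bitmap variant keeps the "never validate twice" rule easy to enforce.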