From: Sean Christopherson
To: Steven Price
Cc: Michael Roth, kvm@vger.kernel.org, Suzuki K Poulose, tabba@google.com,
	linux-coco@lists.linux.dev, linux-mm@kvack.org,
	linux-crypto@vger.kernel.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	pbonzini@redhat.com, isaku.yamahata@intel.com, ackerleytng@google.com,
	vbabka@suse.cz, ashish.kalra@amd.com, nikunj.dadhania@amd.com,
	jroedel@suse.de, pankaj.gupta@amd.com
Subject: Re: [PATCH RFC gmem v1 4/8] KVM: x86: Add gmem hook for invalidating memory
Date: Fri, 9 Feb 2024 06:28:04 -0800
References: <20231016115028.996656-1-michael.roth@amd.com>
	<20231016115028.996656-5-michael.roth@amd.com>
On Fri, Feb 09, 2024, Steven Price wrote:
> On 16/10/2023 12:50, Michael Roth wrote:
> > In some cases, like with SEV-SNP, guest memory needs to be updated in a
> > platform-specific manner before it can be safely freed back to the host.
> > Wire up arch-defined hooks to the .free_folio kvm_gmem_aops callback to
> > allow for special handling of this sort when freeing memory in response
> > to FALLOC_FL_PUNCH_HOLE operations and when releasing the inode, and go
> > ahead and define an arch-specific hook for x86 since it will be needed
> > for handling memory used for SEV-SNP guests.
> 
> Hi all,
> 
> Arm CCA has a similar need to prepare/unprepare memory (granule
> delegate/undelegate in our terminology) before it is used as protected
> memory.
> 
> However, I see a problem with the current gmem implementation: the
> "invalidations" are not precise enough for our RMI API. When punching a
> hole in the memfd, the code currently hits the same path (ending in
> kvm_unmap_gfn_range()) as when a VMA is modified in the same range (for
> the shared version).
> 
> The Arm CCA architecture doesn't allow protected memory to be removed
> and refaulted without the permission of the guest (the memory contents
> would be wiped in this case).

TDX behaves almost exactly like CCA.  Well, that's not strictly true, as
there are TDX APIs that allow for *temporarily* marking mappings !PRESENT,
but those aren't in play for invalidation events like this.

SNP does allow zapping page table mappings, but fully removing a page, as
PUNCH_HOLE would do, is destructive, so SNP behaves the same way for all
intents and purposes.

> One option that I've considered is to implement a separate CCA ioctl to
> notify KVM whether the memory should be mapped protected.

That's what KVM_SET_MEMORY_ATTRIBUTES+KVM_MEMORY_ATTRIBUTE_PRIVATE is for,
no?

> The invalidations would then be ignored on ranges that are currently
> protected for this guest.

That's backwards.  Invalidations on a guest_memfd should affect only
*protected* mappings.  And for that, the plan/proposal is to plumb
only_{shared,private} flags into "struct kvm_gfn_range"[1] so that
guest_memfd invalidations don't zap shared mappings, and mmu_notifier
invalidations don't zap private mappings.  See the sample usage in the
TDX context[2] (disclaimer, I'm pretty sure I didn't write most of that
patch; I only provided a rough sketch).

[1] https://lore.kernel.org/all/20231027182217.3615211-13-seanjc@google.com
[2] https://lore.kernel.org/all/0b308fb6dd52bafe7153086c7f54bfad03da74b1.1705965635.git.isaku.yamahata@intel.com
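For illustration, [1] boils down to something like the below (a sketch, not
the exact posted patch; field names and placement may differ from whatever
eventually lands):

	/*
	 * Sketch of the proposed flags from [1].  Exactly one of the
	 * only_* flags would be set for a given invalidation; neither set
	 * means "zap everything" (legacy behavior).
	 */
	struct kvm_gfn_range {
		struct kvm_memory_slot *slot;
		gfn_t start;
		gfn_t end;
		union kvm_mmu_notifier_arg arg;
		bool only_private;	/* zap only private (guest_memfd) mappings */
		bool only_shared;	/* zap only shared (host userspace) mappings */
		bool may_block;
	};

guest_memfd invalidations would set only_private, mmu_notifier
invalidations would set only_shared, and the arch zapping code keys off
whichever flag is set.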
> This 'solves' the problem nicely except for the case where the VMM
> deliberately punches holes in memory which the guest is using.

I don't see what problem there is to solve in this case.  PUNCH_HOLE is
destructive, so don't do that.

> The issue in this case is that there's no way of failing the punch hole
> operation - we can detect that the memory is in use and shouldn't be
> freed, but this callback doesn't give us the opportunity to actually
> block the freeing of the memory.

Why is this KVM's problem?  E.g. the exact same thing happens without
guest_memfd if userspace munmap()s memory the guest is using.

> Sadly there's no easy way to map from a physical page in a gmem back to
> which VM (and where in the VM) the page is mapped.  So actually ripping
> the page out of the appropriate VM isn't really possible in this case.

I don't follow.  guest_memfd has a 1:1 binding with a VM *and* a gfn, how
can you not know what exactly needs to be invalidated?

> How is this situation handled on x86?  Is it possible to invalidate and
> then refault a protected page without affecting the memory contents?  My
> guess is yes, and that this is a CCA-specific problem - is my
> understanding correct?
> 
> My current thoughts for CCA are one of three options:
> 
> 1. Represent shared and protected memory as two separate memslots.  This
> matches the underlying architecture more closely (the top address bit is
> repurposed as a 'shared' flag), but I don't like it because it's a
> deviation from other CoCo architectures (notably pKVM).
> 
> 2. Allow punch-hole to fail on CCA if the memory is mapped into the
> guest's protected space.  Again, this is CCA being different, and it also
> creates nasty corner cases where the gmem descriptor could have to
> outlive the VMM - so it looks like a potential source of memory leaks.
> 
> 3. 'Fix' the invalidation to provide more precise semantics.  I haven't
> yet prototyped it, but it might be possible to simply provide a flag from
> kvm_gmem_invalidate_begin() specifying that the invalidation is for the
> protected memory.  KVM would then only unmap the protected memory when
> this flag is set (avoiding issues with VMA updates causing spurious
> unmaps).
> 
> Fairly obviously (3) is my preferred option, but it relies on the
> guarantee that the "invalidation" is actually a precise set of addresses
> where the memory is actually being freed.

#3 is what we are planning for x86, and except for the only_{shared,private}
flags, the requisite functionality should already be in Linus' tree, though
it does need to be wired up for ARM.
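To make that concrete, here's a rough sketch of the guest_memfd side, i.e.
the upstream kvm_gmem_invalidate_begin() with the proposed only_private
flag bolted on (illustrative only; the flag itself isn't in Linus' tree):

	static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem,
					      pgoff_t start, pgoff_t end)
	{
		bool flush = false, found_memslot = false;
		struct kvm_memory_slot *slot;
		struct kvm *kvm = gmem->kvm;
		unsigned long index;

		xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
			pgoff_t pgoff = slot->gmem.pgoff;

			struct kvm_gfn_range gfn_range = {
				.start = slot->base_gfn + max(pgoff, start) - pgoff,
				.end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
				.slot = slot,
				.may_block = true,
				/*
				 * Hypothetical: guest_memfd backs only private
				 * memory, so tell arch code not to zap shared
				 * (host userspace) mappings in this range.
				 */
				.only_private = true,
			};

			if (!found_memslot) {
				found_memslot = true;

				KVM_MMU_LOCK(kvm);
				kvm_mmu_invalidate_begin(kvm);
			}

			flush |= kvm_mmu_unmap_gfn_range(kvm, &gfn_range);
		}

		if (flush)
			kvm_flush_remote_tlbs(kvm);

		if (found_memslot)
			KVM_MMU_UNLOCK(kvm);
	}

With that in place, kvm_unmap_gfn_range() (and the ARM equivalent) can skip
shared mappings when only_private is set, which is exactly the precision
needed for option (3) above.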