Date: Mon, 30 Jun 2025 07:44:20 -0700
References: <20250611133330.1514028-1-tabba@google.com> <20250611133330.1514028-11-tabba@google.com>
Subject: Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
From: Ackerley Tng
To: Fuad Tabba
Cc: Sean Christopherson, kvm@vger.kernel.org, linux-arm-msm@vger.kernel.org,
	linux-mm@kvack.org, kvmarm@lists.linux.dev, pbonzini@redhat.com,
	chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
	paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu,
	viro@zeniv.linux.org.uk, brauner@kernel.org, willy@infradead.org,
	akpm@linux-foundation.org, xiaoyao.li@intel.com, yilun.xu@intel.com,
	chao.p.peng@linux.intel.com, jarkko@kernel.org, amoorthy@google.com,
	dmatlack@google.com, isaku.yamahata@intel.com, mic@digikod.net,
	vbabka@suse.cz, vannapurve@google.com, mail@maciej.szmigiero.name,
	david@redhat.com, michael.roth@amd.com, wei.w.wang@intel.com,
	liam.merwick@oracle.com, isaku.yamahata@gmail.com,
	kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
	steven.price@arm.com, quic_eberman@quicinc.com,
	quic_mnalajal@quicinc.com, quic_tsoni@quicinc.com,
	quic_svaddagi@quicinc.com, quic_cvanscha@quicinc.com,
	quic_pderrin@quicinc.com, quic_pheragu@quicinc.com,
	catalin.marinas@arm.com, james.morse@arm.com, yuzenghui@huawei.com,
	oliver.upton@linux.dev, maz@kernel.org, will@kernel.org,
	qperret@google.com, keirf@google.com, roypat@amazon.co.uk,
	shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
	rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
	hughd@google.com, jthoughton@google.com, peterx@redhat.com,
	pankaj.gupta@amd.com, ira.weiny@intel.com

Fuad Tabba writes:

> Hi Ackerley,
>
> On Fri, 27 Jun 2025 at 16:01,
> Ackerley Tng wrote:
>>
>> Ackerley Tng writes:
>>
>> > [...]
>>
>> >>> +/*
>> >>> + * Returns true if the given gfn's private/shared status (in the CoCo sense) is
>> >>> + * private.
>> >>> + *
>> >>> + * A return value of false indicates that the gfn is explicitly or implicitly
>> >>> + * shared (i.e., non-CoCo VMs).
>> >>> + */
>> >>>  static inline bool kvm_mem_is_private(struct kvm *kvm, gfn_t gfn)
>> >>>  {
>> >>> -	return IS_ENABLED(CONFIG_KVM_GMEM) &&
>> >>> -	       kvm_get_memory_attributes(kvm, gfn) & KVM_MEMORY_ATTRIBUTE_PRIVATE;
>> >>> +	struct kvm_memory_slot *slot;
>> >>> +
>> >>> +	if (!IS_ENABLED(CONFIG_KVM_GMEM))
>> >>> +		return false;
>> >>> +
>> >>> +	slot = gfn_to_memslot(kvm, gfn);
>> >>> +	if (kvm_slot_has_gmem(slot) && kvm_gmem_memslot_supports_shared(slot)) {
>> >>> +		/*
>> >>> +		 * Without in-place conversion support, if a guest_memfd memslot
>> >>> +		 * supports shared memory, then all the slot's memory is
>> >>> +		 * considered not private, i.e., implicitly shared.
>> >>> +		 */
>> >>> +		return false;
>> >>
>> >> Why!?!? Just make sure KVM_MEMORY_ATTRIBUTE_PRIVATE is mutually exclusive with
>> >> mappable guest_memfd. You need to do that no matter what.
>> >
>> > Thanks, I agree that setting KVM_MEMORY_ATTRIBUTE_PRIVATE should be
>> > disallowed for gfn ranges whose slot is guest_memfd-only. Missed that
>> > out. Where do people think we should check the mutual exclusivity?
>> >
>> > In kvm_supported_mem_attributes() I'm thinking that we should still allow
>> > the use of KVM_MEMORY_ATTRIBUTE_PRIVATE for other non-guest_memfd-only
>> > gfn ranges. Or do people think we should just disallow
>> > KVM_MEMORY_ATTRIBUTE_PRIVATE for the entire VM as long as one memslot is
>> > a guest_memfd-only memslot?
>> >
>> > If we check mutual exclusivity when handling
>> > kvm_vm_set_memory_attributes(), as long as part of the range where
>> > KVM_MEMORY_ATTRIBUTE_PRIVATE is requested to be set intersects a range
>> > whose slot is guest_memfd-only, the ioctl will return EINVAL.
>> >
>>
>> At yesterday's (2025-06-26) guest_memfd upstream call discussion,
>>
>> * Fuad brought up a possible use case where within the *same* VM, we
>>   want to allow both memslots that support and do not support mmap in
>>   guest_memfd.
>> * Shivank suggested a concrete use case for this: the user wants a
>>   guest_memfd memslot that supports mmap just so userspace addresses can
>>   be used as references for specifying memory policy.
>> * Sean then added on that allowing both types of guest_memfd memslots
>>   (supporting and not supporting mmap) will allow the user to have a
>>   second layer of protection and ensure that for some memslots, the user
>>   expects never to be able to mmap from the memslot.
>>
>> I agree it will be useful to allow both guest_memfd memslots that
>> support and do not support mmap in a single VM.
>>
>> I think I found an issue with flags, which is that GUEST_MEMFD_FLAG_MMAP
>> should not imply that the guest_memfd will provide memory for all guest
>> faults within the memslot's gfn range (KVM_MEMSLOT_GMEM_ONLY).
>>
>> For the use case Shivank raised, if the user wants a guest_memfd memslot
>> that supports mmap just so userspace addresses can be used as references
>> for specifying memory policy for legacy CoCo VMs where shared memory
>> should still come from other sources, GUEST_MEMFD_FLAG_MMAP will be set,
>> but KVM can't fault shared memory from guest_memfd. Hence,
>> GUEST_MEMFD_FLAG_MMAP should not imply KVM_MEMSLOT_GMEM_ONLY.
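
(Aside on the mutual-exclusivity check quoted above: the rule "if any
part of the requested range intersects a guest_memfd-only slot, return
EINVAL" could be modelled roughly as follows. This is an untested
userspace sketch, not the actual kvm_vm_set_memory_attributes() code;
struct model_slot and the model_* helpers are names made up for
illustration, and the range is assumed to be non-empty, [start, end).)

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a memslot's gfn range; not struct kvm_memory_slot. */
struct model_slot {
	uint64_t base_gfn;
	uint64_t npages;
	bool gmem_only;		/* models a guest_memfd-only memslot */
};

/* True if the half-open gfn range [start, end) overlaps the slot. */
static bool model_range_hits_slot(const struct model_slot *s,
				  uint64_t start, uint64_t end)
{
	return start < s->base_gfn + s->npages && s->base_gfn < end;
}

/*
 * Models the check done when the PRIVATE attribute is requested for
 * [start, end): -EINVAL if any part of the range intersects a
 * guest_memfd-only slot, 0 otherwise.
 */
static int model_check_set_private(const struct model_slot *slots,
				   size_t nr_slots,
				   uint64_t start, uint64_t end)
{
	assert(slots != NULL || nr_slots == 0);

	for (size_t i = 0; i < nr_slots; i++) {
		if (slots[i].gmem_only &&
		    model_range_hits_slot(&slots[i], start, end))
			return -EINVAL;
	}
	return 0;
}
```

In this model a request that only partially overlaps a guest_memfd-only
slot is rejected as a whole, which matches the "as long as part of the
range ... intersects" wording; ranges that touch only other slots are
still accepted.
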
>>
>> Thinking forward, if we want guest_memfd to provide (no-mmap) protection
>> even for non-CoCo VMs (such that perhaps initial VM image is populated
>> and then VM memory should never be mmap-ed at all), we will want
>> guest_memfd to be the source of memory even if GUEST_MEMFD_FLAG_MMAP is
>> not set.
>>
>> I propose that we should have a single VM-level flag to solve this (in
>> line with Sean's guideline that we should just move towards what we want
>> and not support non-existent use cases): something like
>> KVM_CAP_PREFER_GMEM.
>>
>> If KVM_CAP_PREFER_GMEM_MEMORY is set,
>>
>> * memory for any gfn range in a guest_memfd memslot will be requested
>>   from guest_memfd
>> * any privacy status queries will also be directed to guest_memfd
>> * KVM_MEMORY_ATTRIBUTE_PRIVATE will not be a valid attribute
>>
>> KVM_CAP_PREFER_GMEM_MEMORY will be orthogonal with no validation on
>> GUEST_MEMFD_FLAG_MMAP, which should just purely guard mmap support in
>> guest_memfd.
>>
>> Here's a table that I set up [1]. I believe the proposed
>> KVM_CAP_PREFER_GMEM_MEMORY (column 7) lines up with requirements
>> (columns 1 to 4) correctly.
>>
>> [1] https://lpc.events/event/18/contributions/1764/attachments/1409/3710/guest_memfd%20use%20cases%20vs%20guest_memfd%20flags%20and%20privacy%20tracking.pdf
>
> I'm not sure this naming helps. What does "prefer" imply here? If the
> caller from user space does not prefer, does it mean that they
> mind/oppose?
>

Sorry, bad naming.

I used "prefer" because some memslots may not have guest_memfd at
all. To clarify, a "guest_memfd memslot" is a memslot that has some
valid guest_memfd fd and offset. The memslot may also have a valid
userspace_addr configured, either mmap-ed from the same guest_memfd fd
or from some other backing memory (for legacy CoCo VMs), or NULL for
userspace_addr.
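
As a rough illustration of that definition, here is an untested
userspace model. struct model_memslot, model_is_gmem_memslot() and
model_shared_fault_source() are hypothetical names, not KVM code, and
the vm_cap_gmem argument merely stands in for the proposed (not yet
existing) VM-level capability:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model of a memslot; not struct kvm_memory_slot. */
struct model_memslot {
	int gmem_fd;			/* guest_memfd fd, or -1 if none */
	uint64_t gmem_offset;		/* offset into the guest_memfd */
	uint64_t userspace_addr;	/* 0 models a NULL userspace_addr */
};

/* A "guest_memfd memslot" has a valid guest_memfd fd (and offset). */
static bool model_is_gmem_memslot(const struct model_memslot *slot)
{
	assert(slot != NULL);
	return slot->gmem_fd >= 0;
}

enum model_fault_source {
	MODEL_FAULT_FROM_USERSPACE_ADDR,
	MODEL_FAULT_FROM_GMEM,
};

/*
 * Where shared memory for a guest fault would come from in this model.
 * With the VM-level cap set, every guest_memfd memslot is served from
 * guest_memfd, whatever userspace_addr says. Without it, shared memory
 * still comes from userspace_addr (the legacy CoCo arrangement).
 */
static enum model_fault_source
model_shared_fault_source(bool vm_cap_gmem, const struct model_memslot *slot)
{
	if (vm_cap_gmem && model_is_gmem_memslot(slot))
		return MODEL_FAULT_FROM_GMEM;
	return MODEL_FAULT_FROM_USERSPACE_ADDR;
}
```

In this model a slot without any guest_memfd is unaffected by the cap,
which is the only reason "prefer" ended up in the name.
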

I meant to have the CAP enable KVM_MEMSLOT_GMEM_ONLY of this patch
series for all memslots that have some valid guest_memfd fd and offset,
except that, if we have a VM-level CAP, KVM_MEMSLOT_GMEM_ONLY should be
moved to the VM level.

> Regarding the use case Shivank mentioned, mmaping for policy, while
> the use case is a valid one, the raison d'être of mmap is to map into
> user space (i.e., fault it in). I would argue that if you opt into
> mmap, you are doing it to be able to access it.

The above is in conflict with what was discussed on 2025-06-26 IIUC.

Shivank brought up the case of enabling mmap *only* to be able to set
mempolicy using the VMAs, and Sean (IIUC) later agreed we should allow
userspace to only enable mmap but still disable faults, so that
userspace is given additional protection: even if a (compromised)
userspace does a private-to-shared conversion, userspace is still not
allowed to fault in the page.

Hence, if we want to support mmap-ing just for policy and continue to
restrict faulting, then GUEST_MEMFD_FLAG_MMAP should not imply
KVM_MEMSLOT_GMEM_ONLY.

> To me, that seems like
> something that merits its own flag, rather than mmap. Also, I recall
> that we said that later on, with inplace conversion, that won't be
> even necessary.

On x86, as of now I believe we're going with an ioctl that does *not*
check what the guest prefers and will go ahead and perform the
private-to-shared conversion, which in turn updates shareability.

> In other words, this would also be trying to solve a
> problem that we haven't yet encountered and that we have a solution
> for anyway.
>

So we don't have a solution for the use case where userspace wants to
mmap but never fault, as protection for userspace from stray
private-to-shared conversions, unless we decouple GUEST_MEMFD_FLAG_MMAP
and KVM_MEMSLOT_GMEM_ONLY.

> I think that, unless anyone disagrees, is to go ahead with the names
> we discussed in the last meeting. They seem to be the ones that make
> the most sense for the upcoming use cases.
>

We could also discuss whether we really want to support the use case
where userspace wants to mmap but never fault, as protection for
userspace from stray private-to-shared conversions.

> Cheers,
> /fuad
>
>
>
>> > [...]
>>
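
To make the decoupling argument concrete, here is a small untested model
of the four flag combinations discussed in this thread. The two booleans
stand in for GUEST_MEMFD_FLAG_MMAP and KVM_MEMSLOT_GMEM_ONLY; the enum
and the model_* functions are hypothetical names used only for
illustration, and the mapping to use cases is my reading of the
discussion, not settled semantics:

```c
#include <assert.h>
#include <stdbool.h>

/* Use cases named in this thread, one per flag combination. */
enum model_use_case {
	MODEL_LEGACY_COCO,	/* no mmap; shared memory from userspace_addr */
	MODEL_MMAP_POLICY_ONLY,	/* mmap only to get VMAs for memory policy */
	MODEL_GMEM_NO_MMAP,	/* guest_memfd backs everything, never mmap-ed */
	MODEL_GMEM_MMAP,	/* guest_memfd backs everything, mmap allowed */
};

/*
 * flag_mmap models GUEST_MEMFD_FLAG_MMAP (mmap of the fd is allowed).
 * gmem_only models KVM_MEMSLOT_GMEM_ONLY (guest_memfd provides memory
 * for all guest faults in the slot's gfn range).
 */
static enum model_use_case model_classify(bool flag_mmap, bool gmem_only)
{
	if (gmem_only)
		return flag_mmap ? MODEL_GMEM_MMAP : MODEL_GMEM_NO_MMAP;
	return flag_mmap ? MODEL_MMAP_POLICY_ONLY : MODEL_LEGACY_COCO;
}

/* Coupled variant: the mmap flag implies gmem-only, and vice versa. */
static enum model_use_case model_classify_coupled(bool flag_mmap)
{
	assert(flag_mmap == true || flag_mmap == false);
	return model_classify(flag_mmap, flag_mmap);
}
```

With the coupled variant only MODEL_LEGACY_COCO and MODEL_GMEM_MMAP are
reachable; the policy-only and no-mmap cases need the two bits to be
independent.
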