Date: Tue, 08 Jul 2025 06:44:14 -0700
References: <434ab5a3-fedb-4c9e-8034-8f616b7e5e52@amd.com>
 <923b1c02-407a-4689-a047-dd94e885b103@redhat.com>
 <0cdc7890-aade-4fa5-ad72-24cde6c7bce9@redhat.com>
Subject: Re: [PATCH v12 10/18] KVM: x86/mmu: Handle guest page faults for guest_memfd with shared memory
From: Ackerley Tng
To: Sean Christopherson, David Hildenbrand
Cc: Shivank Garg, Fuad Tabba, kvm@vger.kernel.org,
 linux-arm-msm@vger.kernel.org, linux-mm@kvack.org, kvmarm@lists.linux.dev,
 pbonzini@redhat.com, chenhuacai@kernel.org, mpe@ellerman.id.au,
 anup@brainfault.org, paul.walmsley@sifive.com, palmer@dabbelt.com,
 aou@eecs.berkeley.edu, viro@zeniv.linux.org.uk, brauner@kernel.org,
 willy@infradead.org, akpm@linux-foundation.org, xiaoyao.li@intel.com,
 yilun.xu@intel.com, chao.p.peng@linux.intel.com, jarkko@kernel.org,
 amoorthy@google.com, dmatlack@google.com, isaku.yamahata@intel.com,
 mic@digikod.net, vbabka@suse.cz, vannapurve@google.com,
 mail@maciej.szmigiero.name, michael.roth@amd.com, wei.w.wang@intel.com,
 liam.merwick@oracle.com, isaku.yamahata@gmail.com,
 kirill.shutemov@linux.intel.com, suzuki.poulose@arm.com,
 steven.price@arm.com, quic_eberman@quicinc.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
 quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
 will@kernel.org, qperret@google.com, keirf@google.com,
 roypat@amazon.co.uk, shuah@kernel.org, hch@infradead.org, jgg@nvidia.com,
 rientjes@google.com, jhubbard@nvidia.com, fvdl@google.com,
 hughd@google.com, jthoughton@google.com, peterx@redhat.com,
 pankaj.gupta@amd.com, ira.weiny@intel.com

Sean Christopherson writes:

> On Tue, Jul 01, 2025, David Hildenbrand wrote:
>> > > > I support this approach.
>> > >
>> > > Agreed.
>> > > Let's get this in with the changes requested by Sean applied.
>> > >
>> > > How to use GUEST_MEMFD_FLAG_MMAP in combination with a CoCo VM with
>> > > legacy mem attributes (-> all memory in guest_memfd private) could be
>> > > added later on top, once really required.
>> > >
>> > > As discussed, CoCo VMs that want to support GUEST_MEMFD_FLAG_MMAP will
>> > > have to disable legacy mem attributes using a new capability in stage-2.
>> > >
>> >
>> > I rewatched the guest_memfd meeting on 2025-06-12. We do want to
>> > support the use case where userspace wants to have mmap (e.g. to set
>> > mempolicy) but does not want to allow faulting into the host.
>> >
>> > On 2025-06-12, the conclusion was that the problem will be solved once
>> > guest_memfd supports shareability, and that's because userspace can set
>> > shareability to GUEST, so the memory can't be faulted into the host.
>> >
>> > On 2025-06-26, Sean said we want to let userspace have an extra layer of
>> > protection so that memory cannot be faulted in to the host, ever. IOW,
>> > we want to let userspace say that even if there is a stray
>> > private-to-shared conversion, *don't* allow faulting memory into the
>> > host.
>
> Eh, my comments were more along the lines of "it would be nice if we could have
> such protections", not a "we must support this". And I suspect that making the
> behavior all-or-nothing for a given guest_memfd wouldn't be very useful, i.e.
> that userspace would probably want to be able to prevent accessing a specific
> chunk of the gmem instance.
>
> Actually, we can probably get that via mseal(), maybe even for free today? E.g.
> mmap() w/ PROT_NONE, mbind(), and then mseal().
>
> So yeah, I think we do nothing for now.
>
>> > The difference is the "extra layer of protection", which should remain
>> > in effect even if there are (stray/unexpected) private-to-shared
>> > conversions to guest_memfd or to KVM.
>> > Here's a direct link to the point
>> > in the video where Sean brought this up [1]. I'm really hoping I didn't
>> > misinterpret this!
>> >
>> > Let me look ahead a little, since this involves use cases already
>> > brought up though I'm not sure how real they are. I just want to make
>> > sure that in a few patch series' time, we don't end up needing userspace
>> > to use a complex bunch of CAPs and FLAGs.
>> >
>> > In this series (mmap support, V12, patch 10/18) [2], to allow
>> > KVM_X86_DEFAULT_VMs to use guest_memfd, I added a `fault_from_gmem()`
>> > helper, which is defined as follows (before the renaming Sean requested):
>> >
>> > +static inline bool fault_from_gmem(struct kvm_page_fault *fault)
>> > +{
>> > +       return fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot);
>> > +}
>> >
>> > The above is changeable, of course :). The intention is that if the
>> > fault is private, fault from guest_memfd. If GUEST_MEMFD_FLAG_MMAP is
>> > set (KVM_MEMSLOT_GMEM_ONLY will be set on the memslot), fault from
>> > guest_memfd.
>> >
>> > If we defer handling GUEST_MEMFD_FLAG_MMAP in combination with a CoCo VM
>> > with legacy mem attributes to the future, this helper will probably
>> > become
>> >
>> > -static inline bool fault_from_gmem(struct kvm_page_fault *fault)
>> > +static inline bool fault_from_gmem(struct kvm *kvm, struct kvm_page_fault *fault)
>> > +{
>> > -       return fault->is_private || kvm_gmem_memslot_supports_shared(fault->slot);
>> > +       return fault->is_private || (kvm_gmem_memslot_supports_shared(fault->slot) &&
>> > +                                    !kvm_arch_disable_legacy_private_tracking(kvm));
>> > +}
>> >
>> > And on memslot binding we check
>> >
>> > if kvm_arch_disable_legacy_private_tracking(kvm)
>
> I would invert the KVM-internal arch hook, and only have KVM x86's capability refer
> to the private memory attribute as legacy (because it simply doesn't exist for
> anything else).
>
>> >   and not GUEST_MEMFD_FLAG_MMAP
>> >     return -EINVAL;
>> >
>> > 1. Is that what y'all meant?
>
> I was thinking:
>
>       if (kvm_arch_has_private_memory_attribute(kvm) ==
>           kvm_gmem_mmap(...))
>               return -EINVAL;
>
> I.e. in addition to requiring mmap() when KVM doesn't track private/shared via
> memory attributes, also disallow mmap() when private/shared is tracked via memory
> attributes.
>
>> My understanding:
>>
>> CoCo VMs will initially (stage-1) only support !GUEST_MEMFD_FLAG_MMAP.
>>
>> With stage-2, CoCo VMs will support GUEST_MEMFD_FLAG_MMAP only with
>> kvm_arch_disable_legacy_private_tracking().
>
> Yep, and everything except x86 will unconditionally return true for
> kvm_arch_disable_legacy_private_tracking() (or false if it's inverted as above).
>
>> Non-CoCo VMs will only support GUEST_MEMFD_FLAG_MMAP. (no concept of
>> private)
>>
>> >
>> > 2. Does this kind of not satisfy the "extra layer of protection"
>> >    requirement (if it is a requirement)?
>
> It's not a requirement.
>
>> >    A legacy CoCo VM using guest_memfd only for private memory (shared
>> >    memory from say, shmem) and needing to set mempolicy would
>> >    * Set GUEST_MEMFD_FLAG_MMAP
>
> I think we should keep it simple as above, and not support mmap() (and therefore
> mbind()) with legacy CoCo VMs. Given the double allocation flaws with the legacy
> approach, supporting mbind() seems like putting a bandaid on a doomed idea.
>
>> >    * Leave KVM_CAP_DISABLE_LEGACY_PRIVATE_TRACKING defaulted to false
>> >    but still be able to send conversion ioctls directly to guest_memfd,
>> >    and then be able to fault guest_memfd memory into the host.
>>
>> In that configuration, I would expect that all memory in guest_memfd is
>> private and remains private.
>>
>> guest_memfd without memory attributes cannot support in-place conversion.
>>
>> How to achieve that might be interesting: the capability will affect
>> guest_memfd behavior?
>>
>> >
>> > 3. Now for a use case I've heard of (feel free to tell me this will
>> >    never be supported or "we'll deal with it if it comes"): On a
>> >    non-CoCo VM, we want to use guest_memfd but not use mmap (and the
>> >    initial VM image will be written using write() syscall or something
>> >    else).
>> >
>> >    * Set GUEST_MEMFD_FLAG_MMAP to false
>> >    * Leave KVM_CAP_DISABLE_LEGACY_PRIVATE_TRACKING defaulted to false
>> >      (it's a non-CoCo VM, weird to do anything to do with private
>> >      tracking)
>> >
>> >    And now we're stuck because fault_from_gmem() will return false all
>> >    the time and we can't use memory from guest_memfd.
>
> Nah, don't support this scenario. Or rather, use mseal() as above. If someone
> comes along with a concrete, strong use case for backing non-CoCo VMs and using
> mseal() to wall off guest memory doesn't suffice, then they can have the honor
> of justifying why KVM needs to take on more complexity. :-)
>
>> I think I discussed that with Sean: we would have GUEST_MEMFD_FLAG_WRITE
>> that will imply everything that GUEST_MEMFD_FLAG_MMAP would imply, except
>> the actual mmap() support.
>
> Ya, for the write() access or whatever. But there are bigger problems beyond
> populating the memory, e.g. a non-CoCo VM won't support private memory, so without
> many more changes to redirect KVM to gmem when faulting in guest memory, KVM won't
> be able to map any memory into the guest.

Thanks for clarifying everything above :). Next respin (with Fuad's help) coming soon!
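
For reference, the memslot binding rule suggested above can be sketched as a tiny userspace model. This is only an illustration under stated assumptions, not kernel code: model_bind_check(), model_fault_from_gmem() and model_selftest() are hypothetical names, and the plain bool parameters stand in for kvm_arch_has_private_memory_attribute(kvm), kvm_gmem_mmap(...), fault->is_private and kvm_gmem_memslot_supports_shared(fault->slot).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace model only. The two parameters are hypothetical stand-ins for
 * kvm_arch_has_private_memory_attribute(kvm) and kvm_gmem_mmap(...).
 */
static inline int model_bind_check(bool has_private_mem_attr, bool gmem_mmap)
{
	/* Both set or both clear is rejected, mirroring the "==" check. */
	if (has_private_mem_attr == gmem_mmap)
		return -EINVAL;
	return 0;
}

/* Models the first form of fault_from_gmem() quoted in the thread. */
static inline bool model_fault_from_gmem(bool is_private, bool slot_supports_shared)
{
	return is_private || slot_supports_shared;
}

/* Walks the four flag combinations discussed in the thread. */
static inline void model_selftest(void)
{
	/* Attributes not tracked (non-CoCo, or stage-2 CoCo), with mmap: allowed. */
	assert(model_bind_check(false, true) == 0);
	/* Private/shared tracked via attributes, no mmap (stage-1 CoCo): allowed. */
	assert(model_bind_check(true, false) == 0);
	/* Legacy CoCo VM that also asks for mmap (use case 2): rejected. */
	assert(model_bind_check(true, true) == -EINVAL);
	/* Non-CoCo VM without mmap (use case 3): rejected. */
	assert(model_bind_check(false, false) == -EINVAL);

	/* In use case 3 nothing would fault from guest_memfd anyway. */
	assert(!model_fault_from_gmem(false, false));
}
```

Under this model, use cases 2 and 3 are both refused at bind time, which matches the "keep it simple" conclusion in the replies above.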