Date: Tue, 3 Oct 2023 13:51:52 -0700
References: <20230914015531.1419405-1-seanjc@google.com>
 <20230914015531.1419405-12-seanjc@google.com>
Subject: Re: [RFC PATCH v12 11/33] KVM: Introduce per-page memory attributes
From: Sean Christopherson
To: Fuad Tabba
Cc: Paolo Bonzini, Marc Zyngier, Oliver Upton, Huacai Chen,
 Michael Ellerman, Anup Patel, Paul Walmsley, Palmer Dabbelt, Albert Ou,
 "Matthew Wilcox (Oracle)", Andrew Morton, Paul Moore, James Morris,
 "Serge E. Hallyn", kvm@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
 linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
 linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
 linux-security-module@vger.kernel.org, linux-kernel@vger.kernel.org,
 Chao Peng, Jarkko Sakkinen, Anish Moorthy, Yu Zhang, Isaku Yamahata,
 Xu Yilun, Vlastimil Babka, Vishal Annapurve, Ackerley Tng,
 Maciej Szmigiero, David Hildenbrand, Quentin Perret, Michael Roth, Wang,
 Liam Merwick, Isaku Yamahata, "Kirill A. Shutemov"

On Tue, Oct 03, 2023, Fuad Tabba wrote:
> On Tue, Oct 3, 2023 at 4:59 PM Sean Christopherson wrote:
> > On Tue, Oct 03, 2023, Fuad Tabba wrote:
> > > > +#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)
> > > > +
> > >
> > > In pKVM, we don't want to allow setting (or clearing) of PRIVATE/SHARED
> > > attributes from userspace.
> >
> > Why not? The whole thing falls apart if userspace doesn't *know* the state of a
> > page, and the only way for userspace to know the state of a page at a given moment
> > in time is if userspace controls the attributes. E.g. even if KVM were to provide
> > a way for userspace to query attributes, the attributes exposed to userspace would
> > become stale the instant KVM drops slots_lock (or whatever lock protects the attributes)
> > since userspace couldn't prevent future changes.
>
> I think I might not quite understand the purpose of the
> KVM_SET_MEMORY_ATTRIBUTES ABI. In pKVM, all of a protected guest's memory is
> private by default, until the guest shares it with the host (via a
> hypercall), or another guest (future work).
> When the guest shares it, userspace is notified via KVM_EXIT_HYPERCALL. In
> many use cases, userspace doesn't need to keep track directly of all of this,
> but can reactively un/map the memory being un/shared.

Yes, and then userspace needs to tell KVM, via KVM_SET_MEMORY_ATTRIBUTES, that
userspace has agreed to change the state of the page. Userspace may not need/want
to explicitly track the state of pages, but userspace still needs to tell KVM what
userspace wants.

KVM is primarily an accelerator, e.g. KVM's role is to make things go fast (relative
to doing things in userspace) and provide access to resources/instructions that
require elevated privileges. As a general rule, we try to avoid defining the vCPU
model, security policies, etc. in KVM, because hardcoding policy into KVM (and the
kernel as a whole) eventually limits the utility of KVM.

As it pertains to PRIVATE vs. SHARED, KVM's role is to define and enforce the basic
rules, but KVM shouldn't do things like define when it is (il)legal to convert
memory to/from SHARED, what pages can be converted, what happens if the guest and
userspace disagree, etc.

> > Why does pKVM need to prevent userspace from stating *its* view of attributes?
> >
> > If the goal is to reduce memory overhead, that can be solved by using an internal,
> > non-ABI attributes flag to track pKVM's view of SHARED vs. PRIVATE. If the guest
> > attempts to access memory where pKVM and userspace don't agree on the state,
> > generate an exit to userspace. Or kill the guest. Or do something else entirely.
>
> For the pKVM hypervisor the guest's view of the attributes doesn't
> matter. The hypervisor at the end of the day is the ultimate arbiter
> for what is shared and with whom. For pKVM (at least in my port of
> guestmem), we use the memory attributes from guestmem essentially to
> control which memory can be mapped by the host.

The guest's view absolutely matters.
The guest's view may not be expressed at access time, e.g. as you note below,
pKVM and other software-protected VMs don't have a dedicated shared vs. private
bit like TDX and SNP. But the view is still there, e.g. in the pKVM model, the
guest expresses its desire for shared vs. private via hypercall, and IIRC, the
guest's view is tracked by the hypervisor in the stage-2 PTEs. pKVM itself may
track the guest's view on things, but the view is still the guest's.

E.g. if the guest thinks a page is private, but in reality KVM and host userspace
have it as shared, then the guest may unintentionally leak data to the untrusted
world.

IIUC, you have implemented guest_memfd support in pKVM by changing the attributes
when the guest makes the hypercall. This can work, but only so long as the guest
and userspace are well-behaved, and it will likely paint pKVM into a corner in the
long run.

E.g. if the guest makes a hypercall to convert memory to PRIVATE, but there is no
memslot or the memslot doesn't support private memory, then unless there is policy
baked into KVM, or an ABI for the guest<=>host hypercall interface that allows
unwinding the program counter, you're stuck. Returning an error for the hypercall
straight from KVM is undesirable as that would put policy into KVM that doesn't
need to be there, e.g. that would prevent userspace from manipulating memslots in
response to (un)share requests from the guest. It's a similar story if KVM marks
the page as PRIVATE, as that would prevent userspace from returning an error for
the hypercall, i.e. would prevent userspace from denying the request to convert to
PRIVATE.

> One difference between pKVM and TDX (as I understand it), is that TDX
> uses the msb of the guest's IPA to indicate whether memory is shared
> or private, and that can generate a mismatch on guest memory access
> between what it thinks the state is, and what the sharing state in
> reality is. pKVM doesn't have that.
> Memory is private by default, and can be shared in-place, both in the
> guest's IPA space as well as the underlying physical page.

TDX's shared bit and SNP's encryption bit are just a means of hardware enforcement.
pKVM doesn't have a hardware bit because hardware doesn't provide any enforcement.
But as above, pKVM does have an equivalent *somewhere*.

> > > The other thing, which we need for pKVM anyway, is to make
> > > kvm_vm_set_mem_attributes() global, so that it can be called from outside of
> > > kvm_main.c (already have a local patch for this that declares it in
> > > kvm_host.h),
> >
> > That's no problem, but I am definitely opposed to KVM modifying attributes that
> > are owned by userspace.
> >
> > > and not gate this function by KVM_GENERIC_MEMORY_ATTRIBUTES.
> >
> > As above, I am opposed to pKVM having a completely different ABI for managing
> > PRIVATE vs. SHARED. I have no objection to pKVM using unclaimed flags in the
> > attributes to store extra metadata, but if KVM_SET_MEMORY_ATTRIBUTES doesn't work
> > for pKVM, then we've failed miserably and should revisit the uAPI.
>
> Like I said, pKVM doesn't need a userspace ABI for managing PRIVATE/SHARED,
> just a way of tracking in the host kernel of what is shared (as opposed to
> the hypervisor, which already has the knowledge). The solution could simply
> be that pKVM does not enable KVM_GENERIC_MEMORY_ATTRIBUTES, has its own
> tracking of the status of the guest pages, and only selects KVM_PRIVATE_MEM.

At the risk of overstepping my bounds, I think that effectively giving the guest
full control over what is shared vs. private is a mistake. It more or less locks
pKVM into a single model, and even within that model, dealing with errors and/or
misbehaving guests becomes unnecessarily problematic.

Using KVM_SET_MEMORY_ATTRIBUTES may not provide value *today*, e.g. the userspace
side of pKVM could simply "reflect" all conversion hypercalls, and terminate the
VM on errors.
But the cost is very minimal, e.g. a single extra ioctl() per conversion,
and the upside is that pKVM won't be stuck if a use case comes along that wants to
go beyond "all conversion requests either immediately succeed or terminate the
guest".
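
To make the flow argued for above concrete, here is a rough, self-contained
sketch. It is not code from the series and it does not talk to a real KVM
instance: the toy_* names, the 16-page guest and the policy callbacks are all
hypothetical stand-ins, and only the KVM_MEMORY_ATTRIBUTE_PRIVATE value is
taken from the quoted patch. The point it tries to show is that the conversion
request is forwarded to userspace, userspace applies its own policy, and the
attributes change only when userspace states its view (the step that
KVM_SET_MEMORY_ATTRIBUTES would perform in a real VMM).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag value as quoted from the patch under discussion. */
#define KVM_MEMORY_ATTRIBUTE_PRIVATE (1ULL << 3)

/* Hypothetical: a tiny guest with 16 pages, all shared (0) at creation. */
#define TOY_NR_PAGES 16

/* Hypothetical stand-in for KVM's per-page attribute storage. */
struct toy_vm {
	uint64_t attrs[TOY_NR_PAGES];
};

/*
 * Hypothetical stand-in for the KVM_SET_MEMORY_ATTRIBUTES ioctl(): userspace
 * states its view of pages [first, first + nr). Returns 0 on success and -1
 * on a bad range or an unknown attribute bit.
 */
static int toy_set_memory_attributes(struct toy_vm *vm, size_t first,
				     size_t nr, uint64_t attrs)
{
	size_t i;

	assert(vm != NULL);
	if (first >= TOY_NR_PAGES || nr > TOY_NR_PAGES - first)
		return -1;
	if (attrs & ~KVM_MEMORY_ATTRIBUTE_PRIVATE)
		return -1;

	for (i = 0; i < nr; i++)
		vm->attrs[first + i] = attrs;
	return 0;
}

/* Hypothetical userspace policy hook: may approve or deny a conversion. */
typedef bool (*toy_policy_fn)(size_t first, size_t nr, bool to_private);

/* "Reflect" policy: approve every request the guest makes. */
static bool toy_policy_reflect(size_t first, size_t nr, bool to_private)
{
	(void)first; (void)nr; (void)to_private;
	return true;
}

/* Example of a stricter policy: refuse any conversion touching pages 0-3. */
static bool toy_policy_deny_low(size_t first, size_t nr, bool to_private)
{
	(void)nr; (void)to_private;
	return first >= 4;
}

/*
 * Userspace handling of a conversion request that was forwarded from the
 * guest. If the policy denies it, the attributes are left untouched and the
 * caller can return an error to the guest; otherwise userspace states its
 * view through the (toy) attribute call.
 */
static int toy_handle_conversion(struct toy_vm *vm, toy_policy_fn policy,
				 size_t first, size_t nr, bool to_private)
{
	if (policy && !policy(first, nr, to_private))
		return -1;

	return toy_set_memory_attributes(vm, first, nr,
			to_private ? KVM_MEMORY_ATTRIBUTE_PRIVATE : 0);
}
```

With toy_policy_reflect this degenerates to the "reflect everything" behavior
described above, at the cost of one extra call per conversion. Swapping in a
different policy function is all it takes to deny a request instead, which is
the flexibility that is lost if the attributes are changed before userspace
is asked.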