Date: Fri, 22 Mar 2024 11:46:10 -0700
From: Elliot Berman
To: Will Deacon
CC: David Hildenbrand, Sean Christopherson, Vishal Annapurve, Quentin Perret, Matthew Wilcox, Fuad Tabba
Subject: Re: Re: Re: folio_mmapped

On Fri, Mar 22, 2024 at 04:36:55PM +0000, Will Deacon wrote:
> Hi Elliot,
>
> On Tue, Mar 19, 2024 at 04:54:10PM -0700, Elliot Berman wrote:
> > On Tue, Mar 19, 2024 at 02:31:19PM +0000, Will Deacon wrote:
> > > On Tue, Mar 19, 2024 at 11:26:05AM +0100, David Hildenbrand wrote:
> > > > On 19.03.24 01:10, Sean Christopherson wrote:
> > > > > +1.
> > > > > I am not completely opposed to letting SNP and TDX effectively convert
> > > > > pages between private and shared, but I also completely agree that letting
> > > > > anything gup() guest_memfd memory is likely to end in tears.
> > > >
> > > > Yes. Avoid it right from the start, if possible.
> > > >
> > > > People wanted guest_memfd to *not* have to mmap guest memory ("even for
> > > > ordinary VMs"). Now people are saying we have to be able to mmap it in order
> > > > to GUP it. It's getting tiring, really.
> > >
> > > From the pKVM side, we're working on guest_memfd primarily to avoid
> > > diverging from what other CoCo solutions end up using, but if it gets
> > > de-featured (e.g. no huge pages, no GUP, no mmap) compared to what we do
> > > today with anonymous memory, then it's a really hard sell to switch over
> > > from what we have in production. We're also hoping that, over time,
> > > guest_memfd will become more closely integrated with the mm subsystem to
> > > enable things like hypervisor-assisted page migration, which we would
> > > love to have.
> > >
> > > Today, we use the existing KVM interfaces (i.e. based on anonymous
> > > memory) and it mostly works with the one significant exception that
> > > accessing private memory via a GUP pin will crash the host kernel. If
> > > all guest_memfd() can offer to solve that problem is preventing GUP
> > > altogether, then I'd sooner just add that same restriction to what we
> > > currently have instead of overhauling the user ABI in favour of
> > > something which offers us very little in return.
> >
> > How would we add the restriction to anonymous memory?
> >
> > Thinking aloud -- do you mean like some sort of "exclusive GUP" flag
> > where mm can ensure that the exclusive GUP pin is the only pin? If the
> > refcount for the page is >1, then the exclusive GUP fails. Any future
> > GUP pin attempts would fail if the refcount has the EXCLUSIVE_BIAS.
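
A minimal userspace model of the "exclusive GUP" idea quoted above, assuming a C11 atomic counter stands in for the page refcount. EXCLUSIVE_BIAS, struct model_page and the model_*() helpers are hypothetical names for illustration only, not kernel APIs, and real mm refcounting is considerably more involved. The owner's own reference counts as 1, the exclusive pin succeeds only when that is the sole reference, and ordinary pins are refused while the bias is present.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bias. A page holding this many ordinary references would be
 * misread as exclusively pinned (the false-positive case). */
#define EXCLUSIVE_BIAS (1 << 30)

/* Hypothetical stand-in for a page; only the refcount is modelled. */
struct model_page {
	atomic_int refcount; /* 1 == only the owner's reference */
};

/* Ordinary GUP-style pin: refuse while an exclusive pin is held. */
static bool model_pin(struct model_page *p)
{
	int old = atomic_load(&p->refcount);

	do {
		if (old >= EXCLUSIVE_BIAS)
			return false; /* bias present: exclusively pinned */
		/* On failure the CAS reloads 'old' and we re-check the bias. */
	} while (!atomic_compare_exchange_weak(&p->refcount, &old, old + 1));
	return true;
}

static void model_unpin(struct model_page *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}

/* Exclusive pin: succeeds only if the owner's reference is the only one,
 * and adds the bias in the same atomic step. */
static bool model_pin_exclusive(struct model_page *p)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&p->refcount, &expected,
					      1 + EXCLUSIVE_BIAS);
}

static void model_unpin_exclusive(struct model_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, EXCLUSIVE_BIAS);

	assert(old >= EXCLUSIVE_BIAS); /* caller must hold the exclusive pin */
	(void)old;
}
```

Because the exclusive pin is a single compare-and-exchange from 1, a pin taken earlier makes it fail cleanly instead of racing with it. The comparison against EXCLUSIVE_BIAS is also where the false-positive concern comes from: enough ordinary references are indistinguishable from the bias.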
>
> Yes, I think we'd want something like that, but I don't think using a
> bias on its own is a good idea as false positives due to a large number
> of page references will then actually lead to problems (i.e. rejecting
> GUP spuriously), no? I suppose if you only considered the new bias in
> conjunction with the AS_NOGUP flag you proposed then it might be ok
> (i.e. when you see the bias, you then go check the address space to
> confirm). What do you think?
>

I think AS_NOGUP would prevent GUPing in the first place. If we set the
EXCLUSIVE_BIAS value to something like INT_MAX, do we need to worry
about there being INT_MAX-1 valid GUPs and someone wanting to add
another? From the GUPer's perspective, I don't think that would be much
different from overflowing the refcount.

> > > On the mmap() side of things for guest_memfd, a simpler option for us
> > > than what has currently been proposed might be to enforce that the VMM
> > > has unmapped all private pages on vCPU run, failing the ioctl if that's
> > > not the case. It needs a little more tracking in guest_memfd but I think
> > > GUP will then fall out in the wash because only shared pages will be
> > > mapped by userspace and so GUP will fail by construction for private
> > > pages.
> >
> > We can prevent GUP after the pages are marked private, but the pages
> > could be marked private after the pages were already GUP'd. I don't have
> > a good way to detect this, so converting a page to private is difficult.
>
> For anonymous memory, marking the page as private is going to involve an
> exclusive GUP so that the page can safely be donated to the guest. In
> that case, any existing GUP pin should cause that to fail gracefully.
> What is the situation you are concerned about here?
>

I wasn't thinking about exclusive GUP here. The exclusive GUP should be
able to give us the guarantees we need. I was thinking about making sure
we gracefully handle a race to provide the same page.
The kernel should detect the difference between "we're already providing
the page" and "somebody has an unexpected pin". If we couldn't take the
exclusive pin, we can easily read the refcount to know which case it is.

Thanks,
Elliot

> > > We're happy to pursue alternative approaches using anonymous memory if
> > > you'd prefer to keep guest_memfd limited in functionality (e.g.
> > > preventing GUP of private pages by extending mapping_flags as per [1]),
> > > but we're equally willing to contribute to guest_memfd if extensions are
> > > welcome.
> > >
> > > What do you prefer?
> > >
> >
> > I like this as a stepping stone. For the Android use cases, we don't
> > need to be able to convert a private page to shared and then also be
> > able to GUP it.
>
> I wouldn't want to rule that out, though. The VMM should be able to use
> shared pages just like it can with normal anonymous pages.
>
> > I don't think this design prevents us from adding "sometimes you can
> > GUP" to guest_memfd in the future.
>
> Technically, I think we can add all the stuff we need to guest_memfd,
> but there's a desire to keep that as simple as possible for now, which
> is why I'm keen to explore alternatives to unblock the pKVM upstreaming.
>
> Will
>
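
A minimal sketch of telling "we're already providing the page" apart from "somebody has an unexpected pin", as described in the reply above. It assumes a small userspace model in which a C11 atomic counter stands in for the page refcount; EXCLUSIVE_BIAS, struct model_page, model_claim_for_guest(), model_release_from_guest() and the choice of -EAGAIN/-EBUSY are all hypothetical. The point it illustrates is that a failed compare-and-exchange already reports the refcount it observed, so no second lookup is needed to classify the failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical bias value; illustrative only. */
#define EXCLUSIVE_BIAS (1 << 30)

/* Hypothetical stand-in for a page; only the refcount is modelled. */
struct model_page {
	atomic_int refcount; /* 1 == only the owner's reference */
};

/*
 * Try to take the exclusive pin before donating the page to the guest.
 * Returns 0 on success, -EAGAIN if the exclusive pin is already held
 * (another caller is already providing this page), or -EBUSY if some
 * other, unexpected pin exists. Error codes are a hypothetical choice.
 */
static int model_claim_for_guest(struct model_page *p)
{
	int observed = 1;

	if (atomic_compare_exchange_strong(&p->refcount, &observed,
					   1 + EXCLUSIVE_BIAS))
		return 0;

	/* On failure the CAS wrote the current refcount into 'observed'. */
	if (observed >= EXCLUSIVE_BIAS)
		return -EAGAIN; /* benign race: page is already being provided */
	return -EBUSY;          /* unexpected pin: refuse the conversion */
}

/* Drop the exclusive pin taken by a successful model_claim_for_guest(). */
static void model_release_from_guest(struct model_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, EXCLUSIVE_BIAS);

	assert(old >= EXCLUSIVE_BIAS); /* caller must hold the exclusive pin */
	(void)old;
}
```

The refcount is left untouched on both failure paths, so a caller that gets -EBUSY can report the unexpected pin without having to undo anything.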