Date: Tue, 19 Mar 2024 16:54:10 -0700
From: Elliot Berman
To: Will Deacon
CC: David Hildenbrand, Sean Christopherson, Vishal Annapurve,
 Quentin Perret, Matthew Wilcox, Fuad Tabba
Subject: Re: Re: folio_mmapped
Message-ID: <20240319155648990-0700.eberman@hu-eberman-lv.qualcomm.com>
Mail-Followup-To: Will Deacon, David Hildenbrand, Sean Christopherson,
 Vishal Annapurve, Quentin Perret, Matthew Wilcox, Fuad Tabba,
 kvm@vger.kernel.org, kvmarm@lists.linux.dev, pbonzini@redhat.com,
 chenhuacai@kernel.org, mpe@ellerman.id.au, anup@brainfault.org,
 paul.walmsley@sifive.com, palmer@dabbelt.com, aou@eecs.berkeley.edu,
 viro@zeniv.linux.org.uk, brauner@kernel.org, akpm@linux-foundation.org,
 xiaoyao.li@intel.com, yilun.xu@intel.com, chao.p.peng@linux.intel.com,
 jarkko@kernel.org, amoorthy@google.com, dmatlack@google.com,
 yu.c.zhang@linux.intel.com, isaku.yamahata@intel.com, mic@digikod.net,
 vbabka@suse.cz, ackerleytng@google.com, mail@maciej.szmigiero.name,
 michael.roth@amd.com, wei.w.wang@intel.com, liam.merwick@oracle.com,
 isaku.yamahata@gmail.com, kirill.shutemov@linux.intel.com,
 suzuki.poulose@arm.com, steven.price@arm.com, quic_mnalajal@quicinc.com,
 quic_tsoni@quicinc.com, quic_svaddagi@quicinc.com,
 quic_cvanscha@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, catalin.marinas@arm.com, james.morse@arm.com,
 yuzenghui@huawei.com, oliver.upton@linux.dev, maz@kernel.org,
 keirf@google.com, linux-mm@kvack.org
References: <7470390a-5a97-475d-aaad-0f6dfb3d26ea@redhat.com>
 <40f82a61-39b0-4dda-ac32-a7b5da2a31e8@redhat.com>
 <20240319143119.GA2736@willie-the-truck>
In-Reply-To: <20240319143119.GA2736@willie-the-truck>

On Tue, Mar 19, 2024 at 02:31:19PM +0000, Will Deacon wrote:
> On Tue, Mar 19, 2024 at 11:26:05AM +0100, David Hildenbrand wrote:
> > On 19.03.24 01:10, Sean Christopherson wrote:
> > > On Mon, Mar 18, 2024, Vishal Annapurve wrote:
> > > > On Mon, Mar 18, 2024 at 3:02 PM David Hildenbrand wrote:
> > > > > Second, we should find better ways to let an IOMMU map these pages,
> > > > > *not* using GUP. There were already discussions on providing a similar
> > > > > fd+offset-style interface instead. GUP really sounds like the wrong
> > > > > approach here. Maybe we should look into passing not only guest_memfd,
> > > > > but also "ordinary" memfds.
> > >
> > > +1.
> > > I am not completely opposed to letting SNP and TDX effectively convert
> > > pages between private and shared, but I also completely agree that letting
> > > anything gup() guest_memfd memory is likely to end in tears.
> >
> > Yes. Avoid it right from the start, if possible.
> >
> > People wanted guest_memfd to *not* have to mmap guest memory ("even for
> > ordinary VMs"). Now people are saying we have to be able to mmap it in order
> > to GUP it. It's getting tiring, really.
>
> From the pKVM side, we're working on guest_memfd primarily to avoid
> diverging from what other CoCo solutions end up using, but if it gets
> de-featured (e.g. no huge pages, no GUP, no mmap) compared to what we do
> today with anonymous memory, then it's a really hard sell to switch over
> from what we have in production. We're also hoping that, over time,
> guest_memfd will become more closely integrated with the mm subsystem to
> enable things like hypervisor-assisted page migration, which we would
> love to have.
>
> Today, we use the existing KVM interfaces (i.e. based on anonymous
> memory) and it mostly works with the one significant exception that
> accessing private memory via a GUP pin will crash the host kernel. If
> all guest_memfd() can offer to solve that problem is preventing GUP
> altogether, then I'd sooner just add that same restriction to what we
> currently have instead of overhauling the user ABI in favour of
> something which offers us very little in return.

How would we add the restriction to anonymous memory?

Thinking aloud -- do you mean like some sort of "exclusive GUP" flag
where mm can ensure that the exclusive GUP pin is the only pin? If the
refcount for the page is >1, then the exclusive GUP fails. Any future
GUP pin attempts would fail if the refcount has the EXCLUSIVE_BIAS.

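To make the "exclusive GUP" idea above concrete, here is a minimal
userspace toy model, not kernel code. Everything in it is hypothetical
and exists only for illustration: struct toy_page, the toy_pin*()
helpers and the value chosen for EXCLUSIVE_BIAS are assumptions, and a
real implementation would need atomic refcount updates rather than the
plain integer arithmetic used here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bias value; any value far above a realistic pin count works. */
#define EXCLUSIVE_BIAS (1u << 30)

/* Toy stand-in for a page; refcount == 1 means only the owner's reference. */
struct toy_page {
	unsigned int refcount;
};

/* Ordinary pin: refused once the refcount carries EXCLUSIVE_BIAS. */
static bool toy_pin(struct toy_page *p)
{
	if (p->refcount >= EXCLUSIVE_BIAS)
		return false;
	p->refcount++;
	return true;
}

/* Drop an ordinary pin taken with toy_pin(). */
static void toy_unpin(struct toy_page *p)
{
	assert(p->refcount > 1 && p->refcount < EXCLUSIVE_BIAS);
	p->refcount--;
}

/* Exclusive pin: fails if anyone else already holds a reference (>1). */
static bool toy_pin_exclusive(struct toy_page *p)
{
	if (p->refcount > 1)
		return false;
	p->refcount += EXCLUSIVE_BIAS;
	return true;
}

/* Drop the exclusive pin, allowing ordinary pins again. */
static void toy_unpin_exclusive(struct toy_page *p)
{
	assert(p->refcount >= EXCLUSIVE_BIAS);
	p->refcount -= EXCLUSIVE_BIAS;
}
```

In this sketch a page that starts with refcount 1 can take the
exclusive pin, after which toy_pin() fails until
toy_unpin_exclusive() is called. Conversely, a page that already has
an ordinary pin cannot be pinned exclusively until that pin is
dropped.
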
> On the mmap() side of things for guest_memfd, a simpler option for us
> than what has currently been proposed might be to enforce that the VMM
> has unmapped all private pages on vCPU run, failing the ioctl if that's
> not the case. It needs a little more tracking in guest_memfd but I think
> GUP will then fall out in the wash because only shared pages will be
> mapped by userspace and so GUP will fail by construction for private
> pages.

We can prevent GUP after the pages are marked private, but the pages
could be marked private after the pages were already GUP'd. I don't have
a good way to detect this, so converting a page to private is difficult.

> We're happy to pursue alternative approaches using anonymous memory if
> you'd prefer to keep guest_memfd limited in functionality (e.g.
> preventing GUP of private pages by extending mapping_flags as per [1]),
> but we're equally willing to contribute to guest_memfd if extensions are
> welcome.
>
> What do you prefer?
>

I like this as a stepping stone. For the Android use cases, we don't
need to be able to convert a private page to shared and then also be
able to GUP it. If you want to GUP a page, use anonymous memory and that
memory will always be shared. If you don't care about GUP'ing (e.g. it's
going to be guest-private or you otherwise know you won't be GUP'ing),
you can use guest_memfd.

I don't think this design prevents us from adding "sometimes you can
GUP" to guest_memfd in the future. I don't think it creates extra
changes for KVM since anonymous memory is already supported; although
I'd have to add the support for Gunyah.

> [1] https://lore.kernel.org/r/4b0fd46a-cc4f-4cb7-9f6f-ce19a2d3064e@redhat.com

Thanks,
Elliot