Date: Wed, 31 Jul 2024 15:21:53 -0700
From: Elliot Berman <quic_eberman@quicinc.com>
To: David Hildenbrand, Christoph Hellwig, Will Deacon, Quentin Perret,
	Chris Goldsworthy, Android KVM, Patrick Daly, Alex Elder,
	Srinivas Kandagatla, Murali Nalajal, Trilok Soni, Srivatsa Vaddagiri,
	Carl van Schaik, Philip Derrin, Prakruthi Deepak Heragu,
	Jonathan Corbet, Rob Herring, Krzysztof Kozlowski, Conor Dooley,
	Catalin Marinas, Konrad Dybcio, Bjorn Andersson, Dmitry Baryshkov,
	Fuad Tabba, Sean Christopherson, Andrew Morton
Subject: Re: [PATCH v17 19/35] arch/mm: Export direct {un,}map functions
Message-ID: <20240731140323693-0700.eberman@hu-eberman-lv.qualcomm.com>
References: <20240222-gunyah-v17-0-1e9da6763d38@quicinc.com>
 <20240222-gunyah-v17-19-1e9da6763d38@quicinc.com>
 <20240223071006483-0800.eberman@hu-eberman-lv.qualcomm.com>
 <2f4c44ad-b309-4baa-ac21-2ae19efd31fb@redhat.com>
 <20240226092020370-0800.eberman@hu-eberman-lv.qualcomm.com>
 <49d14780-56f4-478d-9f5f-0857e788c667@redhat.com>
 <20240229170329275-0800.eberman@hu-eberman-lv.qualcomm.com>
In-Reply-To: <20240229170329275-0800.eberman@hu-eberman-lv.qualcomm.com>
I wanted to revive this thread based on the mm alignment discussion for
guest_memfd. Gunyah's guest_memfd allocates memory via filemap_alloc_folio,
identical to KVM's guest_memfd. There is a possibility of a stage-2 fault
when memory is donated to the guest VM and Linux incidentally tries to
access the donated memory with an unaligned access. This access causes the
kernel to panic because it expects to be able to access all memory that is
mapped in stage 1. We don't want to disallow unaligned accesses simply
because Gunyah drivers are enabled.

There are two options I see to prevent the stage-2 fault from crashing the
kernel: we can fix up the stage-2 fault, or we can ensure that Linux's
stage-1 tables stay consistent with stage 2. For the latter, the obvious
solution seemed to be the set_direct_map functions, but you and Christoph
have valid concerns about exporting them to modules since they are a
low-level API. One way to avoid exporting the symbols is to make Gunyah a
built-in, but I'd like to find a better solution.

One idea is to create a "guest_memfd library" that both KVM and Gunyah can
use. It would abstract the common bits between the two into a built-in
module, and that module would be the one to call the set_direct_map
functions (rough sketch below my signature). I also think the abstraction
will help keep KVM's guest_memfd cleaner once we start supporting huge
folios (and splitting them).

Do KVM and mm folks also see value in a library-ified guest_memfd?

Thanks,
Elliot
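A minimal sketch of the direct-map handling such a library could own,
modeled on what mm/secretmem.c already does with
set_direct_map_{invalid,default}_noflush(); the gmem_* names are
hypothetical and the error handling is only illustrative:

/*
 * Hypothetical guest_memfd library helpers (names made up): the built-in
 * library, not the KVM or Gunyah modules, owns the direct-map updates so
 * that set_direct_map_*() never has to be exported to modules.
 */
#include <linux/bug.h>
#include <linux/mm.h>
#include <linux/set_memory.h>
#include <asm/tlbflush.h>

/* Drop a folio from the kernel direct map before donating it to the guest. */
static int gmem_folio_unmap_direct(struct folio *folio)
{
	unsigned long addr = (unsigned long)folio_address(folio);
	long i;
	int ret;

	for (i = 0; i < folio_nr_pages(folio); i++) {
		ret = set_direct_map_invalid_noflush(folio_page(folio, i));
		if (ret)
			goto restore;
	}
	flush_tlb_kernel_range(addr, addr + folio_size(folio));
	return 0;

restore:
	/* Put back whatever we already invalidated and report the error. */
	while (--i >= 0)
		set_direct_map_default_noflush(folio_page(folio, i));
	return ret;
}

/* Restore the direct map once the guest has returned the folio. */
static void gmem_folio_map_direct(struct folio *folio)
{
	long i;

	for (i = 0; i < folio_nr_pages(folio); i++)
		WARN_ON(set_direct_map_default_noflush(folio_page(folio, i)));
}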
On Thu, Feb 29, 2024 at 05:35:45PM -0800, Elliot Berman wrote:
> On Tue, Feb 27, 2024 at 10:49:32AM +0100, David Hildenbrand wrote:
> > On 26.02.24 18:27, Elliot Berman wrote:
> > > On Mon, Feb 26, 2024 at 12:53:48PM +0100, David Hildenbrand wrote:
> > > > On 26.02.24 12:06, Christoph Hellwig wrote:
> > > > > The point is that we can't just allow modules to unmap data from
> > > > > the kernel mapping, no matter how noble your intentions are.
> > > >
> > > > I absolutely agree.
> > > >
> > >
> > > Hi David and Christoph,
> > >
> > > Are your preferences that we should make Gunyah builtin only or should add
> > > fixing up S2 PTW errors (or something else)?
> >
> > Having that built into the kernel certainly does sound better than exposing
> > that functionality to arbitrary OOT modules. But still, this feels like it
> > is using a "too-low-level" interface.
> >
> What are your thoughts about fixing up the stage-2 fault instead? I
> think this gives mmu-based isolation a slight speed boost because we
> avoid modifying the kernel mapping. The hypervisor driver (KVM or Gunyah)
> knows that the page isn't mapped. Whether we get an S2 or S1 fault, the
> kernel is likely going to crash, except in the rare case where we want
> to fix the exception. In that case, we can modify the S2 fault handler
> to call fixup_exception() when appropriate.
>
> > > Also, do you extend that preference to modifying S2 mappings? This would
> > > require any hypervisor driver that supports confidential compute
> > > usecases to only ever be builtin.
> > >
> > > Is your concern about unmapping data from the kernel mapping, then the
> > > module being unloaded, and then having no way to recover the mapping?
> > > Would a permanent module be better? The primary reason we were wanting
> > > to have it as a module was to avoid having the driver in memory if
> > > you're not a Gunyah guest.
> >
> > What I didn't grasp from this patch description: is the area where a driver
> > would unmap/remap that memory somehow known ahead of time and limited?
> >
> > How would the driver obtain that memory it would try to unmap/remap the
> > direct map of? Simply allocate some pages and then unmap the direct map?
>
> That's correct.
>
> > For example, we do have mm/secretmem.c, where we unmap the directmap on
> > allocation and remap when freeing a page. A nice abstraction on alloc/free,
> > so one cannot really do a lot of harm.
> >
> > Further, we enlightened the remainder of the system about secretmem, such
> > that we can detect that the directmap is no longer there. As one example,
> > see the secretmem_active() check in kernel/power/hibernate.c.
> >
> I'll take a look at this. guest_memfd might be able to use PM notifiers here
> instead, but I'll dig in the archives to see why secretmem isn't using that.
>
> > A similar abstraction would make sense (I remember a discussion about having
> > secretmem functionality in guest_memfd, would that help?), but the question
> > is "which" memory you want to unmap the direct map of, and how the driver
> > became "owner" of that memory such that it would really be allowed to mess
> > with the directmap.
>
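For reference, the secretmem-style "enlightenment" mentioned above amounts
to a global use counter that the rest of the kernel can consult; a
guest_memfd library could expose an equivalent. A minimal sketch with
hypothetical gmem_* names (hibernation_available() in
kernel/power/hibernate.c does the analogous secretmem_active() check today):

/*
 * Hypothetical counter mirroring mm/secretmem.c's secretmem_active():
 * lets the rest of the kernel know that folios may be missing from the
 * direct map.
 */
#include <linux/atomic.h>
#include <linux/types.h>

static atomic_t gmem_users = ATOMIC_INIT(0);

/* Take a reference when a guest_memfd with direct-map removal is created. */
void gmem_users_inc(void)
{
	atomic_inc(&gmem_users);
}

/* Drop the reference when the last such file goes away. */
void gmem_users_dec(void)
{
	atomic_dec(&gmem_users);
}

/*
 * Analogue of secretmem_active(); callers such as hibernation could
 * refuse to proceed while this returns true.
 */
bool gmem_active(void)
{
	return atomic_read(&gmem_users) != 0;
}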