Date: Tue, 11 Mar 2025 11:18:40 +0000
Message-ID: <86r033olwv.wl-maz@kernel.org>
From: Marc Zyngier
To: Ankit Agrawal
Cc: Jason Gunthorpe, oliver.upton@linux.dev, joey.gouly@arm.com,
 suzuki.poulose@arm.com, yuzenghui@huawei.com, catalin.marinas@arm.com,
 will@kernel.org, ryan.roberts@arm.com, shahuang@redhat.com,
 lpieralisi@kernel.org, david@redhat.com, Aniket Agashe, Neo Jia,
 Kirti Wankhede, "Tarun Gupta (SW-GPU)", Vikram Sethi, Andy Currid,
 Alistair Popple, John Hubbard, Dan Williams, Zhi Wang, Matt Ochs,
 Uday Dhoke, Dheeraj Nigam, Krishnakant Jaju, alex.williamson@redhat.com,
 sebastianene@google.com, coltonlewis@google.com, kevin.tian@intel.com,
 yi.l.liu@intel.com, ardb@kernel.org, akpm@linux-foundation.org,
 gshan@redhat.com, linux-mm@kvack.org, ddutile@redhat.com, tabba@google.com,
 qperret@google.com, seanjc@google.com, kvmarm@lists.linux.dev,
 linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH v3 1/1] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
References: <20250310103008.3471-1-ankita@nvidia.com> <20250310103008.3471-2-ankita@nvidia.com> <861pv5p0c3.wl-maz@kernel.org>

On Tue, 11 Mar 2025 03:42:23 +0000,
Ankit Agrawal wrote:
>
> >> +     /*
> >> +      *  When FWB is unsupported KVM needs to do cache flushes
> >> +      *  (via dcache_clean_inval_poc()) of the underlying memory. This is
> >> +      *  only possible if the memory is already mapped into the kernel map.
> >> +      *
> >> +      *  Outright reject as the cacheable device memory is not present in
> >> +      *  the kernel map and not suitable for cache management.
> >> +      */
> >> +     if (cacheable_devmem && !stage2_has_fwb(pgt)) {
> >> +             ret = -EINVAL;
> >> +             goto out_unlock;
> >> +     }
> >> +
> >
> > These new error reasons should at least be complemented by an
> > equivalent check at the point where the memslot is registered. It
>
> Understood. I can add such check in kvm_arch_prepare_memory_region().
>
>
> > maybe OK to blindly return an error at fault time (because userspace
> > has messed with the mapping behind our back), but there should at
> > least be something telling a well behaved userspace that there is a
> > bunch of combinations we're unwilling to support.
>
> How about WARN_ON() or BUG() for the faulty situation?

Absolutely not. Do you really want any user to randomly crash the
kernel because they flip a mapping, which they can do anytime they
want?

The way KVM works is that we return to userspace for the VMM to fix
things. Either by emulating something we can't do in the kernel, or by
fixing things so that the kernel can replay the fault and sort it
out. Either way, this requires some form of fault syndrome so that
userspace has a chance of understanding WTF is going on.

> > Which brings me to the next point: FWB is not discoverable from
> > userspace. How do you expect a VMM to know what it can or cannot do?
>
> Good point. I am not sure if it can. I suppose you are concerned about error
> during fault handling when !FWB without VMM having any clear indications
> of the cause?

No, I'm concerned that a well established API (populating a memslot)
works in some cases and doesn't work in others without a clear
indication of *why* we have this behaviour.

To me, this indicates that userspace needs to buy into this new
behaviour, and that behaviour needs to be advertised by a capability,
which is in turn conditional on FWB.

> Perhaps we can gracefully fall back to the default device mapping
> in such case? But that would cause VM to crash as soon as it makes some
> access violating DEVICE_nGnRE.

Which would now be a regression...

My take is that this cacheable PFNMAP contraption must only be exposed
to a guest if FWB is available.
We can't prevent someone from doing an mmap()
behind our back, but we can at least:

- tell userspace whether this is supported

- only handle the fault if userspace has bought into this mode

- report the fault to userspace for it to fix things otherwise

	M.

-- 
Without deviation from the norm, progress is not possible.
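
As an illustration of the three points above, here is a minimal userspace
C model of the decision logic. It is only a sketch: every name in it
(vm_model, cacheable_pfnmap_supported(), cacheable_pfnmap_enable(),
cacheable_pfnmap_fault()) is hypothetical and is not an existing KVM
symbol or capability, and it models the policy rather than the kernel
implementation.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model of the policy only. Every name below is hypothetical
 * and does not correspond to an existing KVM symbol or capability.
 */
struct vm_model {
	bool has_fwb;       /* stage-2 FWB is implemented by the hardware */
	bool user_opted_in; /* the VMM has enabled the cacheable PFNMAP mode */
};

enum fault_action {
	FAULT_HANDLE_IN_KERNEL, /* install the cacheable stage-2 mapping */
	FAULT_REPORT_TO_USER,   /* exit to the VMM with a fault syndrome */
};

/* Tell userspace whether the mode is supported: conditional on FWB. */
bool cacheable_pfnmap_supported(const struct vm_model *vm)
{
	return vm->has_fwb;
}

/* Buy-in: refuse to enable the mode where it is not advertised. */
int cacheable_pfnmap_enable(struct vm_model *vm)
{
	if (!cacheable_pfnmap_supported(vm))
		return -1;
	vm->user_opted_in = true;
	return 0;
}

/* Decision for a stage-2 fault that hits a cacheable PFNMAP VMA. */
enum fault_action cacheable_pfnmap_fault(const struct vm_model *vm)
{
	if (vm->has_fwb && vm->user_opted_in)
		return FAULT_HANDLE_IN_KERNEL;
	/* Report to the VMM; never WARN_ON() or BUG() on this path. */
	return FAULT_REPORT_TO_USER;
}
```

In this model a VMM would first query support, then enable the mode, and
only then would faults on such mappings be handled in the kernel. Every
other combination goes back to userspace instead of crashing the host.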