From: Sean Christopherson
To: Marc Zyngier
Cc: Ankit Agrawal, Catalin Marinas, Jason Gunthorpe, Oliver Upton,
    joey.gouly@arm.com, suzuki.poulose@arm.com, yuzenghui@huawei.com,
    will@kernel.org, ryan.roberts@arm.com, shahuang@redhat.com,
    lpieralisi@kernel.org, david@redhat.com, Aniket Agashe, Neo Jia,
    Kirti Wankhede, Tarun Gupta (SW-GPU), Vikram Sethi, Andy Currid,
    Alistair Popple, John Hubbard, Dan Williams, Zhi Wang, Matt Ochs,
    Uday Dhoke, Dheeraj Nigam, Krishnakant Jaju, alex.williamson@redhat.com,
    sebastianene@google.com, coltonlewis@google.com, kevin.tian@intel.com,
    yi.l.liu@intel.com, ardb@kernel.org, akpm@linux-foundation.org,
    gshan@redhat.com, linux-mm@kvack.org, ddutile@redhat.com,
    tabba@google.com, qperret@google.com, kvmarm@lists.linux.dev,
    linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Date: Wed, 26 Mar 2025 09:10:45 -0700
Subject: Re: [PATCH v3 1/1] KVM: arm64: Allow cacheable stage 2 mapping using VMA flags
In-Reply-To: <86y0wrlrxt.wl-maz@kernel.org>
References: <20250318230909.GD9311@nvidia.com> <20250319170429.GK9311@nvidia.com>
    <20250319192246.GQ9311@nvidia.com> <86y0wrlrxt.wl-maz@kernel.org>

On Wed, Mar 26, 2025, Marc Zyngier wrote:
> On Wed, 26 Mar 2025 14:53:34 +0000,
> Sean Christopherson wrote:
> >
> > On Wed, Mar 26, 2025, Ankit Agrawal wrote:
> > > > On Wed, Mar 19, 2025 at 04:22:46PM -0300, Jason Gunthorpe wrote:
> > > > > On Wed, Mar 19, 2025 at 06:11:02PM +0000, Catalin Marinas wrote:
> > > > > > On Wed, Mar 19, 2025 at 02:04:29PM -0300, Jason Gunthorpe wrote:
> > > > > > > On Wed, Mar 19, 2025 at 12:01:29AM -0700, Oliver Upton wrote:
> > > > > > > > You have a very good point that KVM is broken for cacheable PFNMAP'd
> > > > > > > > crap since we demote to something non-cacheable, and maybe that
> > > > > > > > deserves fixing first. Hopefully nobody notices that we've taken away
> > > > > > > > the toys...
> > > > > > >
> > > > > > > Fixing it is either faulting all access attempts or mapping it
> > > > > > > cacheable to the S2 (as this series is trying to do).
> > > > > >
> > > > > > As I replied earlier, it might be worth doing both - fault on !FWB
> > > > > > hardware (or rather reject the memslot creation), cacheable S2
> > > > > > otherwise.
> > > > >
> > > > > I have no objection, Ankit are you able to make a failure patch?
> > > >
> > > > I'd wait until the KVM maintainers have their say.
> > >
> > > Maz, Oliver any thoughts on this? Can we conclude to create this failure
> > > patch in memslot creation?
> >
> > That's not sufficient. As pointed out multiple times in this thread, any checks
> > done at memslot creation are best effort "courtesies" provided to userspace to
> > avoid terminating running VMs when the memory is faulted in.
> >
> > I.e. checking at memslot creation is optional, checking at fault-in/mapping is
> > not.
> >
> > With that in place, I don't see any need for a memslot flag. IIUC, without FWB,
> > cacheable pfn-mapped memory is broken and needs to be disallowed. But with FWB,
> > KVM can simply honor the cacheability based on the VMA. Neither of those requires
>
> Remind me how this works with stuff such as guestmemfd, which, by
> definition, doesn't have a userspace mapping?

Definitely not through a memslot flag. The cacheability would be a property of
the guest_memfd inode, similar to how it's a property of the underlying device
in this case.

I don't entirely see what guest_memfd has to do with this. One of the big
advantages of guest_memfd is that KVM has complete control over the lifecycle
of the memory. IIUC, the issue with !FWB hosts is that KVM can't guarantee
there are valid host mappings when memory is unmapped from the guest, and so
can't do the necessary maintenance. I agree with Jason's earlier statement
that this is a solvable kernel flaw.

For guest_memfd, KVM already does maintenance operations when memory is
reclaimed, for both SNP and TDX. I don't think ARM's cacheability stuff would
require any new functionality in guest_memfd.

> > a memslot flag. A KVM capability to enumerate FWB support would be nice though,
> > e.g. so userspace can assert and bail early without ever hitting an
> > ioctl error.
>
> It's not "nice". It's mandatory.
> And FWB is definitely *not* something
> we want to expose as such.

I agree a capability is mandatory if we're adding a memslot flag, but I don't
think it's mandatory if this is all handled through kernel plumbing.

> > If we want to support existing setups that happen to work by dumb luck or careful
> > configuration, then that should probably be an admin decision to support the
> > "unsafe" behavior, i.e. an off-by-default KVM module param, not a memslot flag.
>
> No. That's not how we handle an ABI issue. VM migration, with and
> without FWB, can happen in both directions, and must have clear
> semantics. So NAK to a kernel parameter.
>
> If I have a VM with a device mapped as *device* on FWB host, I must be
> able to migrate it to non-FWB host, and back. A device mapped as
> *cacheable* can only be migrated between FWB-capable hosts.

But I thought the whole problem is that mapping this fancy memory as device is
unsafe on non-FWB hosts? If it's safe, then why does KVM need to reject
anything in the first place?

> Importantly, it is *userspace* that is in charge of deciding how the
> device is mapped at S2. And the memslot flag is the correct
> abstraction for that.

I strongly disagree. Whatever owns the underlying physical memory is in
charge, not userspace. For memory that's backed by a VMA, userspace can
influence the behavior through mmap(), mprotect(), etc., but ultimately KVM
needs to pull state from mm/, via the VMA. Or in the guest_memfd case, from
guest_memfd.

I have no objection to adding KVM uAPI to let userspace add _restrictions_,
e.g. to disallow mapping memory as writable even if the VMA is writable. But
IMO, adding a memslot flag to control cacheability isn't purely subtractive.
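
The policy argued for in this reply (check at fault time, take cacheability from
whatever owns the memory, and let memslot flags only subtract) can be sketched
in plain C. Everything below is hypothetical and heavily simplified:
resolve_s2_fault(), struct backing_info, struct s2_mapping and
SLOT_RESTRICT_NO_WRITE are illustrative names, not KVM's actual code or uAPI,
host_has_fwb stands in for however FWB support would really be detected, and
non-cacheable memory is collapsed into a single "device" mapping type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stage-2 memory types (not KVM's real definitions). The
 * non-cacheable case is simplified to a plain device mapping. */
enum s2_memtype { S2_DEVICE, S2_NORMAL_CACHEABLE };

/* Hypothetical memslot restriction bit: it can only take away a permission
 * that the backing memory already grants, never add one. */
#define SLOT_RESTRICT_NO_WRITE (1u << 0)

/* Attributes pulled from whatever owns the memory (the VMA, or guest_memfd). */
struct backing_info {
	bool cacheable;
	bool writable;
};

/* The resulting stage-2 mapping attributes. */
struct s2_mapping {
	enum s2_memtype type;
	bool writable;
};

/*
 * Decide how to map a faulting pfn-mapped page at stage 2. Returns 0 and
 * fills *out on success, or -1 if the fault must be rejected. The check runs
 * on every fault; any memslot-creation check is only an early courtesy.
 */
static int resolve_s2_fault(const struct backing_info *b, bool host_has_fwb,
			    unsigned int slot_restrictions,
			    struct s2_mapping *out)
{
	assert(b != NULL && out != NULL);

	/* Without FWB, cacheable pfn-mapped memory cannot be mapped safely. */
	if (b->cacheable && !host_has_fwb)
		return -1;

	/* Otherwise honor the backing memory's cacheability attribute. */
	out->type = b->cacheable ? S2_NORMAL_CACHEABLE : S2_DEVICE;

	/* Restrictions are subtractive: they can clear write, never set it. */
	out->writable = b->writable &&
			!(slot_restrictions & SLOT_RESTRICT_NO_WRITE);
	return 0;
}
```

Because the restriction mask is only ever used to clear what the backing memory
grants, no memslot flag value can make a mapping writable or cacheable when the
backing memory is not, which is the "purely subtractive" property contrasted
above with a cacheability flag.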