Date: Tue, 08 Oct 2024 18:07:18 +0000
In-Reply-To: (message from Patrick Roy on Mon, 7 Oct 2024 16:56:42 +0100)
Subject: Re: [RFC PATCH 30/39] KVM: guest_memfd: Handle folio preparation for guest_memfd mmap
From: Ackerley Tng
To: Patrick Roy
Cc: quic_eberman@quicinc.com, tabba@google.com, jgg@nvidia.com,
 peterx@redhat.com, david@redhat.com, rientjes@google.com, fvdl@google.com,
 jthoughton@google.com, seanjc@google.com, pbonzini@redhat.com,
 zhiquan1.li@intel.com, fan.du@intel.com, jun.miao@intel.com,
 isaku.yamahata@intel.com, muchun.song@linux.dev, mike.kravetz@oracle.com,
 erdemaktas@google.com, vannapurve@google.com, qperret@google.com,
 jhubbard@nvidia.com, willy@infradead.org, shuah@kernel.org,
 brauner@kernel.org, bfoster@redhat.com, kent.overstreet@linux.dev,
 pvorel@suse.cz, rppt@kernel.org, richard.weiyang@gmail.com,
 anup@brainfault.org, haibo1.xu@intel.com, ajones@ventanamicro.com,
 vkuznets@redhat.com, maciej.wieczor-retman@intel.com, pgonda@google.com,
 oliver.upton@linux.dev, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 kvm@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-fsdevel@kvack.org, jgowans@amazon.com, kalyazin@amazon.co.uk,
 derekmn@amazon.com

Patrick Roy writes:

> Hi Ackerley,
>
> On Thu, 2024-10-03 at 22:32 +0100, Ackerley Tng wrote:
>> Elliot Berman writes:
>>
>>> On Tue, Sep 10, 2024 at 11:44:01PM +0000, Ackerley Tng wrote:
>>>> Since guest_memfd now supports mmap(), folios have to be prepared
>>>> before they are faulted into userspace.
>>>>
>>>> When memory attributes are switched between shared and private, the
>>>> up-to-date flags will be cleared.
>>>>
>>>> Use the folio's up-to-date flag to indicate being ready for the guest
>>>> usage and can be used to mark whether the folio is ready for shared OR
>>>> private use.
>>>
>>> Clearing the up-to-date flag also means that the page gets zero'd out
>>> whenever it transitions between shared and private (either direction).
>>> pKVM (Android) hypervisor policy can allow in-place conversion between
>>> shared/private.
>>>
>>> I believe the important thing is that sev_gmem_prepare() needs to be
>>> called prior to giving page to guest. In my series, I had made a
>>> ->prepare_inaccessible() callback where KVM would only do this part.
>>> When transitioning to inaccessible, only that callback would be made,
>>> besides the bookkeeping. The folio zeroing happens once when allocating
>>> the folio if the folio is initially accessible (faultable).
>>>
>>> From x86 CoCo perspective, I think it also makes sense to not zero
>>> the folio when changing faultability from private to shared:
>>> - If guest is sharing some data with host, you've wiped the data and
>>>   guest has to copy again.
>>> - Or, if SEV/TDX enforces that page is zero'd between transitions,
>>>   Linux has duplicated the work that trusted entity has already done.
>>>
>>> Fuad and I can help add some details for the conversion. Hopefully we
>>> can figure out some of the plan at plumbers this week.
>>
>> Zeroing the page prevents leaking host data (see function docstring for
>> kvm_gmem_prepare_folio() introduced in [1]), so we definitely don't want
>> to introduce a kernel data leak bug here.
>>
>> In-place conversion does require preservation of data, so for
>> conversions, shall we zero depending on VM type?
>>
>> + Gunyah: don't zero since ->prepare_inaccessible() is a no-op
>> + pKVM: don't zero
>> + TDX: don't zero
>> + SEV: AMD Architecture Programmers Manual 7.10.6 says there is no
>>   automatic encryption and implies no zeroing, hence perform zeroing
>> + KVM_X86_SW_PROTECTED_VM: Doesn't have a formal definition so I guess
>>   we could require zeroing on transition?
>
> Maybe for KVM_X86_SW_PROTECTED_VM we could make zero-ing configurable
> via some CREATE_GUEST_MEMFD flag, instead of forcing one specific
> behavior.

Sounds good to me. I can set up a flag in the next revision.

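To make the per-VM-type zeroing policy discussed above concrete, here is
a rough, standalone sketch. It is not code from the series: every name
in it (enum gmem_vm_kind, GMEM_FLAG_ZERO_ON_CONVERSION,
gmem_zero_on_conversion()) is hypothetical, and the policy only mirrors
what has been proposed in this thread so far. Nothing here is decided.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VM kinds, for illustration only; not the KVM VM types. */
enum gmem_vm_kind {
	GMEM_VM_GUNYAH,
	GMEM_VM_PKVM,
	GMEM_VM_TDX,
	GMEM_VM_SEV,
	GMEM_VM_SW_PROTECTED,
};

/*
 * Hypothetical guest_memfd creation flag: ask for folios to be zeroed
 * on shared<->private conversion. The real flag name and bit are TBD.
 */
#define GMEM_FLAG_ZERO_ON_CONVERSION (1ULL << 0)

/*
 * Should a folio be zeroed when it is converted between shared and
 * private? Follows the list proposed in this thread.
 */
static bool gmem_zero_on_conversion(enum gmem_vm_kind kind, uint64_t flags)
{
	switch (kind) {
	case GMEM_VM_GUNYAH:	/* ->prepare_inaccessible() is a no-op */
	case GMEM_VM_PKVM:	/* in-place conversion must keep data */
	case GMEM_VM_TDX:
		return false;
	case GMEM_VM_SEV:	/* proposed above: perform zeroing */
		return true;
	case GMEM_VM_SW_PROTECTED:
		/* Configurable at guest_memfd creation time. */
		return (flags & GMEM_FLAG_ZERO_ON_CONVERSION) != 0;
	}

	/* Unknown kind: zero, so that host data cannot leak. */
	return true;
}
```

With something of this shape, the conversion path would consult
gmem_zero_on_conversion() before clearing the folio, and a
KVM_X86_SW_PROTECTED_VM user that wants in-place conversion without
zeroing would presumably just create the guest_memfd without the flag.
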
> For the "non-CoCo with direct map entries removed" VMs that we at AWS
> are going for, we'd like a VM type with host-controlled in-place
> conversions which doesn't zero on transitions, so if
> KVM_X86_SW_PROTECTED_VM ends up zeroing, we'd need to add another new VM
> type for that.
>
> Somewhat related sidenote: For VMs that allow in-place conversions and do
> not zero, we do not need to zap the stage-2 mappings on memory attribute
> changes, right?
>

Here are some reasons for zapping I can think of:

1. When private pages are split/merged, zapping the stage-2 mappings on
   memory attribute changes allows the private pages to be re-faulted by
   KVM at smaller/larger granularity.

2. The rationale described here
   https://elixir.bootlin.com/linux/v6.11.2/source/arch/x86/kvm/mmu/mmu.c#L7482
   ("Zapping SPTEs in this case ensures KVM will reassess whether or not
   a hugepage can be used for affected ranges.") probably refers to the
   existing implementation, where a different set of physical pages is
   used to back shared and private memory. When the same set of physical
   pages is used for both shared and private memory, then IIUC this
   rationale does not apply.

3. There's another rationale for zapping
   https://elixir.bootlin.com/linux/v6.11.2/source/virt/kvm/kvm_main.c#L2494
   to do with read vs write mappings. I don't fully understand this
   one. Does this rationale still apply?

4. Is zapping required if the pages get removed from/added to the kernel
   direct map?

>> This way, the uptodate flag means that it has been prepared (as in
>> sev_gmem_prepare()), and zeroed if required by VM type.
>>
>> Regarding flushing the dcache/tlb in your other question [2], if we
>> don't use folio_zero_user(), can we rely on unmapping within core-mm
>> to flush after shared use, and unmapping within KVM to flush after
>> private use?
>>
>> Or should flush_dcache_folio() be explicitly called on kvm_gmem_fault()?
>>
>> clear_highpage(), used in the non-hugetlb (original) path, doesn't flush
>> the dcache. Was that intended?
>>
>>> Thanks,
>>> Elliot
>>>
>>>>
>>>>
>>
>> [1] https://lore.kernel.org/all/20240726185157.72821-8-pbonzini@redhat.com/
>> [2] https://lore.kernel.org/all/diqz34ldszp3.fsf@ackerleytng-ctop.c.googlers.com/
>
> Best,
> Patrick