Message-ID: <923d57f1-55f1-411f-b659-9fe4fafa734a@linux.intel.com>
Date: Tue, 3 Jun 2025 08:54:07 +0800
From: Binbin Wu <binbin.wu@linux.intel.com>
Subject: Re: [RFC PATCH v2 04/51] KVM: guest_memfd: Introduce KVM_GMEM_CONVERT_SHARED/PRIVATE ioctls
To: Ackerley Tng
Cc: kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 x86@kernel.org, linux-fsdevel@vger.kernel.org, aik@amd.com,
 ajones@ventanamicro.com, akpm@linux-foundation.org, amoorthy@google.com,
 anthony.yznaga@oracle.com, anup@brainfault.org, aou@eecs.berkeley.edu,
 bfoster@redhat.com, brauner@kernel.org, catalin.marinas@arm.com,
 chao.p.peng@intel.com, chenhuacai@kernel.org, dave.hansen@intel.com,
 david@redhat.com, dmatlack@google.com, dwmw@amazon.co.uk,
 erdemaktas@google.com, fan.du@intel.com, fvdl@google.com, graf@amazon.com,
 haibo1.xu@intel.com, hch@infradead.org, hughd@google.com,
 ira.weiny@intel.com, isaku.yamahata@intel.com, jack@suse.cz,
 james.morse@arm.com, jarkko@kernel.org, jgg@ziepe.ca, jgowans@amazon.com,
 jhubbard@nvidia.com, jroedel@suse.de, jthoughton@google.com,
 jun.miao@intel.com, kai.huang@intel.com, keirf@google.com,
 kent.overstreet@linux.dev, kirill.shutemov@intel.com,
 liam.merwick@oracle.com, maciej.wieczor-retman@intel.com,
 mail@maciej.szmigiero.name, maz@kernel.org, mic@digikod.net,
 michael.roth@amd.com, mpe@ellerman.id.au, muchun.song@linux.dev,
 nikunj@amd.com, nsaenz@amazon.es, oliver.upton@linux.dev,
 palmer@dabbelt.com, pankaj.gupta@amd.com, paul.walmsley@sifive.com,
 pbonzini@redhat.com, pdurrant@amazon.co.uk, peterx@redhat.com,
 pgonda@google.com, pvorel@suse.cz, qperret@google.com,
 quic_cvanscha@quicinc.com, quic_eberman@quicinc.com,
 quic_mnalajal@quicinc.com, quic_pderrin@quicinc.com,
 quic_pheragu@quicinc.com, quic_svaddagi@quicinc.com,
 quic_tsoni@quicinc.com, richard.weiyang@gmail.com,
 rick.p.edgecombe@intel.com, rientjes@google.com, roypat@amazon.co.uk,
 rppt@kernel.org, seanjc@google.com, shuah@kernel.org,
 steven.price@arm.com, steven.sistare@oracle.com, suzuki.poulose@arm.com,
 tabba@google.com, thomas.lendacky@amd.com, vannapurve@google.com,
 vbabka@suse.cz, viro@zeniv.linux.org.uk, vkuznets@redhat.com,
 wei.w.wang@intel.com, will@kernel.org, willy@infradead.org,
 xiaoyao.li@intel.com, yan.y.zhao@intel.com, yilun.xu@intel.com,
 yuzenghui@huawei.com, zhiquan1.li@intel.com
On 5/31/2025 4:10 AM, Ackerley Tng wrote:
> Binbin Wu writes:
>
>> On 5/15/2025 7:41 AM, Ackerley Tng wrote:
>>
>> [...]
>>> +
>>> +static int kvm_gmem_convert_range(struct file *file, pgoff_t start,
>>> +				  size_t nr_pages, bool shared,
>>> +				  pgoff_t *error_index)
>>> +{
>>> +	struct conversion_work *work, *tmp, *rollback_stop_item;
>>> +	LIST_HEAD(work_list);
>>> +	struct inode *inode;
>>> +	enum shareability m;
>>> +	int ret;
>>> +
>>> +	inode = file_inode(file);
>>> +
>>> +	filemap_invalidate_lock(inode->i_mapping);
>>> +
>>> +	m = shared ? SHAREABILITY_ALL : SHAREABILITY_GUEST;
>>> +	ret = kvm_gmem_convert_compute_work(inode, start, nr_pages, m, &work_list);
>>> +	if (ret || list_empty(&work_list))
>>> +		goto out;
>>> +
>>> +	list_for_each_entry(work, &work_list, list)
>>> +		kvm_gmem_convert_invalidate_begin(inode, work);
>>> +
>>> +	list_for_each_entry(work, &work_list, list) {
>>> +		ret = kvm_gmem_convert_should_proceed(inode, work, shared,
>>> +						      error_index);
>> Since kvm_gmem_invalidate_begin() now also handles shared memory,
>> kvm_gmem_convert_invalidate_begin() will zap the page table.
>> The shared mapping could be zapped in kvm_gmem_convert_invalidate_begin()
>> even when kvm_gmem_convert_should_proceed() returns an error.
>> The sequence is a bit confusing to me, at least in this patch so far.
>>
> It is true that the zapping of pages from the guest page table will
> happen before we figure out whether conversion is allowed.
>
> For a shared-to-private conversion, we will definitely unmap from the
> host before checking if conversion is allowed, and there's no choice
> there, since conversion is allowed only if there are no unexpected
> refcounts, and the way to eliminate the expected refcounts is to unmap
> from the host.
>
> Since we're unmapping before checking if conversion is allowed, I
> thought it would be fine to also zap from guest page tables before
> checking if conversion is allowed.
>
> Conversion is not meant to happen very regularly, and even if the page
> is unmapped or zapped, the next access will fault it back in, so there
> is a performance impact but not a functionality impact.

Yes, it's OK for shared mappings.

>
> Hope that helps.

It helped, thanks!

> Is it still odd to zap before checking if conversion
> should proceed?
>
>>> +		if (ret)
>>> +			goto invalidate_end;
>>> +	}
>>> +
>>> +	list_for_each_entry(work, &work_list, list) {
>>> +		rollback_stop_item = work;
>>> +		ret = kvm_gmem_shareability_apply(inode, work, m);
>>> +		if (ret)
>>> +			break;
>>> +	}
>>> +
>>> +	if (ret) {
>>> +		m = shared ? SHAREABILITY_GUEST : SHAREABILITY_ALL;
>>> +		list_for_each_entry(work, &work_list, list) {
>>> +			if (work == rollback_stop_item)
>>> +				break;
>>> +
>>> +			WARN_ON(kvm_gmem_shareability_apply(inode, work, m));
>>> +		}
>>> +	}
>>> +
>>> +invalidate_end:
>>> +	list_for_each_entry(work, &work_list, list)
>>> +		kvm_gmem_convert_invalidate_end(inode, work);
>>> +out:
>>> +	filemap_invalidate_unlock(inode->i_mapping);
>>> +
>>> +	list_for_each_entry_safe(work, tmp, &work_list, list) {
>>> +		list_del(&work->list);
>>> +		kfree(work);
>>> +	}
>>> +
>>> +	return ret;
>>> +}
>>> +
>> [...]
>>> @@ -186,15 +490,26 @@ static void kvm_gmem_invalidate_begin(struct kvm_gmem *gmem, pgoff_t start,
>>>  	unsigned long index;
>>>  
>>>  	xa_for_each_range(&gmem->bindings, index, slot, start, end - 1) {
>>> +		enum kvm_gfn_range_filter filter;
>>>  		pgoff_t pgoff = slot->gmem.pgoff;
>>>  
>>> +		filter = KVM_FILTER_PRIVATE;
>>> +		if (kvm_gmem_memslot_supports_shared(slot)) {
>>> +			/*
>>> +			 * Unmapping would also cause invalidation, but cannot
>>> +			 * rely on mmu_notifiers to do invalidation via
>>> +			 * unmapping, since memory may not be mapped to
>>> +			 * userspace.
>>> +			 */
>>> +			filter |= KVM_FILTER_SHARED;
>>> +		}
>>> +
>>>  		struct kvm_gfn_range gfn_range = {
>>>  			.start = slot->base_gfn + max(pgoff, start) - pgoff,
>>>  			.end = slot->base_gfn + min(pgoff + slot->npages, end) - pgoff,
>>>  			.slot = slot,
>>>  			.may_block = true,
>>> -			/* guest memfd is relevant to only private mappings. */
>>> -			.attr_filter = KVM_FILTER_PRIVATE,
>>> +			.attr_filter = filter,
>>>  		};
>>>  
>>>  		if (!found_memslot) {
>>> @@ -484,11 +799,49 @@ EXPORT_SYMBOL_GPL(kvm_gmem_memslot_supports_shared);
>>>  #define kvm_gmem_mmap NULL
>>>  #endif /* CONFIG_KVM_GMEM_SHARED_MEM */
>>>  
>> [...]