Date: Wed, 18 Sep 2024 07:48:54 +0200
From: Mike Rapoport
To: Patrick Roy
Cc: seanjc@google.com, pbonzini@redhat.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, rostedt@goodmis.org,
	mhiramat@kernel.org, mathieu.desnoyers@efficios.com,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org, quic_eberman@quicinc.com,
	dwmw@amazon.com, david@redhat.com, tabba@google.com,
	linux-mm@kvack.org, dmatlack@google.com, graf@amazon.com,
	jgowans@amazon.com, derekmn@amazon.com, kalyazin@amazon.com,
	xmarcalx@amazon.com
Subject: Re: [RFC PATCH v2 01/10] kvm: gmem: Add option to remove gmem from direct map
References: <20240910163038.1298452-1-roypat@amazon.co.uk> <20240910163038.1298452-2-roypat@amazon.co.uk>
In-Reply-To: <20240910163038.1298452-2-roypat@amazon.co.uk>

On Tue, Sep 10, 2024 at 05:30:27PM +0100, Patrick Roy wrote:
> Add a flag to the KVM_CREATE_GUEST_MEMFD ioctl that causes gmem pfns
> to be removed from the host kernel's direct map. Memory is removed
> immediately after allocation and preparation of gmem folios (after
> preparation, as the prepare callback might expect the direct map entry
> to be present). Direct map entries are restored before
> kvm_arch_gmem_invalidate is called (as ->invalidate_folio is called
> before ->free_folio), for the same reason.
>
> Use the PG_private flag to indicate that a folio is part of gmem with
> direct map removal enabled. While in this patch, PG_private does have a
> meaning of "folio not in direct map", this will no longer be true in
> follow up patches. Gmem folios might get temporarily reinserted into the
> direct map, but the PG_private flag needs to remain set, as the folios
> will have private data that needs to be freed independently of direct
> map status. This is why kvm_gmem_folio_clear_private does not call
> folio_clear_private.
>
> kvm_gmem_{set,clear}_folio_private must be called with the folio lock
> held.
>
> To ensure that failures in kvm_gmem_{clear,set}_private do not cause
> system instability due to leaving holes in the direct map, try to always
> restore direct map entries on failure. Pages for which restoration of
> direct map entries fails are marked as HWPOISON, to prevent the
> kernel from ever touching them again.
>
> Signed-off-by: Patrick Roy
> ---
>  include/uapi/linux/kvm.h | 2 +
>  virt/kvm/guest_memfd.c | 96 +++++++++++++++++++++++++++++++++++++---
>  2 files changed, 91 insertions(+), 7 deletions(-)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 637efc0551453..81b0f4a236b8c 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1564,6 +1564,8 @@ struct kvm_create_guest_memfd {
>  	__u64 reserved[6];
>  };
>
> +#define KVM_GMEM_NO_DIRECT_MAP (1ULL << 0)
> +
>  #define KVM_PRE_FAULT_MEMORY _IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
>
>  struct kvm_pre_fault_memory {
> diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
> index 1c509c3512614..2ed27992206f3 100644
> --- a/virt/kvm/guest_memfd.c
> +++ b/virt/kvm/guest_memfd.c
> @@ -4,6 +4,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include "kvm_mm.h"
>
> @@ -49,8 +50,69 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol
>  	return 0;
>  }
>
> +static bool kvm_gmem_test_no_direct_map(struct inode *inode)
> +{
> +	return ((unsigned long)inode->i_private & KVM_GMEM_NO_DIRECT_MAP) == KVM_GMEM_NO_DIRECT_MAP;
> +}
> +
> +static int kvm_gmem_folio_set_private(struct folio *folio)
> +{
> +	unsigned long start, npages, i;
> +	int r;
> +
> +	start = (unsigned long) folio_address(folio);
> +	npages = folio_nr_pages(folio);
> +
> +	for (i = 0; i < npages; ++i) {
> +		r = set_direct_map_invalid_noflush(folio_page(folio, i));
> +		if (r)
> +			goto out_remap;
> +	}

I feel like we need a new helper that takes care of contiguous pages.
arm64 already has set_memory_valid(), so it may be something like

	int set_direct_map_valid_noflush(struct page *p, unsigned nr, bool valid);

> +	flush_tlb_kernel_range(start, start + folio_size(folio));
> +	folio_set_private(folio);
> +	return 0;
> +out_remap:
> +	for (; i > 0; i--) {
> +		struct page *page = folio_page(folio, i - 1);
> +
> +		if (WARN_ON_ONCE(set_direct_map_default_noflush(page))) {
> +			/*
> +			 * Random holes in the direct map are bad, let's mark
> +			 * these pages as corrupted memory so that the kernel
> +			 * avoids ever touching them again.
> +			 */
> +			folio_set_hwpoison(folio);
> +			r = -EHWPOISON;
> +		}
> +	}
> +	return r;
> +}
> +

-- 
Sincerely yours,
Mike.
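
To illustrate the shape of the contiguous-range helper suggested above, here is a rough userspace model in C. It is only a sketch under loud assumptions: the direct map is a plain bool array, pages are addressed by index rather than by struct page, fail_at is a hypothetical fault-injection knob, and the model_ prefix marks that this is not the kernel function. The rollback also assumes every page in the range was in the opposite state before the call.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PAGES 8

/* Userspace stand-in for the direct map: true means the page is mapped. */
static bool direct_map[NR_PAGES];

/* Hypothetical fault injection: index of a page whose update fails, or -1. */
static int fail_at = -1;

/* Mock per-page update; the real kernel primitives work on struct page. */
static int model_set_page_valid(unsigned int idx, bool valid)
{
	assert(idx < NR_PAGES);
	if ((int)idx == fail_at)
		return -1;
	direct_map[idx] = valid;
	return 0;
}

/*
 * Model of a range helper: update nr contiguous pages starting at first.
 * On failure, revert the pages already changed (assuming they were in the
 * opposite state beforehand) and return the error to the caller.
 */
static int model_set_direct_map_valid_noflush(unsigned int first,
					      unsigned int nr, bool valid)
{
	unsigned int i;
	int r = 0;

	for (i = 0; i < nr; i++) {
		r = model_set_page_valid(first + i, valid);
		if (r)
			break;
	}
	if (!r)
		return 0;

	/* Undo pages [first, first + i) in reverse order. */
	while (i > 0) {
		i--;
		model_set_page_valid(first + i, !valid);
	}
	return r;
}
```

With something of this shape, a caller such as kvm_gmem_folio_set_private would make one call per folio and keep only the TLB flush and the flag update, instead of open-coding the per-page loop and the out_remap unwinding. Whether the unwinding belongs in the helper or in the caller is a design choice this model does not settle.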