Date: Thu, 10 Nov 2022 20:06:33 +0000
From: Sean Christopherson <seanjc@google.com>
To: Chao Peng <chao.p.peng@linux.intel.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-fsdevel@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-api@vger.kernel.org, linux-doc@vger.kernel.org, qemu-devel@nongnu.org,
	Paolo Bonzini, Jonathan Corbet, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86@kernel.org,
	"H. Peter Anvin", Hugh Dickins, Jeff Layton, "J. Bruce Fields",
	Andrew Morton, Shuah Khan, Mike Rapoport, Steven Price,
	"Maciej S. Szmigiero", Vlastimil Babka, Vishal Annapurve, Yu Zhang,
	"Kirill A. Shutemov", luto@kernel.org, jun.nakajima@intel.com,
	dave.hansen@intel.com, ak@linux.intel.com, david@redhat.com,
	aarcange@redhat.com, ddutile@redhat.com, dhildenb@redhat.com,
	Quentin Perret, tabba@google.com, Michael Roth, mhocko@suse.com,
	Muchun Song, wei.w.wang@intel.com
Subject: Re: [PATCH v9 4/8] KVM: Use gfn instead of hva for mmu_notifier_retry
References: <20221025151344.3784230-1-chao.p.peng@linux.intel.com>
 <20221025151344.3784230-5-chao.p.peng@linux.intel.com>
In-Reply-To: <20221025151344.3784230-5-chao.p.peng@linux.intel.com>

On Tue, Oct 25, 2022, Chao Peng wrote:
> @@ -715,15 +715,9 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
> 	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
>  }
>  
> -void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
> -			      unsigned long end)
> +static inline

Don't tag static functions with "inline" unless they're in headers, in which
case the inline is effectively required.  In pretty much every scenario, the
compiler can do a better job of optimizing inline vs. non-inline, i.e. odds are
very good the compiler would inline this helper anyways, and if not, there
would likely be a good reason not to inline it.

It'll be a moot point in this case (more below), but this would also reduce the
line length and avoid the wrap.

> void update_invalidate_range(struct kvm *kvm, gfn_t start,
> +						gfn_t end)

I appreciate the effort to make this easier to read, but making such a big
divergence from the kernel's preferred formatting is often counter-productive,
e.g. I blinked a few times when first reading this code.

Again, moot point this time (still below ;-) ), but for future reference,
better options are to either let the line poke out or simply wrap early to get
the bundling of parameters that you want, e.g.

	static inline void update_invalidate_range(struct kvm *kvm, gfn_t start, gfn_t end)

or

	static inline void update_invalidate_range(struct kvm *kvm,
						   gfn_t start, gfn_t end)

> {
> -	/*
> -	 * The count increase must become visible at unlock time as no
> -	 * spte can be established without taking the mmu_lock and
> -	 * count is also read inside the mmu_lock critical section.
> -	 */
> -	kvm->mmu_invalidate_in_progress++;
> 	if (likely(kvm->mmu_invalidate_in_progress == 1)) {
> 		kvm->mmu_invalidate_range_start = start;
> 		kvm->mmu_invalidate_range_end = end;
> @@ -744,6 +738,28 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
> 	}
> }
> 
> +static void mark_invalidate_in_progress(struct kvm *kvm, gfn_t start, gfn_t end)

Splitting the helpers this way yields a weird API overall, e.g. it's possible
(common, actually) to have an "end" without a "begin".

Taking the range in the "end" is also dangerous/misleading/imbalanced, because
_if_ there are multiple ranges in a batch, each range would need to be unwound
independently, e.g. the invocation of the "end" helper in
kvm_mmu_notifier_invalidate_range_end() is flat out wrong, it just doesn't
cause problems because KVM doesn't (currently) try to unwind regions (and
probably never will, but that's beside the point).

Rather than shunt what is effectively the "begin" into a separate helper,
provide three separate APIs, e.g. begin, range_add, end.  That way, begin+end
don't take a range and thus are symmetrical, always paired, and can't screw up
unwinding since they don't have a range to unwind.  It'll require three calls
in every case, but that's not the end of the world since none of these flows
are super hot paths.

> +{
> +	/*
> +	 * The count increase must become visible at unlock time as no
> +	 * spte can be established without taking the mmu_lock and
> +	 * count is also read inside the mmu_lock critical section.
> +	 */
> +	kvm->mmu_invalidate_in_progress++;

This should invalidate (ha!) mmu_invalidate_range_{start,end}, and then WARN
in mmu_invalidate_retry() if the range isn't valid.  And the "add" helper
should WARN if mmu_invalidate_in_progress == 0.

> +}
> +
> +static bool kvm_mmu_handle_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)

"handle" is waaaay too generic.  Just match kvm_unmap_gfn_range() and call it
kvm_mmu_unmap_gfn_range().  This is a local function so it's unlikely to
collide with arch code, now or in the future.

> +{
> +	update_invalidate_range(kvm, range->start, range->end);
> +	return kvm_unmap_gfn_range(kvm, range);
> +}
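For reference, the begin + range_add + end split ends up looking something like
this at the call sites, e.g. in kvm_zap_gfn_range() (just a sketch of the call
pattern with the proposed names, the real thing is in the diff below):

	write_lock(&kvm->mmu_lock);

	kvm_mmu_invalidate_begin(kvm);
	kvm_mmu_invalidate_range_add(kvm, gfn_start, gfn_end);

	/* ... zap/unmap SPTEs in [gfn_start, gfn_end) ... */

	kvm_mmu_invalidate_end(kvm);

	write_unlock(&kvm->mmu_lock);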
Overall, this?  Compile tested only...

---
 arch/x86/kvm/mmu/mmu.c   |  8 +++++---
 include/linux/kvm_host.h | 33 +++++++++++++++++++++------------
 virt/kvm/kvm_main.c      | 30 +++++++++++++++++++++---------
 3 files changed, 47 insertions(+), 24 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 93c389eaf471..d4b373e3e524 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4259,7 +4259,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
 		return true;
 
 	return fault->slot &&
-	       mmu_invalidate_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
+	       mmu_invalidate_retry_gfn(vcpu->kvm, mmu_seq, fault->gfn);
 }
 
 static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
@@ -6098,7 +6098,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 
 	write_lock(&kvm->mmu_lock);
 
-	kvm_mmu_invalidate_begin(kvm, gfn_start, gfn_end);
+	kvm_mmu_invalidate_begin(kvm);
+
+	kvm_mmu_invalidate_range_add(kvm, gfn_start, gfn_end);
 
 	flush = kvm_rmap_zap_gfn_range(kvm, gfn_start, gfn_end);
 
@@ -6112,7 +6114,7 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 		kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
 						   gfn_end - gfn_start);
 
-	kvm_mmu_invalidate_end(kvm, gfn_start, gfn_end);
+	kvm_mmu_invalidate_end(kvm);
 
 	write_unlock(&kvm->mmu_lock);
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e6e66c5e56f2..29aa6d6827cc 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -778,8 +778,8 @@ struct kvm {
 	struct mmu_notifier mmu_notifier;
 	unsigned long mmu_invalidate_seq;
 	long mmu_invalidate_in_progress;
-	unsigned long mmu_invalidate_range_start;
-	unsigned long mmu_invalidate_range_end;
+	gfn_t mmu_invalidate_range_start;
+	gfn_t mmu_invalidate_range_end;
 #endif
 	struct list_head devices;
 	u64 manual_dirty_log_protect;
@@ -1378,10 +1378,9 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
 void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 #endif
 
-void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
-			      unsigned long end);
-void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start,
-			    unsigned long end);
+void kvm_mmu_invalidate_begin(struct kvm *kvm);
+void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end);
+void kvm_mmu_invalidate_end(struct kvm *kvm);
 
 long kvm_arch_dev_ioctl(struct file *filp,
 			unsigned int ioctl, unsigned long arg);
@@ -1952,9 +1951,9 @@ static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq)
 	return 0;
 }
 
-static inline int mmu_invalidate_retry_hva(struct kvm *kvm,
+static inline int mmu_invalidate_retry_gfn(struct kvm *kvm,
 					   unsigned long mmu_seq,
-					   unsigned long hva)
+					   gfn_t gfn)
 {
 	lockdep_assert_held(&kvm->mmu_lock);
 	/*
@@ -1963,10 +1962,20 @@ static inline int mmu_invalidate_retry_hva(struct kvm *kvm,
 	 * that might be being invalidated. Note that it may include some false
 	 * positives, due to shortcuts when handing concurrent invalidations.
 	 */
-	if (unlikely(kvm->mmu_invalidate_in_progress) &&
-	    hva >= kvm->mmu_invalidate_range_start &&
-	    hva < kvm->mmu_invalidate_range_end)
-		return 1;
+	if (unlikely(kvm->mmu_invalidate_in_progress)) {
+		/*
+		 * Dropping mmu_lock after bumping mmu_invalidate_in_progress
+		 * but before updating the range is a KVM bug.
+		 */
+		if (WARN_ON_ONCE(kvm->mmu_invalidate_range_start == INVALID_GPA ||
+				 kvm->mmu_invalidate_range_end == INVALID_GPA))
+			return 1;
+
+		if (gfn >= kvm->mmu_invalidate_range_start &&
+		    gfn < kvm->mmu_invalidate_range_end)
+			return 1;
+	}
+
 	if (kvm->mmu_invalidate_seq != mmu_seq)
 		return 1;
 	return 0;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 43bbe4fde078..e9e03b979f77 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -540,9 +540,7 @@ static void kvm_mmu_notifier_invalidate_range(struct mmu_notifier *mn,
 
 typedef bool (*hva_handler_t)(struct kvm *kvm, struct kvm_gfn_range *range);
 
-typedef void (*on_lock_fn_t)(struct kvm *kvm, unsigned long start,
-			     unsigned long end);
-
+typedef void (*on_lock_fn_t)(struct kvm *kvm);
 typedef void (*on_unlock_fn_t)(struct kvm *kvm);
 
 struct kvm_hva_range {
@@ -628,7 +626,8 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
 			locked = true;
 			KVM_MMU_LOCK(kvm);
 			if (!IS_KVM_NULL_FN(range->on_lock))
-				range->on_lock(kvm, range->start, range->end);
+				range->on_lock(kvm);
+
 			if (IS_KVM_NULL_FN(range->handler))
 				break;
 		}
@@ -715,8 +714,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
 	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
 }
 
-void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
-			      unsigned long end)
+void kvm_mmu_invalidate_begin(struct kvm *kvm)
 {
 	/*
 	 * The count increase must become visible at unlock time as no
@@ -724,6 +722,15 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
 	 * count is also read inside the mmu_lock critical section.
 	 */
 	kvm->mmu_invalidate_in_progress++;
+
+	kvm->mmu_invalidate_range_start = INVALID_GPA;
+	kvm->mmu_invalidate_range_end = INVALID_GPA;
+}
+
+void kvm_mmu_invalidate_range_add(struct kvm *kvm, gfn_t start, gfn_t end)
+{
+	WARN_ON_ONCE(!kvm->mmu_invalidate_in_progress);
+
 	if (likely(kvm->mmu_invalidate_in_progress == 1)) {
 		kvm->mmu_invalidate_range_start = start;
 		kvm->mmu_invalidate_range_end = end;
@@ -744,6 +751,12 @@ void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
 	}
 }
 
+static bool kvm_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+	kvm_mmu_invalidate_range_add(kvm, range->start, range->end);
+	return kvm_unmap_gfn_range(kvm, range);
+}
+
 static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 					const struct mmu_notifier_range *range)
 {
@@ -752,7 +765,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 		.start		= range->start,
 		.end		= range->end,
 		.pte		= __pte(0),
-		.handler	= kvm_unmap_gfn_range,
+		.handler	= kvm_mmu_unmap_gfn_range,
 		.on_lock	= kvm_mmu_invalidate_begin,
 		.on_unlock	= kvm_arch_guest_memory_reclaimed,
 		.flush_on_ret	= true,
@@ -791,8 +804,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn,
 	return 0;
 }
 
-void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start,
-			    unsigned long end)
+void kvm_mmu_invalidate_end(struct kvm *kvm)
 {
 	/*
 	 * This sequence increase will notify the kvm page fault that

base-commit: d663b8a285986072428a6a145e5994bc275df994
-- 