From: Sean Christopherson <seanjc@google.com>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
Henry Huang <henry.hj@antgroup.com>,
linux-mm@kvack.org
Subject: Re: access_tracking_perf_test kvm selftest doesn't work when Multi-Gen LRU is in use
Date: Tue, 21 May 2024 16:29:54 -0700
Message-ID: <Zk0uckIeAsb5ex4i@google.com>
In-Reply-To: <7a46456d6750ea682ba321ad09541fa81677b81a.camel@redhat.com>
On Wed, May 15, 2024, Maxim Levitsky wrote:
> Small note on why we started seeing this failure on RHEL 9 and only on some machines:
>
> - RHEL9 has MGLRU enabled, RHEL8 doesn't.
For a stopgap in KVM selftests, or possibly even a long term solution in case the
decision is that page_idle will simply have different behavior for MGLRU, couldn't
we tweak the test to not assert if MGLRU is enabled?
E.g. refactor get_module_param_integer() and/or get_module_param() to add
get_sysfs_value_integer() or so, and then do this?
diff --git a/tools/testing/selftests/kvm/access_tracking_perf_test.c b/tools/testing/selftests/kvm/access_tracking_perf_test.c
index 3c7defd34f56..1e759df36098 100644
--- a/tools/testing/selftests/kvm/access_tracking_perf_test.c
+++ b/tools/testing/selftests/kvm/access_tracking_perf_test.c
@@ -123,6 +123,11 @@ static void mark_page_idle(int page_idle_fd, uint64_t pfn)
 		    "Set page_idle bits for PFN 0x%" PRIx64, pfn);
 }
 
+static bool is_lru_gen_enabled(void)
+{
+	return !!get_sysfs_value_integer("/sys/kernel/mm/lru_gen/enabled");
+}
+
 static void mark_vcpu_memory_idle(struct kvm_vm *vm,
 				  struct memstress_vcpu_args *vcpu_args)
 {
@@ -185,7 +190,8 @@ static void mark_vcpu_memory_idle(struct kvm_vm *vm,
 	 */
 	if (still_idle >= pages / 10) {
 #ifdef __x86_64__
-		TEST_ASSERT(this_cpu_has(X86_FEATURE_HYPERVISOR),
+		TEST_ASSERT(this_cpu_has(X86_FEATURE_HYPERVISOR) ||
+			    is_lru_gen_enabled(),
 			    "vCPU%d: Too many pages still idle (%lu out of %lu)",
 			    vcpu_idx, still_idle, pages);
 #endif
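
For illustration only, a rough standalone sketch of what the sysfs helper might
look like. The names read_sysfs_integer() and lru_gen_enabled_at() are
hypothetical and are not existing selftest APIs. The sketch assumes the
lru_gen "enabled" file reports a hex bitmask such as 0x0007, so the value is
parsed with base 0 rather than base 10, and it treats a missing file as
"MGLRU disabled".

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not an existing selftest API: read a single integer
 * from a sysfs-style file.  Returns 0 on success and stores the value in
 * *val, or -1 if the file is missing, unreadable, or not an integer.
 */
static int read_sysfs_integer(const char *path, long *val)
{
	char buf[64];
	char *end;
	FILE *f;

	assert(path && val);

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Base 0 accepts decimal and 0x-prefixed hex, e.g. "0x0007". */
	errno = 0;
	*val = strtol(buf, &end, 0);
	if (errno || end == buf)
		return -1;

	return 0;
}

/* Assumption: a missing or unparsable file means MGLRU is not in use. */
static int lru_gen_enabled_at(const char *path)
{
	long val;

	return read_sysfs_integer(path, &val) == 0 && val != 0;
}
```

In the test this would be called as
lru_gen_enabled_at("/sys/kernel/mm/lru_gen/enabled"). Taking the path as a
parameter keeps the helper usable for other sysfs files.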
> - machine needs to have more than one NUMA node because NUMA balancing
> (enabled by default) tries apparently to write protect the primary PTEs
> of (all?) processes every few seconds, and that causes KVM to flush the secondary PTEs:
> (at least with new tdp mmu)
>
> access_tracking-3448 [091] ....1.. 1380.244666: handle_changed_spte <-tdp_mmu_set_spte
> access_tracking-3448 [091] ....1.. 1380.244667: <stack trace>
> => cdc_driver_init
> => handle_changed_spte
> => tdp_mmu_set_spte
> => tdp_mmu_zap_leafs
> => kvm_tdp_mmu_unmap_gfn_range
> => kvm_unmap_gfn_range
> => kvm_mmu_notifier_invalidate_range_start
> => __mmu_notifier_invalidate_range_start
> => change_p4d_range
> => change_protection
> => change_prot_numa
> => task_numa_work
> => task_work_run
> => exit_to_user_mode_prepare
> => syscall_exit_to_user_mode
> => do_syscall_64
> => entry_SYSCALL_64_after_hwframe
>
> It's a separate question, if the NUMA balancing should do this, or if NUMA
> balancing should be enabled by default,
FWIW, IMO, enabling NUMA balancing on a system whose primary purpose is to run VMs
is a bad idea. NUMA balancing operates under the assumption that a !PRESENT #PF is
relatively cheap. When secondary MMUs are involved, that is simply not the case,
e.g. to honor the mmu_notifier event, KVM zaps _and_ does a remote TLB flush. Even
if we reworked KVM and/or the mmu_notifiers so that KVM didn't need to do such a
heavy operation, the cost of a page fault VM-Exit is significantly higher than the
cost of a host #PF.
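
As a side note for anyone wanting to check this on a host, a small sketch of
reading the NUMA balancing setting. numa_balancing_mode() is a hypothetical
name, and the sketch assumes the setting is exposed as an integer in
/proc/sys/kernel/numa_balancing, with 0 meaning disabled.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: return the automatic NUMA balancing mode read from a
 * sysctl-style file, or -1 if the file cannot be read or parsed (e.g. a
 * kernel built without NUMA balancing support).  0 means disabled.
 */
static int numa_balancing_mode(const char *path)
{
	int mode;
	FILE *f;

	assert(path);

	f = fopen(path, "r");
	if (!f)
		return -1;

	if (fscanf(f, "%d", &mode) != 1)
		mode = -1;

	fclose(f);
	return mode;
}
```

A caller would pass "/proc/sys/kernel/numa_balancing" and treat any value
greater than zero as "balancing is active".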
> because there are other reasons that can force KVM to invalidate the
> secondary mappings and trigger this issue.
Ya.