Re: access_tracking_perf_test kvm selftest doesn't work when Multi-Gen LRU is in use


From: Sean Christopherson <seanjc@google.com>
To: Maxim Levitsky <mlevitsk@redhat.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	 Henry Huang <henry.hj@antgroup.com>,
	linux-mm@kvack.org
Subject: Re: access_tracking_perf_test kvm selftest doesn't work when Multi-Gen LRU is in use
Date: Tue, 21 May 2024 16:29:54 -0700
Message-ID: <Zk0uckIeAsb5ex4i@google.com>
In-Reply-To: <7a46456d6750ea682ba321ad09541fa81677b81a.camel@redhat.com>

On Wed, May 15, 2024, Maxim Levitsky wrote:
> Small note on why we started seeing this failure on RHEL 9 and only on some machines: 
> 
> 	- RHEL9 has MGLRU enabled, RHEL8 doesn't.

For a stopgap in KVM selftests, or possibly even a long term solution in case the
decision is that page_idle will simply have different behavior for MGLRU, couldn't
we tweak the test to not assert if MGLRU is enabled?

E.g. refactor get_module_param_integer() and/or get_module_param() to add
get_sysfs_value_integer() or so, and then do this?

diff --git a/tools/testing/selftests/kvm/access_tracking_perf_test.c b/tools/testing/selftests/kvm/access_tracking_perf_test.c
index 3c7defd34f56..1e759df36098 100644
--- a/tools/testing/selftests/kvm/access_tracking_perf_test.c
+++ b/tools/testing/selftests/kvm/access_tracking_perf_test.c
@@ -123,6 +123,11 @@ static void mark_page_idle(int page_idle_fd, uint64_t pfn)
                    "Set page_idle bits for PFN 0x%" PRIx64, pfn);
 }
 
+static bool is_lru_gen_enabled(void)
+{
+       return !!get_sysfs_value_integer("/sys/kernel/mm/lru_gen/enabled");
+}
+
 static void mark_vcpu_memory_idle(struct kvm_vm *vm,
                                  struct memstress_vcpu_args *vcpu_args)
 {
@@ -185,7 +190,8 @@ static void mark_vcpu_memory_idle(struct kvm_vm *vm,
         */
        if (still_idle >= pages / 10) {
 #ifdef __x86_64__
-               TEST_ASSERT(this_cpu_has(X86_FEATURE_HYPERVISOR),
+               TEST_ASSERT(this_cpu_has(X86_FEATURE_HYPERVISOR) ||
+                           is_lru_gen_enabled(),
                            "vCPU%d: Too many pages still idle (%lu out of %lu)",
                            vcpu_idx, still_idle, pages);
 #endif
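
For illustration only, a minimal standalone sketch of what such a helper might
look like.  This is not the selftests implementation: the names
parse_sysfs_integer() and get_sysfs_value_integer_or() and the fallback
parameter are hypothetical, and the sketch assumes that a missing
/sys/kernel/mm/lru_gen/enabled (e.g. MGLRU not built in) should be treated as
"disabled".  The file reports a hex bitmask such as 0x0007, hence base 0 for
strtol().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical: parse a decimal, octal or hex ("0x...") integer that is
 * optionally followed by a newline.  Returns false on malformed input.
 */
static bool parse_sysfs_integer(const char *buf, long *val)
{
	char *end;

	errno = 0;
	*val = strtol(buf, &end, 0);	/* base 0 accepts the "0x" prefix */
	if (errno || end == buf)
		return false;

	return *end == '\0' || *end == '\n';
}

/*
 * Hypothetical: read an integer from a sysfs file, returning 'fallback'
 * if the file is missing, unreadable or malformed.
 */
static long get_sysfs_value_integer_or(const char *path, long fallback)
{
	char buf[64];
	long val;
	FILE *f = fopen(path, "r");

	if (!f)
		return fallback;

	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return fallback;
	}
	fclose(f);

	return parse_sysfs_integer(buf, &val) ? val : fallback;
}

/* Assumption: any non-zero bit in the mask counts as "MGLRU enabled". */
static bool is_lru_gen_enabled(void)
{
	return !!get_sysfs_value_integer_or("/sys/kernel/mm/lru_gen/enabled", 0);
}
```

Falling back to 0 instead of asserting keeps the check usable on kernels built
without MGLRU, where the sysfs file doesn't exist.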

> 	- machine needs to have more than one NUMA node because NUMA balancing 
> 	  (enabled by default) tries apparently to write protect the primary PTEs 
> 	  of (all?) processes every few seconds, and that causes KVM to flush the secondary PTEs:
> 	  (at least with new tdp mmu)
> 
> access_tracking-3448    [091] ....1..  1380.244666: handle_changed_spte <-tdp_mmu_set_spte
>  access_tracking-3448    [091] ....1..  1380.244667: <stack trace>
>  => cdc_driver_init
>  => handle_changed_spte
>  => tdp_mmu_set_spte
>  => tdp_mmu_zap_leafs
>  => kvm_tdp_mmu_unmap_gfn_range
>  => kvm_unmap_gfn_range
>  => kvm_mmu_notifier_invalidate_range_start
>  => __mmu_notifier_invalidate_range_start
>  => change_p4d_range
>  => change_protection
>  => change_prot_numa
>  => task_numa_work
>  => task_work_run
>  => exit_to_user_mode_prepare
>  => syscall_exit_to_user_mode
>  => do_syscall_64
>  => entry_SYSCALL_64_after_hwframe
> 
> It's a separate question, if the NUMA balancing should do this, or if NUMA
> balancing should be enabled by default,

FWIW, IMO, enabling NUMA balancing on a system whose primary purpose is to run VMs
is a bad idea.  NUMA balancing operates under the assumption that a !PRESENT #PF is
relatively cheap.  When secondary MMUs are involved, that is simply not the case,
e.g. to honor the mmu_notifier event, KVM zaps _and_ does a remote TLB flush.  Even
if we reworked KVM and/or the mmu_notifiers so that KVM didn't need to do such a
heavy operation, the cost of a page fault VM-Exit is significantly higher than the
cost of a host #PF.

> because there are other reasons that can force KVM to invalidate the
> secondary mappings and trigger this issue.

Ya.
