From: Sean Christopherson <seanjc@google.com>
To: Friedrich Weber <f.weber@proxmox.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: Temporary KVM guest hangs connected to KSM and NUMA balancer
Date: Thu, 11 Jan 2024 08:00:05 -0800
Message-ID: <ZaAQhc13IbWk5j5D@google.com>
In-Reply-To: <832697b9-3652-422d-a019-8c0574a188ac@proxmox.com>

On Thu, Jan 04, 2024, Friedrich Weber wrote:
> Hi,
>
> some of our (Proxmox VE) users have been reporting [1] that guests
> occasionally become unresponsive with high CPU usage for some time
> (varying between ~1 and more than 60 seconds). After that time, the
> guests come back and continue running fine. Windows guests seem most
> affected (not responding to pings during the hang, RDP sessions time
> out). But we also got reports about Linux guests. This issue was not
> present while we provided (host) kernel 5.15 and was first reported when
> we rolled out a kernel based on 6.2. The reports seem to concern NUMA
> hosts only. Users reported that the issue becomes easier to trigger the
> more memory is assigned to the guests. Setting mitigations=off was
> reported to alleviate (but not eliminate) the issue. The issue seems to
> disappear after disabling KSM.
>
> We can reproduce the issue with a Windows guest on a NUMA host, though
> only occasionally and not very reliably. Using a bpftrace script like
> [7] we found the hangs to correlate with long-running invocations of
> `task_numa_work` (more than 500ms), suggesting a connection to the NUMA
> balancer. Indeed, we can't reproduce the issue after disabling the NUMA
> balancer with `echo 0 > /proc/sys/kernel/numa_balancing` [2] and got a
> user confirming this fixes the issue for them [3].
>
> Since the Windows reproducer is not very stable, we tried to find a
> Linux guest reproducer and have found one (described below [0]) that
> triggers a very similar (hopefully the same) issue. The reproducer
> triggers the hangs also if the host is on current Linux 6.7-rc8
> (610a9b8f). A kernel bisect points to the following as the commit
> introducing the issue:
>
> f47e5bbb ("KVM: x86/mmu: Zap only TDP MMU leafs in zap range and
> mmu_notifier unmap")
>
> which is why I cc'ed Sean and Paolo. Because of the possible KSM
> connection I cc'ed Andrew and linux-mm.
>
> Indeed, on f47e5bbb~1 = a80ced6e ("KVM: SVM: fix panic on out-of-bounds
> guest IRQ") the reproducer does not trigger the hang, and on f47e5bbb it
> triggers the hang.
>
> Currently I don't know enough about the KVM/KSM/NUMA balancer code to
> tell how the patch may trigger these issues. Any idea who we could ask
> about this, or how we could further debug this would be greatly appreciated!

This is a known issue. It's mostly a KVM bug[1][2] (fix posted[3]), but I suspect
that a bug in the dynamic preemption model logic[4] is also contributing to the
behavior by causing KVM to yield on preempt models where it really shouldn't.

[1] https://lore.kernel.org/all/ZNnPF4W26ZbAyGto@yzhao56-desk.sh.intel.com
[2] https://lore.kernel.org/all/bug-218259-28872@https.bugzilla.kernel.org%2F
[3] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@google.com
[4] https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com
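
Until the fixes are applied, the report above names two host-side knobs whose
state matters: the NUMA balancer and KSM. A minimal sketch for checking both,
assuming the usual locations (/proc/sys/kernel/numa_balancing as quoted in the
report, and /sys/kernel/mm/ksm/run for KSM). The read_knob helper and the
SYSROOT override are hypothetical names added for illustration; the script
only reads the knobs and changes nothing.

```shell
#!/bin/sh
# Report the state of the NUMA balancer and KSM on this host.
# SYSROOT is a hypothetical prefix for testing against a fake tree;
# leave it unset on a real host.
SYSROOT="${SYSROOT:-}"

read_knob() {
    # Print the knob's contents, or "unavailable" if it cannot be read
    # (for example, the feature is not built into the running kernel).
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "unavailable"
    fi
}

numa=$(read_knob "$SYSROOT/proc/sys/kernel/numa_balancing")
ksm=$(read_knob "$SYSROOT/sys/kernel/mm/ksm/run")

echo "numa_balancing=$numa"
echo "ksm_run=$ksm"
```

A value of 0 for numa_balancing corresponds to the workaround in the report
(`echo 0 > /proc/sys/kernel/numa_balancing`); a value of 0 for ksm_run means
ksmd is not currently running.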