Subject: Re: Temporary KVM guest hangs connected to KSM and NUMA balancer
Date: Tue, 16 Jan 2024 16:37:19 +0100
From: Friedrich Weber
To: Sean Christopherson
Cc: kvm@vger.kernel.org, Paolo Bonzini, Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <832697b9-3652-422d-a019-8c0574a188ac@proxmox.com>

Hi Sean,

On 11/01/2024 17:00, Sean Christopherson wrote:
> This is a known issue. It's mostly a KVM bug[...] (fix posted[...]), but I suspect
> that a bug in the dynamic preemption model logic[...] is also contributing to the
> behavior by causing KVM to yield on preempt models where it really shouldn't.

I tried the following variants now, each applied on top of 6.7 (0dd3ee31):

* [1], the initial patch series mentioned in the bugreport
  ("[PATCH 0/2] KVM: Pre-check mmu_notifier retry on x86")
* [2], its v2 that you linked above
  ("[PATCH v2] KVM: x86/mmu: Retry fault before acquiring mmu_lock if
  mapping is changing")
* [3], the scheduler patch you linked above
  ("[PATCH] sched/core: Drop spinlocks on contention iff kernel is
  preemptible")
* both [2] & [3]

My kernel is PREEMPT_DYNAMIC and, according to
/sys/kernel/debug/sched/preempt, defaults to preempt=voluntary. For case
[3], I additionally tried manually switching to preempt=full.

Provided I did not mess up, I get the following results for the
reproducer I posted:

* [1] (the initial patch series): no hangs
* [2] (its v2): hangs
* [3] (the scheduler patch) with preempt=voluntary: no hangs
* [3] (the scheduler patch) with preempt=full: hangs
* [2] & [3]: no hangs

So it seems like:

* [1] (the initial patch series) fixes the hangs, which is consistent
  with the feedback in the bugreport [4].
* But weirdly, its v2 [2] does not fix the hangs.
* As long as I stay with preempt=voluntary, [3] (the scheduler patch)
  alone is already enough to fix the hangs in my case -- this I did not
  expect :)

Does this make sense to you? Happy to double-check or run more tests if
anything seems off.

Best wishes,

Friedrich

[1] https://lore.kernel.org/all/20230825020733.2849862-1-seanjc@google.com/
[2] https://lore.kernel.org/all/20240110012045.505046-1-seanjc@google.com/
[3] https://lore.kernel.org/all/20240110214723.695930-1-seanjc@google.com/
[4] https://bugzilla.kernel.org/show_bug.cgi?id=218259#c6
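
For anyone repeating the preempt=voluntary / preempt=full comparison: a minimal sketch, not part of the test setup described above, of how the active mode could be read from the debugfs file named in this mail. It assumes the file lists all modes on one line with the active one in parentheses (for example "none (voluntary) full"); the helper name current_preempt_mode is made up for illustration.

```python
import re

# Path as given in the mail; reading it normally needs debugfs mounted
# and root privileges.
PREEMPT_PATH = "/sys/kernel/debug/sched/preempt"


def current_preempt_mode(text: str) -> str:
    """Return the mode marked with parentheses, e.g. 'voluntary'.

    Assumes the format 'none (voluntary) full'; raises ValueError if
    no mode is marked as active.
    """
    match = re.search(r"\((\w+)\)", text)
    if match is None:
        raise ValueError(f"no active mode marked in {text!r}")
    return match.group(1)


def read_preempt_mode(path: str = PREEMPT_PATH) -> str:
    """Read the debugfs file and return the active preemption mode."""
    with open(path, encoding="ascii") as fh:
        return current_preempt_mode(fh.read())
```

Calling read_preempt_mode() as root before each run would make it easy to record which mode a given result belongs to. Switching modes on a PREEMPT_DYNAMIC kernel is done by writing the mode name (such as full) to the same file.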