From: Jann Horn
Date: Mon, 18 Sep 2023 19:12:56 +0200
Subject: KVM nonblocking MMU notifier with KVM_GUEST_USES_PFN looks racy [but is currently unused]
To: Paolo Bonzini, David Woodhouse
Cc: kernel list, KVM list, Linux-MM, Michal Hocko, Sean Christopherson

Hi!

I haven't tested this and might be missing something, but I think that
the MMU notifier for KVM_GUEST_USES_PFN pfncache is currently a bit
broken. Except that nothing seems to actually use KVM_GUEST_USES_PFN, so
currently it's not actually a problem?

gfn_to_pfn_cache_invalidate_start() contains the following:

        /*
         * If the OOM reaper is active, then all vCPUs should have
         * been stopped already, so perform the request without
         * KVM_REQUEST_WAIT and be sad if any needed to be IPI'd.
         */
        if (!may_block)
            req &= ~KVM_REQUEST_WAIT;

        called = kvm_make_vcpus_request_mask(kvm, req, vcpu_bitmap);

        WARN_ON_ONCE(called && !may_block);

The comment explains that we rely on OOM reaping only happening when a
process is sufficiently far into being stopped that it is no longer
executing vCPUs, but from what I can tell, that's not what the caller
actually guarantees. Especially on the path from the process_mrelease()
syscall (if we're dealing with a process whose mm is not shared with
other processes), we only check that the target process has
SIGNAL_GROUP_EXIT set. From what I can tell, that does imply that
delivery of a fatal signal has begun, but doesn't even imply that the
CPU running the target process has been IPI'd, let alone that the
target process has died or anything like that.

But I also don't see any reason why
gfn_to_pfn_cache_invalidate_start() actually has to do anything
special for non-blocking invalidation - from what I can tell, nothing
in there can block, basically everything runs with preemption
disabled. The first half of the function holds a spinlock; the second
half is basically a call to kvm_make_vcpus_request_mask(), which
disables preemption across the whole function with
get_cpu()/put_cpu(). A synchronous IPI spins until the IPI has been
acked, but that doesn't count as sleeping. (And the rest of the OOM
reaping code will do stuff like synchronous IPIs for its TLB flushes,
too.)

So maybe you/I can just rip out the special-casing of nonblocking mode
from gfn_to_pfn_cache_invalidate_start() to fix this?

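To illustrate what the suggested change would mean for the request
flags, here is a small userspace sketch (not kernel code, and untested
against a real kernel). Only the "req &= ~KVM_REQUEST_WAIT" logic comes
from the snippet quoted above; the DEMO_* constants, their bit values,
and all function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real definitions live in the kernel. */
#define DEMO_REQ_ID       0x01u       /* some request number */
#define DEMO_REQUEST_WAIT (1u << 9)   /* models the KVM_REQUEST_WAIT flag */

/* Models the existing logic: drop the WAIT flag if we may not block. */
unsigned int req_flags_current(unsigned int req, bool may_block)
{
	if (!may_block)
		req &= ~DEMO_REQUEST_WAIT;
	return req;
}

/* Models the suggested simplification: may_block no longer matters. */
unsigned int req_flags_proposed(unsigned int req, bool may_block)
{
	(void)may_block;
	return req;
}

/* True if the requester would wait for the targeted vCPUs to ack. */
bool waits_for_ack(unsigned int req)
{
	return (req & DEMO_REQUEST_WAIT) != 0;
}

/* Usage: the two variants only differ in the non-blocking case. */
void demo_usage(void)
{
	unsigned int req = DEMO_REQ_ID | DEMO_REQUEST_WAIT;

	assert(waits_for_ack(req_flags_current(req, true)));
	assert(!waits_for_ack(req_flags_current(req, false)));
	assert(waits_for_ack(req_flags_proposed(req, false)));
}
```

In this model the proposed variant always waits for the ack, which is
the behavior argued for above on the grounds that spinning on a
synchronous IPI does not count as sleeping.
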
Relevant call paths for the theoretical race:

sys_kill
  prepare_kill_siginfo
  kill_something_info
    kill_proc_info
      rcu_read_lock
      kill_pid_info
        rcu_read_lock
        group_send_sig_info [PIDTYPE_TGID]
          do_send_sig_info
            lock_task_sighand [task->sighand->siglock]
            send_signal_locked
              __send_signal_locked
                prepare_signal
                legacy_queue
                signalfd_notify
                sigaddset(&pending->signal, sig)
                complete_signal
                  signal->flags = SIGNAL_GROUP_EXIT [mrelease will work starting here]
                  for each thread:
                    sigaddset(&t->pending.signal, SIGKILL)
                    signal_wake_up [IPI happens here]
            unlock_task_sighand [task->sighand->siglock]
        rcu_read_unlock
      rcu_read_unlock

sys_process_mrelease
  find_lock_task_mm
    spin_lock(&p->alloc_lock)
  task_will_free_mem
    SIGNAL_GROUP_EXIT suffices
    PF_EXITING suffices if singlethreaded?
  task_unlock
  mmap_read_lock_killable
  __oom_reap_task_mm
    for each private non-PFNMAP/MIXED VMA:
      tlb_gather_mmu
      mmu_notifier_invalidate_range_start_nonblock
        __mmu_notifier_invalidate_range_start
          mn_hlist_invalidate_range_start
            kvm_mmu_notifier_invalidate_range_start [as ops->invalidate_range_start]
              gfn_to_pfn_cache_invalidate_start
                [loop over gfn_to_pfn_cache instances]
                  if overlap and KVM_GUEST_USES_PFN [UNUSED]: evict_vcpus=true
                [if evict_vcpus]
                  kvm_make_vcpus_request_mask
              __kvm_handle_hva_range
      unmap_page_range
      mmu_notifier_invalidate_range_end
      tlb_finish_mmu
  mmap_read_unlock
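
The ordering in the two call paths can be condensed into a toy,
single-threaded model. This is only a sketch of the interleaving
described above, not kernel code: the struct, its fields, and every
demo_* function are hypothetical, and it assumes an overlapping
KVM_GUEST_USES_PFN cache exists (which, as noted, nothing currently
creates).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state for one target process; field names are illustrative only. */
struct demo_task {
	bool group_exit;    /* models SIGNAL_GROUP_EXIT being set */
	bool vcpu_in_guest; /* models a vCPU thread still running guest code */
};

/* Models complete_signal(): the flag is set first ... */
void demo_mark_group_exit(struct demo_task *t)
{
	t->group_exit = true;
}

/* ... and the wake-up/IPI that kicks the thread comes afterwards. */
void demo_signal_wake_up(struct demo_task *t)
{
	t->vcpu_in_guest = false;
}

/* Models the process_mrelease() precheck: the flag alone suffices. */
bool demo_mrelease_allowed(const struct demo_task *t)
{
	return t->group_exit;
}

/*
 * Models the non-blocking invalidation: a vCPU that still needs a kick
 * corresponds to the WARN_ON_ONCE(called && !may_block) case.
 */
bool demo_invalidate_would_warn(const struct demo_task *t)
{
	return t->vcpu_in_guest;
}

/* Runs the problematic interleaving; returns true if the WARN case hits. */
bool demo_race_window(void)
{
	struct demo_task t = { .group_exit = false, .vcpu_in_guest = true };
	bool warned = false;

	assert(!demo_mrelease_allowed(&t));   /* before the kill: no reaping */

	demo_mark_group_exit(&t);             /* killer: complete_signal() */
	if (demo_mrelease_allowed(&t))        /* reaper: task_will_free_mem() */
		warned = demo_invalidate_would_warn(&t);
	demo_signal_wake_up(&t);              /* killer: IPI only arrives now */

	return warned;
}
```

In this model demo_race_window() reports the warning case because the
reaper runs between the flag being set and the wake-up, which is the
window marked "[mrelease will work starting here]" in the first call
path.
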