From: Roman Gushchin
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton, Alexei Starovoitov, Johannes Weiner, Michal Hocko,
	Shakeel Butt, Suren Baghdasaryan, David Rientjes, Josh Don,
	Chuyi Zhou, cgroups@vger.kernel.org, linux-mm@kvack.org,
	bpf@vger.kernel.org, Roman Gushchin
Subject: [PATCH rfc 01/12] mm: introduce a bpf hook for OOM handling
Date: Mon, 28 Apr 2025 03:36:06 +0000
Message-ID: <20250428033617.3797686-2-roman.gushchin@linux.dev>
In-Reply-To: <20250428033617.3797686-1-roman.gushchin@linux.dev>
References: <20250428033617.3797686-1-roman.gushchin@linux.dev>

Introduce a bpf hook for implementing custom OOM handling policies.

The hook is the int bpf_handle_out_of_memory(struct oom_control *oc)
function, which is expected to return 1 if it was able to free some
memory and 0 otherwise. In the latter case it's guaranteed that the
in-kernel OOM killer will be invoked.
Otherwise the kernel also checks the bpf_memory_freed field of the
oom_control structure, which is expected to be set by kfuncs suitable
for releasing memory. It's a safety mechanism which prevents a bpf
program from claiming forward progress without actually releasing
memory.

The hook program is sleepable to enable using iterators, e.g.
cgroup iterators.

The hook is executed just before the kernel victim task selection
algorithm, so all heuristics and sysctls like panic_on_oom and
sysctl_oom_kill_allocating_task are respected.

Signed-off-by: Roman Gushchin
---
 include/linux/oom.h |  5 ++++
 mm/oom_kill.c       | 68 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 1e0fc6931ce9..cc14aac9742c 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -51,6 +51,11 @@ struct oom_control {
 
 	/* Used to print the constraint info. */
 	enum oom_constraint constraint;
+
+#ifdef CONFIG_BPF_SYSCALL
+	/* Used by the bpf oom implementation to mark the forward progress */
+	bool bpf_memory_freed;
+#endif
 };
 
 extern struct mutex oom_lock;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 25923cfec9c6..d00776b63c0a 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -45,6 +45,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include "internal.h"
@@ -1100,6 +1101,30 @@ int unregister_oom_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(unregister_oom_notifier);
 
+#ifdef CONFIG_BPF_SYSCALL
+int bpf_handle_out_of_memory(struct oom_control *oc);
+
+/*
+ * Returns true if the bpf oom program returns 1 and some memory was
+ * freed.
+ */
+static bool bpf_handle_oom(struct oom_control *oc)
+{
+	if (WARN_ON_ONCE(oc->chosen))
+		oc->chosen = NULL;
+
+	oc->bpf_memory_freed = false;
+
+	return bpf_handle_out_of_memory(oc) && oc->bpf_memory_freed;
+}
+
+#else
+static inline bool bpf_handle_oom(struct oom_control *oc)
+{
+	return false;
+}
+#endif
+
 /**
  * out_of_memory - kill the "best" process when we run out of memory
  * @oc: pointer to struct oom_control
@@ -1161,6 +1186,13 @@ bool out_of_memory(struct oom_control *oc)
 		return true;
 	}
 
+	/*
+	 * Let bpf handle the OOM first. If it was able to free up some memory,
+	 * bail out. Otherwise fall back to the kernel OOM killer.
+	 */
+	if (bpf_handle_oom(oc))
+		return true;
+
 	select_bad_process(oc);
 	/* Found nothing?!?! */
 	if (!oc->chosen) {
@@ -1264,3 +1296,39 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
 	return -ENOSYS;
 #endif /* CONFIG_MMU */
 }
+
+#ifdef CONFIG_BPF_SYSCALL
+
+__bpf_hook_start();
+
+/*
+ * Bpf hook to customize the oom handling policy.
+ */
+__weak noinline int bpf_handle_out_of_memory(struct oom_control *oc)
+{
+	return 0;
+}
+
+__bpf_hook_end();
+
+BTF_KFUNCS_START(bpf_oom_hooks)
+BTF_ID_FLAGS(func, bpf_handle_out_of_memory, KF_SLEEPABLE)
+BTF_KFUNCS_END(bpf_oom_hooks)
+
+static const struct btf_kfunc_id_set bpf_oom_hook_set = {
+	.owner = THIS_MODULE,
+	.set = &bpf_oom_hooks,
+};
+static int __init bpf_oom_init(void)
+{
+	int err;
+
+	err = register_btf_fmodret_id_set(&bpf_oom_hook_set);
+	if (err)
+		pr_warn("error while registering bpf oom hooks: %d\n", err);
+
+	return err;
+}
+late_initcall(bpf_oom_init);
+
+#endif
-- 
2.49.0.901.g37484f566f-goog
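
The forward-progress rule from the commit message (the hook's return
value only counts when bpf_memory_freed was also set) can be modelled
in plain user-space C. This is a minimal sketch, not kernel code:
struct oom_control_model, model_bpf_handle_oom(), model_release_memory()
and the three hook_* functions are hypothetical names introduced only
for illustration, and the function pointer stands in for the attached
bpf program.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct oom_control; only the fields used here. */
struct oom_control_model {
	void *chosen;           /* models oc->chosen */
	bool bpf_memory_freed;  /* meant to be set only by memory-releasing kfuncs */
};

/* Stand-in for the attached bpf program. */
typedef int (*oom_hook_fn)(struct oom_control_model *oc);

/* Hypothetical kfunc model: releasing memory marks forward progress. */
static void model_release_memory(struct oom_control_model *oc)
{
	oc->bpf_memory_freed = true;
}

/* Hook that gives up: returns 0, so the in-kernel OOM killer would run. */
static int hook_declines(struct oom_control_model *oc)
{
	(void)oc;
	return 0;
}

/* Hook that claims progress without calling any releasing helper. */
static int hook_claims_only(struct oom_control_model *oc)
{
	(void)oc;
	return 1;
}

/* Hook that releases memory through the helper and reports success. */
static int hook_frees(struct oom_control_model *oc)
{
	model_release_memory(oc);
	return 1;
}

/*
 * Models the check in bpf_handle_oom(): clear any stale state, run the
 * hook, and report success only if the hook returned non-zero AND the
 * freed flag was set while it ran.
 */
static bool model_bpf_handle_oom(struct oom_control_model *oc, oom_hook_fn hook)
{
	assert(oc != NULL && hook != NULL);

	oc->chosen = NULL;
	oc->bpf_memory_freed = false;

	return hook(oc) && oc->bpf_memory_freed;
}
```

In this model only hook_frees makes model_bpf_handle_oom() return true;
both other hooks fall through to what would be select_bad_process() in
the kernel.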