From: Roman Gushchin <roman.gushchin@linux.dev>
To: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>,
Alexei Starovoitov <ast@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Suren Baghdasaryan <surenb@google.com>,
David Rientjes <rientjes@google.com>,
Josh Don <joshdon@google.com>,
Chuyi Zhou <zhouchuyi@bytedance.com>,
cgroups@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org,
Roman Gushchin <roman.gushchin@linux.dev>
Subject: [PATCH rfc 01/12] mm: introduce a bpf hook for OOM handling
Date: Mon, 28 Apr 2025 03:36:06 +0000 [thread overview]
Message-ID: <20250428033617.3797686-2-roman.gushchin@linux.dev> (raw)
In-Reply-To: <20250428033617.3797686-1-roman.gushchin@linux.dev>
Introduce a bpf hook for implementing custom OOM handling policies.
The hook is int bpf_handle_out_of_memory(struct oom_control *oc)
function, which expected to return 1 if it was able to free some
memory and 0 otherwise. In the latter case it's guaranteed that
the in-kernel OOM killer will be invoked. Otherwise the kernel
also checks the bpf_memory_freed field of the oom_control structure,
which is expected to be set by kfuncs suitable for releasing memory.
It's a safety mechanism which prevents a bpf program to claim
forward progress without actually releasing memory.
The hook program is sleepable to enable using iterators, e.g.
cgroup iterators.
The hook is executed just before the kernel victim task selection
algorithm, so all heuristics and sysctls like panic on oom,
sysctl_oom_kill_allocating_task and sysctl_oom_kill_allocating_task
are respected.
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
include/linux/oom.h | 5 ++++
mm/oom_kill.c | 68 +++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 73 insertions(+)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index 1e0fc6931ce9..cc14aac9742c 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -51,6 +51,11 @@ struct oom_control {
/* Used to print the constraint info. */
enum oom_constraint constraint;
+
+#ifdef CONFIG_BPF_SYSCALL
+ /* Used by the bpf oom implementation to mark the forward progress */
+ bool bpf_memory_freed;
+#endif
};
extern struct mutex oom_lock;
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 25923cfec9c6..d00776b63c0a 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -45,6 +45,7 @@
#include <linux/mmu_notifier.h>
#include <linux/cred.h>
#include <linux/nmi.h>
+#include <linux/bpf.h>
#include <asm/tlb.h>
#include "internal.h"
@@ -1100,6 +1101,30 @@ int unregister_oom_notifier(struct notifier_block *nb)
}
EXPORT_SYMBOL_GPL(unregister_oom_notifier);
+#ifdef CONFIG_BPF_SYSCALL
+int bpf_handle_out_of_memory(struct oom_control *oc);
+
+/*
+ * Returns true if the bpf oom program returns 1 and some memory was
+ * freed.
+ */
+static bool bpf_handle_oom(struct oom_control *oc)
+{
+ if (WARN_ON_ONCE(oc->chosen))
+ oc->chosen = NULL;
+
+ oc->bpf_memory_freed = false;
+
+ return bpf_handle_out_of_memory(oc) && oc->bpf_memory_freed;
+}
+
+#else
+static inline bool bpf_handle_oom(struct oom_control *oc)
+{
+ return 0;
+}
+#endif
+
/**
* out_of_memory - kill the "best" process when we run out of memory
* @oc: pointer to struct oom_control
@@ -1161,6 +1186,13 @@ bool out_of_memory(struct oom_control *oc)
return true;
}
+ /*
+ * Let bpf handle the OOM first. If it was able to free up some memory,
+ * bail out. Otherwise fall back to the kernel OOM killer.
+ */
+ if (bpf_handle_oom(oc))
+ return true;
+
select_bad_process(oc);
/* Found nothing?!?! */
if (!oc->chosen) {
@@ -1264,3 +1296,39 @@ SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags)
return -ENOSYS;
#endif /* CONFIG_MMU */
}
+
+#ifdef CONFIG_BPF_SYSCALL
+
+__bpf_hook_start();
+
+/*
+ * Bpf hook to customize the oom handling policy.
+ */
+__weak noinline int bpf_handle_out_of_memory(struct oom_control *oc)
+{
+ return 0;
+}
+
+__bpf_hook_end();
+
+BTF_KFUNCS_START(bpf_oom_hooks)
+BTF_ID_FLAGS(func, bpf_handle_out_of_memory, KF_SLEEPABLE)
+BTF_KFUNCS_END(bpf_oom_hooks)
+
+static const struct btf_kfunc_id_set bpf_oom_hook_set = {
+ .owner = THIS_MODULE,
+ .set = &bpf_oom_hooks,
+};
+static int __init bpf_oom_init(void)
+{
+ int err;
+
+ err = register_btf_fmodret_id_set(&bpf_oom_hook_set);
+ if (err)
+ pr_warn("error while registering bpf oom hooks: %d", err);
+
+ return err;
+}
+late_initcall(bpf_oom_init);
+
+#endif
--
2.49.0.901.g37484f566f-goog
next prev parent reply other threads:[~2025-04-28 3:36 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-04-28 3:36 [PATCH rfc 00/12] mm: BPF OOM Roman Gushchin
2025-04-28 3:36 ` Roman Gushchin [this message]
2025-04-28 3:36 ` [PATCH rfc 02/12] bpf: mark struct oom_control's memcg field as TRUSTED_OR_NULL Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 03/12] bpf: treat fmodret tracing program's arguments as trusted Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 04/12] mm: introduce bpf_oom_kill_process() bpf kfunc Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 05/12] mm: introduce bpf kfuncs to deal with memcg pointers Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 06/12] mm: introduce bpf_get_root_mem_cgroup() bpf kfunc Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 07/12] bpf: selftests: introduce read_cgroup_file() helper Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 08/12] bpf: selftests: bpf OOM handler test Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 09/12] sched: psi: bpf hook to handle psi events Roman Gushchin
2025-04-28 6:11 ` kernel test robot
2025-04-30 0:28 ` Suren Baghdasaryan
2025-04-30 0:58 ` Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 10/12] mm: introduce bpf_out_of_memory() bpf kfunc Roman Gushchin
2025-04-29 11:46 ` Michal Hocko
2025-04-29 21:31 ` Roman Gushchin
2025-04-30 7:27 ` Michal Hocko
2025-04-30 14:53 ` Roman Gushchin
2025-05-05 8:08 ` Michal Hocko
2025-04-28 3:36 ` [PATCH rfc 11/12] bpf: selftests: introduce open_cgroup_file() helper Roman Gushchin
2025-04-28 3:36 ` [PATCH rfc 12/12] bpf: selftests: psi handler test Roman Gushchin
2025-04-28 10:43 ` [PATCH rfc 00/12] mm: BPF OOM Matt Bobrowski
2025-04-28 17:24 ` Roman Gushchin
2025-04-29 1:56 ` Kumar Kartikeya Dwivedi
2025-04-29 15:42 ` Roman Gushchin
2025-05-02 17:26 ` Song Liu
2025-04-29 11:42 ` Michal Hocko
2025-04-29 14:44 ` Roman Gushchin
2025-04-29 21:56 ` Suren Baghdasaryan
2025-04-29 22:17 ` Roman Gushchin
2025-04-29 23:01 ` Suren Baghdasaryan
2025-04-29 22:44 ` Suren Baghdasaryan
2025-04-29 23:01 ` Roman Gushchin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250428033617.3797686-2-roman.gushchin@linux.dev \
--to=roman.gushchin@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=joshdon@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=rientjes@google.com \
--cc=shakeel.butt@linux.dev \
--cc=surenb@google.com \
--cc=zhouchuyi@bytedance.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox