From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C3747C369D8 for ; Mon, 28 Apr 2025 03:37:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1243A6B0027; Sun, 27 Apr 2025 23:37:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0D5776B0028; Sun, 27 Apr 2025 23:37:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E44EB6B0029; Sun, 27 Apr 2025 23:37:00 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id BF62B6B0027 for ; Sun, 27 Apr 2025 23:37:00 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 7EF09121200 for ; Mon, 28 Apr 2025 03:37:00 +0000 (UTC) X-FDA: 83382041400.11.B8FE934 Received: from out-176.mta1.migadu.com (out-176.mta1.migadu.com [95.215.58.176]) by imf19.hostedemail.com (Postfix) with ESMTP id C878F1A000A for ; Mon, 28 Apr 2025 03:36:58 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=b4qM1b0v; spf=pass (imf19.hostedemail.com: domain of roman.gushchin@linux.dev designates 95.215.58.176 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1745811419; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=jDg9cWJDtBRFaLTipYCw/9zZK5hSUuCOB7afaAZtBZU=; b=kJW1yjyJkh4mk1KL9Pe4KzN7irqtb/ODa2K10ZlLyw1ZPnugXrM49vL6hzdXF4d8JI63EU 5MCqR2dkJUCyIgIbJbqqSbN+nYjfRSw/SjLSqJQ7aDR5JivDnBgwsLtSfa10VYPBGar7dF UkwudGqskkKq3uwACpGsvoY+o80t5n0= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=b4qM1b0v; spf=pass (imf19.hostedemail.com: domain of roman.gushchin@linux.dev designates 95.215.58.176 as permitted sender) smtp.mailfrom=roman.gushchin@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1745811419; a=rsa-sha256; cv=none; b=K4uAjYHzvIwiyWeyeJnAwC8wxULfePTBbU5UnYhrNg2ALkVPcPu3NYc6I1sj2FluJe8Jkd 3eeGDx0+XJ+Y807P2FZF8sfRecD+ovDx180NsZpzvFiJi2VB6AV53eojIuk9WajKc9ocdQ goarZpwp7c0Re5LgkmCdZe8LCb/F+ss= X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1745811417; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jDg9cWJDtBRFaLTipYCw/9zZK5hSUuCOB7afaAZtBZU=; b=b4qM1b0vHDaVKMDQkiweM3oerzIpfr+kT4u23fRV2Aj7X2MD0OkhU9s8ac2wjFSF/r0x5x uyPxww42H+ZN2hHtpOstR51Iv3G7vORPb5dxhqBlHfHO1OTT0Tfxwa7Qg/GgBrFPVV2/p6 BhQtCsd6CgUO88vCscH04YN+7nozXf0= From: Roman Gushchin To: linux-kernel@vger.kernel.org Cc: Andrew Morton , Alexei Starovoitov , Johannes Weiner , Michal Hocko , Shakeel Butt , Suren Baghdasaryan , David Rientjes , Josh Don , Chuyi Zhou , cgroups@vger.kernel.org, linux-mm@kvack.org, bpf@vger.kernel.org, Roman Gushchin Subject: [PATCH rfc 08/12] bpf: selftests: bpf OOM handler test Date: Mon, 28 Apr 2025 03:36:13 +0000 Message-ID: <20250428033617.3797686-9-roman.gushchin@linux.dev> In-Reply-To: <20250428033617.3797686-1-roman.gushchin@linux.dev> References: <20250428033617.3797686-1-roman.gushchin@linux.dev> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Migadu-Flow: FLOW_OUT X-Rspam-User: X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: C878F1A000A X-Stat-Signature: s4omara6hfr3xncosx4kkecw6pcpfn96 X-HE-Tag: 1745811418-722045 X-HE-Meta: U2FsdGVkX1/fgGQ/6RRycYXaR16nZUzXSPpQpJnvPRoMI+yahyjGdiV9MxXN+SGKL3fs0Sfu+Mmn7kBJUmxM8JTjr4UG/kkrTsH1pOWLwfaPSWsEO2/ZStwaIROZ0LQiqre5O2Sa0neT9GqCM7NIl4NTuYhAA79eoCRG5ITBSai7gSyEQppeNBEw6m3OsQwmcRrVNPHRc/eX9QU1F2kuDIxSGgUzd1dTmXLblVQ1VLL0zO/zdmcF2XrwQmx7vkwdFhRCxLPFSPEHctC9k6oI+DWtHuW1s1hve/q/i7MgWx4MAcic4qdijQ/4e75V1lI2w64N24/SOTrw1NU52s5Q29PBR57ccUJw4flBXFMaH/adnkSdUyO2QUqJTM5v3H0x5dwVSgqVihBphkR3VkR3nuVF4xxFU82sijO2BG/RQteyn6sQVy+1Y0Y5FxnMG4r1APWJlpSu+lI4LAHy9hcDUL+5tz48p+CfdBfCs2koYcI/PGkWxnTQ8uAbzcmH4Tvo9t707g3uvJ1V8pDCq7G8DpWZmkxK6aN8TVFjrWMpsZsjmHB2SOHsJWVQJy1ozVQWTmxgkLQ5I2iIaABG9063YZRxhsEm3nJtbn34nZdy8/clo1NZYv26EQfqqHs6pN5RbjuHDt2m/1q7fHMUMAGvu2BElXn0TlYf0GNPn0teI3lcL9qXME0lqlZgyFjsz3wrhcmUCzKA3XQ2r70D0rmZype4jeJ/TMxQFIsuW1amiFl+wpEedqfVf5EmAaYrxrE4xOkLGQHNdzvSjie+pCgQhB0FvhEe7jSfgiTHvwGX3K6ypefiv+qn6/xRKv8JCyAWpfcJF/EKXA0nwbd2LyORqVYK0c1IaonOGHdoCU6lcQTpp+hQGZB/8w3Oxk9Jx68c/R8oIokIt0SSXIeMDMQreFnUE9h+5Rm4X2CbURiPurHS6EZYXkKY++VNNv3hdghH8tRQcUaIqxPnyQqV1tt Z7KjS0vJ omHQ/oC5qXZ8hFej8XpTwxfEg1niP7Ewm2vEEe8Szr615vxduDZ9bWTTDXMeMSnSUgpWnBp2P74dy0ObdcHd96BVhBO3+do8mnLukvO8vsP4mdfzV4GET5ZyRpcgfGpKsHfQZSvxDNi3G+ltSPZG+wTkVjwdghWYOBWohBGLi2fzT0Js= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Implement a pseudo-realistic test for the OOM handling functionality. The OOM handling policy which is implemented in bpf is to kill all tasks belonging to the biggest leaf cgroup, which doesn't contain unkillable tasks (tasks with oom_score_adj set to -1000). Pagecache size is excluded from the accounting. The test creates a hierarchy of memory cgroups, causes an OOM at the top level, checks that the expected process will be killed and checks memcg's oom statistics. Signed-off-by: Roman Gushchin --- tools/testing/selftests/bpf/prog_tests/oom.c | 227 +++++++++++++++++++ tools/testing/selftests/bpf/progs/test_oom.c | 103 +++++++++ 2 files changed, 330 insertions(+) create mode 100644 tools/testing/selftests/bpf/prog_tests/oom.c create mode 100644 tools/testing/selftests/bpf/progs/test_oom.c diff --git a/tools/testing/selftests/bpf/prog_tests/oom.c b/tools/testing/selftests/bpf/prog_tests/oom.c new file mode 100644 index 000000000000..224c25334385 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/oom.c @@ -0,0 +1,227 @@ +// SPDX-License-Identifier: GPL-2.0-only +#define _GNU_SOURCE + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "cgroup_helpers.h" +#include "test_oom.skel.h" + +struct cgroup_desc { + const char *path; + int fd; + unsigned long long id; + int pid; + size_t target; + size_t max; + int oom_score_adj; + bool victim; +}; + +#define MB (1024 * 1024) +#define OOM_SCORE_ADJ_MIN (-1000) +#define OOM_SCORE_ADJ_MAX 1000 + +static struct cgroup_desc cgroups[] = { + { .path = "/oom_test", .max = 80 * MB}, + { .path = "/oom_test/cg1", .target = 10 * MB, + .oom_score_adj = OOM_SCORE_ADJ_MAX }, + { .path = "/oom_test/cg2", .target = 40 * MB, + .oom_score_adj = OOM_SCORE_ADJ_MIN }, + { .path = "/oom_test/cg3" }, + { .path = "/oom_test/cg3/cg4", .target = 30 * MB, + .victim = true }, + { .path = "/oom_test/cg3/cg5", .target = 20 * MB }, +}; + +static int spawn_task(struct cgroup_desc *desc) +{ + char *ptr; + int pid; + + pid = fork(); + if (pid < 0) + return pid; + + if (pid > 0) { + /* parent */ + desc->pid = pid; + return 0; + } + + /* child */ + if (desc->oom_score_adj) { + char buf[64]; + int fd = open("/proc/self/oom_score_adj", O_WRONLY); + + if (fd < 0) + return -1; + + snprintf(buf, sizeof(buf), "%d", desc->oom_score_adj); + write(fd, buf, sizeof(buf)); + close(fd); + } + + ptr = (char *)malloc(desc->target); + if (!ptr) + return -ENOMEM; + + memset(ptr, 'a', desc->target); + + while (1) + sleep(1000); + + return 0; +} + +static void setup_environment(void) +{ + int i, err; + + err = setup_cgroup_environment(); + if (!ASSERT_OK(err, "setup_cgroup_environment")) + goto cleanup; + + for (i = 0; i < ARRAY_SIZE(cgroups); i++) { + cgroups[i].fd = create_and_get_cgroup(cgroups[i].path); + if (!ASSERT_GE(cgroups[i].fd, 0, "create_and_get_cgroup")) + goto cleanup; + + cgroups[i].id = get_cgroup_id(cgroups[i].path); + if (!ASSERT_GT(cgroups[i].id, 0, "get_cgroup_id")) + goto cleanup; + + if (i == 0) { + /* Freeze the top-level cgroup */ + err = write_cgroup_file(cgroups[i].path, "cgroup.freeze", "1"); + if (!ASSERT_OK(err, "freeze cgroup")) + goto cleanup; + } + + if (!cgroups[i].target) { + /* Recursively enable the memory controller */ + err = write_cgroup_file(cgroups[i].path, "cgroup.subtree_control", + "+memory"); + if (!ASSERT_OK(err, "enable memory controller")) + goto cleanup; + } + + if (cgroups[i].max) { + char buf[256]; + + snprintf(buf, sizeof(buf), "%lu", cgroups[i].max); + err = write_cgroup_file(cgroups[i].path, "memory.max", buf); + if (!ASSERT_OK(err, "set memory.max")) + goto cleanup; + + snprintf(buf, sizeof(buf), "0"); + write_cgroup_file(cgroups[i].path, "memory.swap.max", buf); + + } + + if (cgroups[i].target) { + char buf[256]; + + err = spawn_task(&cgroups[i]); + if (!ASSERT_OK(err, "spawn task")) + goto cleanup; + + snprintf(buf, sizeof(buf), "%d", cgroups[i].pid); + err = write_cgroup_file(cgroups[i].path, "cgroup.procs", buf); + if (!ASSERT_OK(err, "put child into a cgroup")) + goto cleanup; + } + } + + return; + +cleanup: + cleanup_cgroup_environment(); +} + +static int run_and_wait_for_oom(void) +{ + int ret = -1; + bool first = true; + char buf[4096] = {}; + size_t size; + + ret = write_cgroup_file(cgroups[0].path, "cgroup.freeze", "0"); + if (!ASSERT_OK(ret, "freeze cgroup")) + return -1; + + for (;;) { + int i, status; + pid_t pid = wait(&status); + + if (pid == -1) { + if (errno == EINTR) + continue; + /* ECHILD */ + break; + } + + if (!first) + continue; + + first = false; + + for (i = 0; i < ARRAY_SIZE(cgroups); i++) { + if (!ASSERT_OK(cgroups[i].victim != + (pid == cgroups[i].pid), + "correct process was killed")) { + ret = -1; + break; + } + + if (!cgroups[i].victim) + continue; + + size = read_cgroup_file(cgroups[i].path, + "memory.events", + buf, sizeof(buf)); + if (!ASSERT_OK(size <= 0, "read memory.events")) { + ret = -1; + break; + } + + if (!ASSERT_OK(strstr(buf, "oom_kill 1") == NULL, + "oom_kill count check")) { + ret = -1; + break; + } + } + + for (i = 0; i < ARRAY_SIZE(cgroups); i++) + if (cgroups[i].pid && cgroups[i].pid != pid) + kill(cgroups[i].pid, SIGKILL); + } + + return ret; +} + +void test_oom(void) +{ + struct test_oom *skel; + int err; + + skel = test_oom__open_and_load(); + err = test_oom__attach(skel); + if (!ASSERT_OK(err, "test_oom__attach")) + goto cleanup; + + setup_environment(); + + run_and_wait_for_oom(); + + cleanup_cgroup_environment(); +cleanup: + test_oom__destroy(skel); +} diff --git a/tools/testing/selftests/bpf/progs/test_oom.c b/tools/testing/selftests/bpf/progs/test_oom.c new file mode 100644 index 000000000000..a8224d7c3fed --- /dev/null +++ b/tools/testing/selftests/bpf/progs/test_oom.c @@ -0,0 +1,103 @@ +// SPDX-License-Identifier: GPL-2.0-only +#include "vmlinux.h" +#include +#include +#include "bpf_misc.h" +#include "bpf_experimental.h" + +char _license[] SEC("license") = "GPL"; + +#define OOM_SCORE_ADJ_MIN (-1000) + +void bpf_rcu_read_lock(void) __ksym; +void bpf_rcu_read_unlock(void) __ksym; +struct task_struct *bpf_task_acquire(struct task_struct *p) __ksym; +void bpf_task_release(struct task_struct *p) __ksym; +struct mem_cgroup *bpf_get_root_mem_cgroup(void) __ksym; +struct mem_cgroup *bpf_get_mem_cgroup(struct cgroup_subsys_state *css) __ksym; +void bpf_put_mem_cgroup(struct mem_cgroup *memcg) __ksym; +int bpf_oom_kill_process(struct oom_control *oc, struct task_struct *task, + const char *message__str) __ksym; + +bool mem_cgroup_killable(struct mem_cgroup *memcg) +{ + struct task_struct *task; + bool ret = true; + + bpf_for_each(css_task, task, &memcg->css, CSS_TASK_ITER_PROCS) + if (task->signal->oom_score_adj == OOM_SCORE_ADJ_MIN) + return false; + + return ret; +} + +/* + * Find the largest leaf cgroup (ignoring page cache) without unkillable tasks + * and kill all belonging tasks. + */ +SEC("fmod_ret.s/bpf_handle_out_of_memory") +int BPF_PROG(test_bpf_out_of_memory, struct oom_control *oc) +{ + struct task_struct *task; + struct mem_cgroup *root_memcg = oc->memcg; + struct mem_cgroup *memcg, *victim = NULL; + struct cgroup_subsys_state *css_pos; + unsigned long usage, max_usage = 0; + unsigned long pagecache = 0; + int ret = 0; + + if (root_memcg) + root_memcg = bpf_get_mem_cgroup(&root_memcg->css); + else + root_memcg = bpf_get_root_mem_cgroup(); + + if (!root_memcg) + return 0; + + bpf_rcu_read_lock(); + bpf_for_each(css, css_pos, &root_memcg->css, BPF_CGROUP_ITER_DESCENDANTS_POST) { + if (css_pos->cgroup->nr_descendants + css_pos->cgroup->nr_dying_descendants) + continue; + + memcg = bpf_get_mem_cgroup(css_pos); + if (!memcg) + continue; + + usage = bpf_mem_cgroup_usage(memcg, false); + pagecache = bpf_mem_cgroup_page_state(memcg, NR_FILE_PAGES); + + if (usage > pagecache) + usage -= pagecache; + else + usage = 0; + + if ((usage > max_usage) && mem_cgroup_killable(memcg)) { + max_usage = usage; + if (victim) + bpf_put_mem_cgroup(victim); + victim = bpf_get_mem_cgroup(&memcg->css); + } + + bpf_put_mem_cgroup(memcg); + } + bpf_rcu_read_unlock(); + + if (!victim) + goto exit; + + bpf_for_each(css_task, task, &victim->css, CSS_TASK_ITER_PROCS) { + struct task_struct *t = bpf_task_acquire(task); + + if (t) { + bpf_oom_kill_process(oc, task, "bpf oom test"); + bpf_task_release(t); + ret = 1; + } + } + + bpf_put_mem_cgroup(victim); +exit: + bpf_put_mem_cgroup(root_memcg); + + return ret; +} -- 2.49.0.901.g37484f566f-goog