From: Jingxiang Zeng <jingxiangzeng.cas@gmail.com>
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, hannes@cmpxchg.org,
	mhocko@kernel.org, roman.gushchin@linux.dev,
	shakeel.butt@linux.dev, muchun.song@linux.dev,
	kasong@tencent.com, Zeng Jingxiang <linuszeng@tencent.com>
Subject: [RFC 4/5] mm/memcontrol: allow memsw account in cgroup v2
Date: Wed, 19 Mar 2025 14:41:47 +0800
Message-ID: <20250319064148.774406-5-jingxiangzeng.cas@gmail.com>
In-Reply-To: <20250319064148.774406-1-jingxiangzeng.cas@gmail.com>

From: Zeng Jingxiang <linuszeng@tencent.com>

memsw accounting is a very useful knob for container memory
overcommit: it is a good abstraction of the "expected total
memory usage" of a container, so a container can't allocate too
much memory by using swap, but can still be swapped out.

As a simple example, with memsw.limit == memory.limit, a container
can't exceed its original memory limit even with swap enabled; it
gets OOM killed just as it did before, but the host is now able to
offload cold pages.
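
The behaviour described above can be sketched with a small userspace
model. This is only an illustration under stated assumptions, not
kernel code: struct toy_memcg, toy_swap_out() and toy_charge() are
hypothetical names, and pages are modelled as plain counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of v1-style memsw accounting. Hypothetical, not kernel code. */
struct toy_memcg {
	unsigned long mem;         /* resident pages charged to the group */
	unsigned long swap;        /* pages currently swapped out */
	unsigned long mem_limit;   /* stands in for memory.limit */
	unsigned long memsw_limit; /* stands in for memsw.limit */
};

/*
 * Move up to n resident pages to swap and return how many were moved.
 * The combined usage (mem + swap) is unchanged by this operation.
 */
static unsigned long toy_swap_out(struct toy_memcg *cg, unsigned long n)
{
	assert(cg != NULL);
	if (n > cg->mem)
		n = cg->mem;
	cg->mem -= n;
	cg->swap += n;
	return n;
}

/* Charge one new page. Returns false where the model would OOM. */
static bool toy_charge(struct toy_memcg *cg)
{
	assert(cg != NULL);
	/* The combined counter is capped, so swapping cannot make room. */
	if (cg->mem + cg->swap >= cg->memsw_limit)
		return false;
	/* Below the combined cap: make room by swapping out one page. */
	if (cg->mem >= cg->mem_limit && toy_swap_out(cg, 1) == 0)
		return false;
	cg->mem++;
	return true;
}
```

In this model, with mem_limit == memsw_limit, a charge fails as soon as
the group reaches its limit, and a later toy_swap_out() by the host
lowers mem without allowing the group to grow.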

A similar ability seems to be absent in v2: with memory.swap.max == 0,
the host can't use swap to reclaim container memory at all. But with
any larger value, containers are able to overuse memory, causing
delayed OOM kills and thrashing, and the CPU/memory usage ratio can
become heavily unbalanced, especially with compressed swap backends.
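
The v2 trade-off described above can be sketched the same way, with
separate memory and swap counters. Again this is a userspace model
under stated assumptions, not kernel code: struct toy_v2_memcg,
toy_v2_swap_out_one() and toy_v2_charge() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of v2-style separate counters. Hypothetical, not kernel code. */
struct toy_v2_memcg {
	unsigned long mem;      /* resident pages charged to the group */
	unsigned long swap;     /* pages currently swapped out */
	unsigned long mem_max;  /* stands in for memory.max */
	unsigned long swap_max; /* stands in for memory.swap.max */
};

/* Swap out one page if the swap limit allows it. */
static bool toy_v2_swap_out_one(struct toy_v2_memcg *cg)
{
	assert(cg != NULL);
	if (cg->mem == 0 || cg->swap >= cg->swap_max)
		return false;
	cg->mem--;
	cg->swap++;
	return true;
}

/* Charge one new page. Returns false where the model would OOM. */
static bool toy_v2_charge(struct toy_v2_memcg *cg)
{
	assert(cg != NULL);
	/* At the memory limit, the only way to make room is to swap out. */
	if (cg->mem >= cg->mem_max && !toy_v2_swap_out_one(cg))
		return false;
	cg->mem++;
	return true;
}
```

In this model, swap_max == 0 means nothing can ever be swapped out,
while any larger value lets the total footprint (mem + swap) grow up to
mem_max + swap_max before a charge fails.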

This patch makes the semantics of memory.swap.max consistent with
memory.memsw.limit_in_bytes, and the semantics of memory.swap.current
consistent with memory.memsw.usage_in_bytes, when the
MEMSW_ACCOUNT_ON_DFL config option or the cgroup.memsw_account_on_dfl
boot parameter is enabled.

Signed-off-by: Zeng Jingxiang <linuszeng@tencent.com>
---
 mm/memcontrol-v1.c |  2 +-
 mm/memcontrol-v1.h |  4 +++-
 mm/memcontrol.c    | 29 +++++++++++++++++++----------
 3 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index c1feb3945350..3344d5e25822 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -1436,7 +1436,7 @@ void memcg1_oom_finish(struct mem_cgroup *memcg, bool locked)
 
 static DEFINE_MUTEX(memcg_max_mutex);
 
-static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
+int mem_cgroup_resize_max(struct mem_cgroup *memcg,
 				 unsigned long max, bool memsw)
 {
 	bool enlarge = false;
diff --git a/mm/memcontrol-v1.h b/mm/memcontrol-v1.h
index 6358464bb416..7f7ef9f6d03e 100644
--- a/mm/memcontrol-v1.h
+++ b/mm/memcontrol-v1.h
@@ -36,10 +36,12 @@ struct mem_cgroup *mem_cgroup_id_get_online(struct mem_cgroup *memcg);
 /* Cgroup v1-specific declarations */
 #ifdef CONFIG_MEMCG_V1
 
+int mem_cgroup_resize_max(struct mem_cgroup *memcg,
+				 unsigned long max, bool memsw);
 /* Whether legacy memory+swap accounting is active */
 static inline bool do_memsw_account(void)
 {
-	return !cgroup_subsys_on_dfl(memory_cgrp_subsys);
+	return !cgroup_subsys_on_dfl(memory_cgrp_subsys) || do_memsw_account_on_dfl();
 }
 
 unsigned long memcg_events_local(struct mem_cgroup *memcg, int event);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 623ebf610946..d85699fa8a90 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5205,9 +5205,12 @@ static ssize_t swap_max_write(struct kernfs_open_file *of,
 	if (err)
 		return err;
 
-	xchg(&memcg->swap.max, max);
+	if (do_memsw_account_on_dfl())
+		err = mem_cgroup_resize_max(memcg, max, true);
+	else
+		xchg(&memcg->swap.max, max);
 
-	return nbytes;
+	return err ?: nbytes;
 }
 
 static int swap_events_show(struct seq_file *m, void *v)
@@ -5224,24 +5227,28 @@ static int swap_events_show(struct seq_file *m, void *v)
 	return 0;
 }
 
-static struct cftype swap_files[] = {
+static struct cftype swap_files_v1[] = {
 	{
 		.name = "swap.current",
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.read_u64 = swap_current_read,
 	},
-	{
-		.name = "swap.high",
-		.flags = CFTYPE_NOT_ON_ROOT,
-		.seq_show = swap_high_show,
-		.write = swap_high_write,
-	},
 	{
 		.name = "swap.max",
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = swap_max_show,
 		.write = swap_max_write,
 	},
+	{ }	/* terminate */
+};
+
+static struct cftype swap_files[] = {
+	{
+		.name = "swap.high",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = swap_high_show,
+		.write = swap_high_write,
+	},
 	{
 		.name = "swap.max.effective",
 		.flags = CFTYPE_NOT_ON_ROOT,
@@ -5473,7 +5480,9 @@ static int __init mem_cgroup_swap_init(void)
 	if (mem_cgroup_disabled())
 		return 0;
 
-	WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, swap_files));
+	WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, swap_files_v1));
+	if (!do_memsw_account_on_dfl())
+		WARN_ON(cgroup_add_dfl_cftypes(&memory_cgrp_subsys, swap_files));
 #ifdef CONFIG_MEMCG_V1
 	WARN_ON(cgroup_add_legacy_cftypes(&memory_cgrp_subsys, memsw_files));
 #endif
-- 
2.41.1