linux-mm.kvack.org archive mirror
* [PATCH v2 0/2] mm: zswap: add per-memcg stat for incompressible pages
@ 2026-02-06  7:22 Jiayuan Chen
  2026-02-06  7:22 ` [PATCH v2 1/2] " Jiayuan Chen
  2026-02-06  7:22 ` [PATCH v2 2/2] selftests/cgroup: add test for zswap " Jiayuan Chen
  0 siblings, 2 replies; 11+ messages in thread
From: Jiayuan Chen @ 2026-02-06  7:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Jiayuan Chen, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Yosry Ahmed, Nhat Pham,
	Chengming Zhou, Shuah Khan, cgroups, linux-doc, linux-kernel,
	linux-kselftest

In containerized environments, knowing which cgroup is contributing
incompressible pages to zswap is essential for effective resource
management. This series adds a new per-memcg stat 'zswap_incomp' to
track incompressible pages, along with a selftest.
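
As an illustration of how a monitoring agent might consume the new key, here is a minimal user-space sketch. The helper names stat_lookup() and read_stat_key() are hypothetical and are not part of this series. The sketch only parses "<key> <value>" lines and does not interpret the unit of the reported value.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: find "<key> <value>" at the start of a line in a
 * memory.stat-style buffer. Returns the value, or -1 if the key is absent.
 */
static long stat_lookup(const char *buf, const char *key)
{
	size_t klen = strlen(key);
	const char *line = buf;

	assert(klen > 0);
	while (line && *line) {
		/* Require a space after the key so "zswap" != "zswapped". */
		if (!strncmp(line, key, klen) && line[klen] == ' ')
			return strtol(line + klen + 1, NULL, 10);
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/*
 * Hypothetical helper: read a stat file and look up one key.
 * The caller supplies the path, e.g. a cgroup's memory.stat.
 * Returns -1 if the file cannot be opened or the key is missing.
 */
static long read_stat_key(const char *path, const char *key)
{
	char buf[16384];
	size_t n;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return stat_lookup(buf, key);
}
```

A caller would pass the cgroup's memory.stat path and the key "zswap_incomp"; on a kernel without this series the lookup simply returns -1.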

Patch 1: Add the per-memcg zswap_incomp stat and documentation
Patch 2: Add selftest for the new stat

Changes v1 -> v2:
https://lore.kernel.org/linux-mm/20260205053013.25134-1-jiayuan.chen@linux.dev/

- Rename zswpraw/MEMCG_ZSWAP_RAW to zswap_incomp/MEMCG_ZSWAP_INCOMP
  (Shakeel Butt, Yosry Ahmed)
- Drop zswap_is_incomp() helper, keep opencode (size == PAGE_SIZE) with
  comments explaining the incompressibility check (Yosry Ahmed)
- Add documentation in cgroup-v2.rst (Nhat Pham, SeongJae Park)
- Add selftest as a separate patch (Nhat Pham)
- Add reference link to Chris Li's discussion on the need for per-memcg
  incompressible page tracking (Nhat Pham)

Jiayuan Chen (2):
  mm: zswap: add per-memcg stat for incompressible pages
  selftests/cgroup: add test for zswap incompressible pages

 Documentation/admin-guide/cgroup-v2.rst     |  5 ++
 include/linux/memcontrol.h                  |  1 +
 mm/memcontrol.c                             |  8 ++
 tools/testing/selftests/cgroup/test_zswap.c | 96 +++++++++++++++++++++
 4 files changed, 110 insertions(+)

-- 
2.43.0




* [PATCH v2 1/2] mm: zswap: add per-memcg stat for incompressible pages
  2026-02-06  7:22 [PATCH v2 0/2] mm: zswap: add per-memcg stat for incompressible pages Jiayuan Chen
@ 2026-02-06  7:22 ` Jiayuan Chen
  2026-02-06 15:19   ` Yosry Ahmed
                     ` (3 more replies)
  2026-02-06  7:22 ` [PATCH v2 2/2] selftests/cgroup: add test for zswap " Jiayuan Chen
  1 sibling, 4 replies; 11+ messages in thread
From: Jiayuan Chen @ 2026-02-06  7:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Jiayuan Chen, Nhat Pham, Tejun Heo, Johannes Weiner,
	Michal Koutný,
	Jonathan Corbet, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Yosry Ahmed, Chengming Zhou,
	Shuah Khan, cgroups, linux-doc, linux-kernel, linux-kselftest

From: Jiayuan Chen <jiayuan.chen@shopee.com>

The global zswap_stored_incompressible_pages counter was added in commit
dca4437a5861 ("mm/zswap: store <PAGE_SIZE compression failed page as-is")
to track how many pages are stored in raw (uncompressed) form in zswap.
However, in containerized environments, knowing which cgroup is
contributing incompressible pages is essential for effective resource
management [1].

Add a new memcg stat 'zswap_incomp' to track incompressible pages per
cgroup. This helps administrators and orchestrators to:

1. Identify workloads that produce incompressible data (e.g., encrypted
   data, already-compressed media, random data) and may not benefit from
   zswap.

2. Make informed decisions about workload placement - moving
   incompressible workloads to nodes with larger swap backing devices
   rather than relying on zswap.

3. Debug zswap efficiency issues at the cgroup level without needing to
   correlate global stats with individual cgroups.

While the compression ratio can be estimated from existing stats
(zswap / zswapped * PAGE_SIZE), this doesn't distinguish between
"uniformly poor compression" and "a few completely incompressible pages
mixed with highly compressible ones". The zswap_incomp stat provides
direct visibility into the latter case.

[1]: https://lore.kernel.org/linux-mm/CAF8kJuONDFj4NAksaR4j_WyDbNwNGYLmTe-o76rqU17La=nkOw@mail.gmail.com/
Acked-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 5 +++++
 include/linux/memcontrol.h              | 1 +
 mm/memcontrol.c                         | 8 ++++++++
 3 files changed, 14 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 7f5b59d95fce..78a329414615 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1737,6 +1737,11 @@ The following nested keys are defined.
 	  zswpwb
 		Number of pages written from zswap to swap.
 
+	  zswap_incomp
+		Number of incompressible pages currently stored in zswap
+		without compression. These pages could not be compressed to
+		a size smaller than PAGE_SIZE, so they are stored as-is.
+
 	  thp_fault_alloc (npn)
 		Number of transparent hugepages which were allocated to satisfy
 		a page fault. This counter is not present when CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b6c82c8f73e1..d8ec05dd5d43 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -39,6 +39,7 @@ enum memcg_stat_item {
 	MEMCG_KMEM,
 	MEMCG_ZSWAP_B,
 	MEMCG_ZSWAPPED,
+	MEMCG_ZSWAP_INCOMP,
 	MEMCG_NR_STAT,
 };
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 007413a53b45..a6b6cf5f1aeb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -341,6 +341,7 @@ static const unsigned int memcg_stat_items[] = {
 	MEMCG_KMEM,
 	MEMCG_ZSWAP_B,
 	MEMCG_ZSWAPPED,
+	MEMCG_ZSWAP_INCOMP,
 };
 
 #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
@@ -1346,6 +1347,7 @@ static const struct memory_stat memory_stats[] = {
 #ifdef CONFIG_ZSWAP
 	{ "zswap",			MEMCG_ZSWAP_B			},
 	{ "zswapped",			MEMCG_ZSWAPPED			},
+	{ "zswap_incomp",		MEMCG_ZSWAP_INCOMP		},
 #endif
 	{ "file_mapped",		NR_FILE_MAPPED			},
 	{ "file_dirty",			NR_FILE_DIRTY			},
@@ -5458,6 +5460,9 @@ void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size)
 	memcg = obj_cgroup_memcg(objcg);
 	mod_memcg_state(memcg, MEMCG_ZSWAP_B, size);
 	mod_memcg_state(memcg, MEMCG_ZSWAPPED, 1);
+	/* size == PAGE_SIZE means compression failed, page is incompressible */
+	if (size == PAGE_SIZE)
+		mod_memcg_state(memcg, MEMCG_ZSWAP_INCOMP, 1);
 	rcu_read_unlock();
 }
 
@@ -5481,6 +5486,9 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
 	memcg = obj_cgroup_memcg(objcg);
 	mod_memcg_state(memcg, MEMCG_ZSWAP_B, -size);
 	mod_memcg_state(memcg, MEMCG_ZSWAPPED, -1);
+	/* size == PAGE_SIZE means compression failed, page is incompressible */
+	if (size == PAGE_SIZE)
+		mod_memcg_state(memcg, MEMCG_ZSWAP_INCOMP, -1);
 	rcu_read_unlock();
 }
 
-- 
2.43.0




* [PATCH v2 2/2] selftests/cgroup: add test for zswap incompressible pages
  2026-02-06  7:22 [PATCH v2 0/2] mm: zswap: add per-memcg stat for incompressible pages Jiayuan Chen
  2026-02-06  7:22 ` [PATCH v2 1/2] " Jiayuan Chen
@ 2026-02-06  7:22 ` Jiayuan Chen
  2026-02-06 18:13   ` Shakeel Butt
                     ` (2 more replies)
  1 sibling, 3 replies; 11+ messages in thread
From: Jiayuan Chen @ 2026-02-06  7:22 UTC (permalink / raw)
  To: linux-mm
  Cc: Jiayuan Chen, Tejun Heo, Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Yosry Ahmed, Nhat Pham,
	Chengming Zhou, Shuah Khan, cgroups, linux-doc, linux-kernel,
	linux-kselftest

From: Jiayuan Chen <jiayuan.chen@shopee.com>

Add test_zswap_incompressible() to verify that the zswap_incomp memcg
stat correctly tracks incompressible pages.

The test allocates memory filled with random data from /dev/urandom,
which cannot be effectively compressed by zswap. When this data is
swapped out to zswap, it should be stored as-is and tracked by the
zswap_incomp counter.

The test verifies that:
1. Pages are swapped out to zswap (zswpout increases)
2. Incompressible pages are tracked (zswap_incomp increases)

test:
dd if=/dev/zero of=/swapfile bs=1M count=2048
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo Y > /sys/module/zswap/parameters/enabled

./test_zswap
 TAP version 13
 1..8
 ok 1 test_zswap_usage
 ok 2 test_swapin_nozswap
 ok 3 test_zswapin
 ok 4 test_zswap_writeback_enabled
 ok 5 test_zswap_writeback_disabled
 ok 6 test_no_kmem_bypass
 ok 7 test_no_invasive_cgroup_shrink
 ok 8 test_zswap_incompressible
 Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0

Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
---
 tools/testing/selftests/cgroup/test_zswap.c | 96 +++++++++++++++++++++
 1 file changed, 96 insertions(+)

diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
index 64ebc3f3f203..8cb8a131357d 100644
--- a/tools/testing/selftests/cgroup/test_zswap.c
+++ b/tools/testing/selftests/cgroup/test_zswap.c
@@ -5,6 +5,7 @@
 #include <unistd.h>
 #include <stdio.h>
 #include <signal.h>
+#include <fcntl.h>
 #include <sys/sysinfo.h>
 #include <string.h>
 #include <sys/wait.h>
@@ -574,6 +575,100 @@ static int test_no_kmem_bypass(const char *root)
 	return ret;
 }
 
+static int allocate_random_and_wait(const char *cgroup, void *arg)
+{
+	size_t size = (size_t)arg;
+	char *mem;
+	int fd;
+	ssize_t n;
+
+	mem = malloc(size);
+	if (!mem)
+		return -1;
+
+	/* Fill with random data from /dev/urandom - incompressible */
+	fd = open("/dev/urandom", O_RDONLY);
+	if (fd < 0) {
+		free(mem);
+		return -1;
+	}
+
+	for (size_t i = 0; i < size; ) {
+		n = read(fd, mem + i, size - i);
+		if (n <= 0)
+			break;
+		i += n;
+	}
+	close(fd);
+
+	/* Touch all pages to ensure they're faulted in */
+	for (size_t i = 0; i < size; i += 4096)
+		mem[i] = mem[i];
+
+	/* Keep memory alive for parent to reclaim and check stats */
+	pause();
+	free(mem);
+	return 0;
+}
+
+static long get_zswap_incomp(const char *cgroup)
+{
+	return cg_read_key_long(cgroup, "memory.stat", "zswap_incomp ");
+}
+
+/*
+ * Test that incompressible pages (random data) are tracked by zswap_incomp.
+ *
+ * Since incompressible pages stored in zswap are charged at full PAGE_SIZE
+ * (no memory savings), we cannot rely on memory.max pressure to push them
+ * into zswap. Instead, we allocate random data within memory.max, then use
+ * memory.reclaim to proactively push pages into zswap while checking the stat
+ * before the child exits (zswap_incomp is a gauge that decreases on free).
+ */
+static int test_zswap_incompressible(const char *root)
+{
+	int ret = KSFT_FAIL;
+	char *test_group;
+	long zswap_incomp;
+	pid_t child_pid;
+	int child_status;
+
+	test_group = cg_name(root, "zswap_incompressible_test");
+	if (!test_group)
+		goto out;
+	if (cg_create(test_group))
+		goto out;
+	if (cg_write(test_group, "memory.max", "32M"))
+		goto out;
+
+	child_pid = cg_run_nowait(test_group, allocate_random_and_wait,
+				  (void *)MB(4));
+	if (child_pid < 0)
+		goto out;
+
+	/* Wait for child to finish allocating */
+	usleep(500000);
+
+	/* Proactively reclaim to push random pages into zswap */
+	cg_write_numeric(test_group, "memory.reclaim", MB(4));
+
+	zswap_incomp = get_zswap_incomp(test_group);
+	if (zswap_incomp <= 0) {
+		ksft_print_msg("zswap_incomp not increased: %ld\n", zswap_incomp);
+		goto out_kill;
+	}
+
+	ret = KSFT_PASS;
+
+out_kill:
+	kill(child_pid, SIGTERM);
+	waitpid(child_pid, &child_status, 0);
+out:
+	cg_destroy(test_group);
+	free(test_group);
+	return ret;
+}
+
 #define T(x) { x, #x }
 struct zswap_test {
 	int (*fn)(const char *root);
@@ -586,6 +681,7 @@ struct zswap_test {
 	T(test_zswap_writeback_disabled),
 	T(test_no_kmem_bypass),
 	T(test_no_invasive_cgroup_shrink),
+	T(test_zswap_incompressible),
 };
 #undef T
 
-- 
2.43.0




* Re: [PATCH v2 1/2] mm: zswap: add per-memcg stat for incompressible pages
  2026-02-06  7:22 ` [PATCH v2 1/2] " Jiayuan Chen
@ 2026-02-06 15:19   ` Yosry Ahmed
  2026-02-06 17:52   ` Shakeel Butt
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Yosry Ahmed @ 2026-02-06 15:19 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: linux-mm, Jiayuan Chen, Nhat Pham, Tejun Heo, Johannes Weiner,
	Michal Koutný,
	Jonathan Corbet, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Chengming Zhou, Shuah Khan, cgroups,
	linux-doc, linux-kernel, linux-kselftest

On Fri, Feb 06, 2026 at 03:22:15PM +0800, Jiayuan Chen wrote:
> From: Jiayuan Chen <jiayuan.chen@shopee.com>
> 
> The global zswap_stored_incompressible_pages counter was added in commit
> dca4437a5861 ("mm/zswap: store <PAGE_SIZE compression failed page as-is")
> to track how many pages are stored in raw (uncompressed) form in zswap.
> However, in containerized environments, knowing which cgroup is
> contributing incompressible pages is essential for effective resource
> management [1].
> 
> Add a new memcg stat 'zswap_incomp' to track incompressible pages per
> cgroup. This helps administrators and orchestrators to:
> 
> 1. Identify workloads that produce incompressible data (e.g., encrypted
>    data, already-compressed media, random data) and may not benefit from
>    zswap.
> 
> 2. Make informed decisions about workload placement - moving
>    incompressible workloads to nodes with larger swap backing devices
>    rather than relying on zswap.
> 
> 3. Debug zswap efficiency issues at the cgroup level without needing to
>    correlate global stats with individual cgroups.
> 
> While the compression ratio can be estimated from existing stats
> (zswap / zswapped * PAGE_SIZE), this doesn't distinguish between
> "uniformly poor compression" and "a few completely incompressible pages
> mixed with highly compressible ones". The zswap_incomp stat provides
> direct visibility into the latter case.
> 
> [1]: https://lore.kernel.org/linux-mm/CAF8kJuONDFj4NAksaR4j_WyDbNwNGYLmTe-o76rqU17La=nkOw@mail.gmail.com/
> Acked-by: Nhat Pham <nphamcs@gmail.com>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 5 +++++
>  include/linux/memcontrol.h              | 1 +
>  mm/memcontrol.c                         | 8 ++++++++
>  3 files changed, 14 insertions(+)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 7f5b59d95fce..78a329414615 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1737,6 +1737,11 @@ The following nested keys are defined.
>  	  zswpwb
>  		Number of pages written from zswap to swap.
>  
> +	  zswap_incomp
> +		Number of incompressible pages currently stored in zswap
> +		without compression. These pages could not be compressed to
> +		a size smaller than PAGE_SIZE, so they are stored as-is.
> +
>  	  thp_fault_alloc (npn)
>  		Number of transparent hugepages which were allocated to satisfy
>  		a page fault. This counter is not present when CONFIG_TRANSPARENT_HUGEPAGE
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index b6c82c8f73e1..d8ec05dd5d43 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -39,6 +39,7 @@ enum memcg_stat_item {
>  	MEMCG_KMEM,
>  	MEMCG_ZSWAP_B,
>  	MEMCG_ZSWAPPED,
> +	MEMCG_ZSWAP_INCOMP,
>  	MEMCG_NR_STAT,
>  };
>  
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 007413a53b45..a6b6cf5f1aeb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -341,6 +341,7 @@ static const unsigned int memcg_stat_items[] = {
>  	MEMCG_KMEM,
>  	MEMCG_ZSWAP_B,
>  	MEMCG_ZSWAPPED,
> +	MEMCG_ZSWAP_INCOMP,
>  };
>  
>  #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
> @@ -1346,6 +1347,7 @@ static const struct memory_stat memory_stats[] = {
>  #ifdef CONFIG_ZSWAP
>  	{ "zswap",			MEMCG_ZSWAP_B			},
>  	{ "zswapped",			MEMCG_ZSWAPPED			},
> +	{ "zswap_incomp",		MEMCG_ZSWAP_INCOMP		},
>  #endif
>  	{ "file_mapped",		NR_FILE_MAPPED			},
>  	{ "file_dirty",			NR_FILE_DIRTY			},
> @@ -5458,6 +5460,9 @@ void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size)
>  	memcg = obj_cgroup_memcg(objcg);
>  	mod_memcg_state(memcg, MEMCG_ZSWAP_B, size);
>  	mod_memcg_state(memcg, MEMCG_ZSWAPPED, 1);
> +	/* size == PAGE_SIZE means compression failed, page is incompressible */

I think the comment is not very useful, but maybe not worth sending a
new version. Otherwise LGTM:

Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>

> +	if (size == PAGE_SIZE)
> +		mod_memcg_state(memcg, MEMCG_ZSWAP_INCOMP, 1);
>  	rcu_read_unlock();
>  }
>  
> @@ -5481,6 +5486,9 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
>  	memcg = obj_cgroup_memcg(objcg);
>  	mod_memcg_state(memcg, MEMCG_ZSWAP_B, -size);
>  	mod_memcg_state(memcg, MEMCG_ZSWAPPED, -1);
> +	/* size == PAGE_SIZE means compression failed, page is incompressible */
> +	if (size == PAGE_SIZE)
> +		mod_memcg_state(memcg, MEMCG_ZSWAP_INCOMP, -1);
>  	rcu_read_unlock();
>  }
>  
> -- 
> 2.43.0
> 



* Re: [PATCH v2 1/2] mm: zswap: add per-memcg stat for incompressible pages
  2026-02-06  7:22 ` [PATCH v2 1/2] " Jiayuan Chen
  2026-02-06 15:19   ` Yosry Ahmed
@ 2026-02-06 17:52   ` Shakeel Butt
  2026-02-07  1:21   ` SeongJae Park
  2026-02-09  2:20   ` Chengming Zhou
  3 siblings, 0 replies; 11+ messages in thread
From: Shakeel Butt @ 2026-02-06 17:52 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: linux-mm, Jiayuan Chen, Nhat Pham, Tejun Heo, Johannes Weiner,
	Michal Koutný,
	Jonathan Corbet, Michal Hocko, Roman Gushchin, Muchun Song,
	Andrew Morton, Yosry Ahmed, Chengming Zhou, Shuah Khan, cgroups,
	linux-doc, linux-kernel, linux-kselftest

On Fri, Feb 06, 2026 at 03:22:15PM +0800, Jiayuan Chen wrote:
> From: Jiayuan Chen <jiayuan.chen@shopee.com>
> 
> The global zswap_stored_incompressible_pages counter was added in commit
> dca4437a5861 ("mm/zswap: store <PAGE_SIZE compression failed page as-is")
> to track how many pages are stored in raw (uncompressed) form in zswap.
> However, in containerized environments, knowing which cgroup is
> contributing incompressible pages is essential for effective resource
> management [1].
> 
> Add a new memcg stat 'zswap_incomp' to track incompressible pages per
> cgroup. This helps administrators and orchestrators to:
> 
> 1. Identify workloads that produce incompressible data (e.g., encrypted
>    data, already-compressed media, random data) and may not benefit from
>    zswap.
> 
> 2. Make informed decisions about workload placement - moving
>    incompressible workloads to nodes with larger swap backing devices
>    rather than relying on zswap.
> 
> 3. Debug zswap efficiency issues at the cgroup level without needing to
>    correlate global stats with individual cgroups.
> 
> While the compression ratio can be estimated from existing stats
> (zswap / zswapped * PAGE_SIZE), this doesn't distinguish between
> "uniformly poor compression" and "a few completely incompressible pages
> mixed with highly compressible ones". The zswap_incomp stat provides
> direct visibility into the latter case.
> 
> [1]: https://lore.kernel.org/linux-mm/CAF8kJuONDFj4NAksaR4j_WyDbNwNGYLmTe-o76rqU17La=nkOw@mail.gmail.com/
> Acked-by: Nhat Pham <nphamcs@gmail.com>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>



* Re: [PATCH v2 2/2] selftests/cgroup: add test for zswap incompressible pages
  2026-02-06  7:22 ` [PATCH v2 2/2] selftests/cgroup: add test for zswap " Jiayuan Chen
@ 2026-02-06 18:13   ` Shakeel Butt
  2026-02-06 22:50   ` Nhat Pham
  2026-02-07  1:35   ` SeongJae Park
  2 siblings, 0 replies; 11+ messages in thread
From: Shakeel Butt @ 2026-02-06 18:13 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: linux-mm, Jiayuan Chen, Tejun Heo, Johannes Weiner,
	Michal Koutný,
	Jonathan Corbet, Michal Hocko, Roman Gushchin, Muchun Song,
	Andrew Morton, Yosry Ahmed, Nhat Pham, Chengming Zhou,
	Shuah Khan, cgroups, linux-doc, linux-kernel, linux-kselftest

On Fri, Feb 06, 2026 at 03:22:16PM +0800, Jiayuan Chen wrote:
> From: Jiayuan Chen <jiayuan.chen@shopee.com>
> 
> Add test_zswap_incompressible() to verify that the zswap_incomp memcg
> stat correctly tracks incompressible pages.
> 
> The test allocates memory filled with random data from /dev/urandom,
> which cannot be effectively compressed by zswap. When this data is
> swapped out to zswap, it should be stored as-is and tracked by the
> zswap_incomp counter.
> 
> The test verifies that:
> 1. Pages are swapped out to zswap (zswpout increases)
> 2. Incompressible pages are tracked (zswap_incomp increases)
> 
> test:
> dd if=/dev/zero of=/swapfile bs=1M count=2048
> chmod 600 /swapfile
> mkswap /swapfile
> swapon /swapfile
> echo Y > /sys/module/zswap/parameters/enabled
> 
> ./test_zswap
>  TAP version 13
>  1..8
>  ok 1 test_zswap_usage
>  ok 2 test_swapin_nozswap
>  ok 3 test_zswapin
>  ok 4 test_zswap_writeback_enabled
>  ok 5 test_zswap_writeback_disabled
>  ok 6 test_no_kmem_bypass
>  ok 7 test_no_invasive_cgroup_shrink
>  ok 8 test_zswap_incompressible
>  Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0
> 
> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>



* Re: [PATCH v2 2/2] selftests/cgroup: add test for zswap incompressible pages
  2026-02-06  7:22 ` [PATCH v2 2/2] selftests/cgroup: add test for zswap " Jiayuan Chen
  2026-02-06 18:13   ` Shakeel Butt
@ 2026-02-06 22:50   ` Nhat Pham
  2026-02-07  1:35   ` SeongJae Park
  2 siblings, 0 replies; 11+ messages in thread
From: Nhat Pham @ 2026-02-06 22:50 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: linux-mm, Jiayuan Chen, Tejun Heo, Johannes Weiner,
	Michal Koutný,
	Jonathan Corbet, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Yosry Ahmed, Chengming Zhou,
	Shuah Khan, cgroups, linux-doc, linux-kernel, linux-kselftest

On Thu, Feb 5, 2026 at 11:22 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> From: Jiayuan Chen <jiayuan.chen@shopee.com>
>
> Add test_zswap_incompressible() to verify that the zswap_incomp memcg
> stat correctly tracks incompressible pages.
>
> The test allocates memory filled with random data from /dev/urandom,
> which cannot be effectively compressed by zswap. When this data is
> swapped out to zswap, it should be stored as-is and tracked by the
> zswap_incomp counter.
>
> The test verifies that:
> 1. Pages are swapped out to zswap (zswpout increases)
> 2. Incompressible pages are tracked (zswap_incomp increases)
>
> test:
> dd if=/dev/zero of=/swapfile bs=1M count=2048
> chmod 600 /swapfile
> mkswap /swapfile
> swapon /swapfile
> echo Y > /sys/module/zswap/parameters/enabled
>
> ./test_zswap
>  TAP version 13
>  1..8
>  ok 1 test_zswap_usage
>  ok 2 test_swapin_nozswap
>  ok 3 test_zswapin
>  ok 4 test_zswap_writeback_enabled
>  ok 5 test_zswap_writeback_disabled
>  ok 6 test_no_kmem_bypass
>  ok 7 test_no_invasive_cgroup_shrink
>  ok 8 test_zswap_incompressible
>  Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0
>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
> ---
>  tools/testing/selftests/cgroup/test_zswap.c | 96 +++++++++++++++++++++
>  1 file changed, 96 insertions(+)
>
> diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
> index 64ebc3f3f203..8cb8a131357d 100644
> --- a/tools/testing/selftests/cgroup/test_zswap.c
> +++ b/tools/testing/selftests/cgroup/test_zswap.c
> @@ -5,6 +5,7 @@
>  #include <unistd.h>
>  #include <stdio.h>
>  #include <signal.h>
> +#include <fcntl.h>
>  #include <sys/sysinfo.h>
>  #include <string.h>
>  #include <sys/wait.h>
> @@ -574,6 +575,100 @@ static int test_no_kmem_bypass(const char *root)
>         return ret;
>  }
>
> +static int allocate_random_and_wait(const char *cgroup, void *arg)
> +{
> +       size_t size = (size_t)arg;
> +       char *mem;
> +       int fd;
> +       ssize_t n;
> +
> +       mem = malloc(size);
> +       if (!mem)
> +               return -1;
> +
> +       /* Fill with random data from /dev/urandom - incompressible */
> +       fd = open("/dev/urandom", O_RDONLY);
> +       if (fd < 0) {
> +               free(mem);
> +               return -1;
> +       }
> +
> +       for (size_t i = 0; i < size; ) {
> +               n = read(fd, mem + i, size - i);
> +               if (n <= 0)
> +                       break;
> +               i += n;
> +       }
> +       close(fd);
> +
> +       /* Touch all pages to ensure they're faulted in */
> +       for (size_t i = 0; i < size; i += 4096)
> +               mem[i] = mem[i];
> +
> +       /* Keep memory alive for parent to reclaim and check stats */
> +       pause();
> +       free(mem);
> +       return 0;
> +}
> +
> +static long get_zswap_incomp(const char *cgroup)
> +{
> +       return cg_read_key_long(cgroup, "memory.stat", "zswap_incomp ");
> +}
> +
> +/*
> + * Test that incompressible pages (random data) are tracked by zswap_incomp.
> + *
> + * Since incompressible pages stored in zswap are charged at full PAGE_SIZE
> + * (no memory savings), we cannot rely on memory.max pressure to push them
> + * into zswap. Instead, we allocate random data within memory.max, then use
> + * memory.reclaim to proactively push pages into zswap while checking the stat
> + * before the child exits (zswap_incomp is a gauge that decreases on free).

I wonder if we can do MADV_PAGEOUT? Anyway, I'm fine with memory.reclaim too.
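
A minimal sketch of what the MADV_PAGEOUT variant might look like, assuming the child maps its buffer with mmap() so that the range is page-aligned. The helper names map_anon() and pageout_range() are hypothetical and are not part of test_zswap.c. MADV_PAGEOUT only asks the kernel to reclaim the range, and pages may stay resident (for example when no swap is available), so the test would still have to check the stat afterwards.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21	/* asm-generic uapi value, available since Linux 5.4 */
#endif

/*
 * Hypothetical helper: anonymous private mapping, page-aligned by
 * construction. Returns NULL on failure.
 */
static void *map_anon(size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/*
 * Hypothetical helper: ask the kernel to reclaim [addr, addr + len).
 * addr must be page-aligned. Returns 0 on success, -1 with errno set
 * on failure (e.g. EINVAL on kernels without MADV_PAGEOUT).
 */
static int pageout_range(void *addr, size_t len)
{
	return madvise(addr, len, MADV_PAGEOUT);
}
```

Compared with memory.reclaim, this targets exactly the child's buffer rather than the whole cgroup, but it has to run inside the child, which then needs some way to signal the parent that the pageout request has been issued.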

> + */
> +static int test_zswap_incompressible(const char *root)
> +{
> +       int ret = KSFT_FAIL;
> +       char *test_group;
> +       long zswap_incomp;
> +       pid_t child_pid;
> +       int child_status;
> +
> +       test_group = cg_name(root, "zswap_incompressible_test");
> +       if (!test_group)
> +               goto out;
> +       if (cg_create(test_group))
> +               goto out;
> +       if (cg_write(test_group, "memory.max", "32M"))
> +               goto out;
> +
> +       child_pid = cg_run_nowait(test_group, allocate_random_and_wait,
> +                                 (void *)MB(4));
> +       if (child_pid < 0)
> +               goto out;
> +
> +       /* Wait for child to finish allocating */
> +       usleep(500000);
> +
> +       /* Proactively reclaim to push random pages into zswap */
> +       cg_write_numeric(test_group, "memory.reclaim", MB(4));
> +
> +       zswap_incomp = get_zswap_incomp(test_group);
> +       if (zswap_incomp <= 0) {
> +               ksft_print_msg("zswap_incomp not increased: %ld\n", zswap_incomp);
> +               goto out_kill;
> +       }
> +
> +       ret = KSFT_PASS;
> +
> +out_kill:
> +       kill(child_pid, SIGTERM);
> +       waitpid(child_pid, &child_status, 0);
> +out:
> +       cg_destroy(test_group);
> +       free(test_group);
> +       return ret;
> +}

LGTM :)

Acked-by: Nhat Pham <nphamcs@gmail.com>



* Re: [PATCH v2 1/2] mm: zswap: add per-memcg stat for incompressible pages
  2026-02-06  7:22 ` [PATCH v2 1/2] " Jiayuan Chen
  2026-02-06 15:19   ` Yosry Ahmed
  2026-02-06 17:52   ` Shakeel Butt
@ 2026-02-07  1:21   ` SeongJae Park
  2026-02-09  2:20   ` Chengming Zhou
  3 siblings, 0 replies; 11+ messages in thread
From: SeongJae Park @ 2026-02-07  1:21 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: SeongJae Park, linux-mm, Jiayuan Chen, Nhat Pham, Tejun Heo,
	Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Yosry Ahmed, Chengming Zhou,
	Shuah Khan, cgroups, linux-doc, linux-kernel, linux-kselftest

On Fri,  6 Feb 2026 15:22:15 +0800 Jiayuan Chen <jiayuan.chen@linux.dev> wrote:

> From: Jiayuan Chen <jiayuan.chen@shopee.com>
> 
> The global zswap_stored_incompressible_pages counter was added in commit
> dca4437a5861 ("mm/zswap: store <PAGE_SIZE compression failed page as-is")
> to track how many pages are stored in raw (uncompressed) form in zswap.
> However, in containerized environments, knowing which cgroup is
> contributing incompressible pages is essential for effective resource
> management [1].
> 
> Add a new memcg stat 'zswap_incomp' to track incompressible pages per
> cgroup. This helps administrators and orchestrators to:
> 
> 1. Identify workloads that produce incompressible data (e.g., encrypted
>    data, already-compressed media, random data) and may not benefit from
>    zswap.
> 
> 2. Make informed decisions about workload placement - moving
>    incompressible workloads to nodes with larger swap backing devices
>    rather than relying on zswap.
> 
> 3. Debug zswap efficiency issues at the cgroup level without needing to
>    correlate global stats with individual cgroups.
> 
> While the compression ratio can be estimated from existing stats
> (zswap / zswapped * PAGE_SIZE), this doesn't distinguish between
> "uniformly poor compression" and "a few completely incompressible pages
> mixed with highly compressible ones". The zswap_incomp stat provides
> direct visibility into the latter case.

Sounds like a useful new stat, thank you for making this!
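
The distinction drawn in the quoted paragraph can be illustrated with a small user-space model. This is only a sketch: struct zswap_totals, charge_n() and SKETCH_PAGE_SIZE are hypothetical names, 4 KiB pages are assumed, and the accounting merely mirrors the size == PAGE_SIZE check from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Hypothetical per-cgroup totals, modelled on the memcg counters. */
struct zswap_totals {
	unsigned long zswap_bytes;	/* compressed bytes held in zswap */
	unsigned long zswapped_pages;	/* pages stored in zswap */
	unsigned long incomp_pages;	/* pages stored as-is */
};

/* Charge n pages that each occupy `size` bytes after compression. */
static void charge_n(struct zswap_totals *t, size_t n, size_t size)
{
	assert(size > 0 && size <= SKETCH_PAGE_SIZE);
	for (size_t i = 0; i < n; i++) {
		t->zswap_bytes += size;
		t->zswapped_pages += 1;
		/* Same test as the patch: a full-size entry is incompressible. */
		if (size == SKETCH_PAGE_SIZE)
			t->incomp_pages += 1;
	}
}
```

Charging 100 pages of 2560 bytes to one model cgroup, and 50 pages of 4096 bytes plus 50 pages of 1024 bytes to another, gives both the same 256000 compressed bytes over 100 pages, so the aggregate ratio is identical. Only incomp_pages (0 versus 50) tells the two apart.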

> 
> [1]: https://lore.kernel.org/linux-mm/CAF8kJuONDFj4NAksaR4j_WyDbNwNGYLmTe-o76rqU17La=nkOw@mail.gmail.com/
> Acked-by: Nhat Pham <nphamcs@gmail.com>

Nit.  It would look better to have one blank line before the tag lines, or to
use a 'Link:' tag with a trailing comment such as '# [1]' for the link.

> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>

Reviewed-by: SeongJae Park <sj@kernel.org>


Thanks,
SJ

[...]



* Re: [PATCH v2 2/2] selftests/cgroup: add test for zswap incompressible pages
  2026-02-06  7:22 ` [PATCH v2 2/2] selftests/cgroup: add test for zswap " Jiayuan Chen
  2026-02-06 18:13   ` Shakeel Butt
  2026-02-06 22:50   ` Nhat Pham
@ 2026-02-07  1:35   ` SeongJae Park
  2026-02-08 18:49     ` JP Kobryn
  2 siblings, 1 reply; 11+ messages in thread
From: SeongJae Park @ 2026-02-07  1:35 UTC (permalink / raw)
  To: Jiayuan Chen
  Cc: SeongJae Park, linux-mm, Jiayuan Chen, Tejun Heo,
	Johannes Weiner, Michal Koutný,
	Jonathan Corbet, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Yosry Ahmed, Nhat Pham,
	Chengming Zhou, Shuah Khan, cgroups, linux-doc, linux-kernel,
	linux-kselftest

On Fri,  6 Feb 2026 15:22:16 +0800 Jiayuan Chen <jiayuan.chen@linux.dev> wrote:

> From: Jiayuan Chen <jiayuan.chen@shopee.com>
> 
> Add test_zswap_incompressible() to verify that the zswap_incomp memcg
> stat correctly tracks incompressible pages.
> 
> The test allocates memory filled with random data from /dev/urandom,
> which cannot be effectively compressed by zswap. When this data is
> swapped out to zswap, it should be stored as-is and tracked by the
> zswap_incomp counter.
> 
> The test verifies that:
> 1. Pages are swapped out to zswap (zswpout increases)
> 2. Incompressible pages are tracked (zswap_incomp increases)
> 
> test:
> dd if=/dev/zero of=/swapfile bs=1M count=2048
> chmod 600 /swapfile
> mkswap /swapfile
> swapon /swapfile
> echo Y > /sys/module/zswap/parameters/enabled
> 
> ./test_zswap
>  TAP version 13
>  1..8
>  ok 1 test_zswap_usage
>  ok 2 test_swapin_nozswap
>  ok 3 test_zswapin
>  ok 4 test_zswap_writeback_enabled
>  ok 5 test_zswap_writeback_disabled
>  ok 6 test_no_kmem_bypass
>  ok 7 test_no_invasive_cgroup_shrink
>  ok 8 test_zswap_incompressible
>  Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0

Nice test.  This is also testing the functionality of zswap's incompressible
page handling.

> 
> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>

Reviewed-by: SeongJae Park <sj@kernel.org>

> ---
>  tools/testing/selftests/cgroup/test_zswap.c | 96 +++++++++++++++++++++
>  1 file changed, 96 insertions(+)
> 
> diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
> index 64ebc3f3f203..8cb8a131357d 100644
> --- a/tools/testing/selftests/cgroup/test_zswap.c
> +++ b/tools/testing/selftests/cgroup/test_zswap.c
> @@ -5,6 +5,7 @@
>  #include <unistd.h>
>  #include <stdio.h>
>  #include <signal.h>
> +#include <fcntl.h>
>  #include <sys/sysinfo.h>
>  #include <string.h>
>  #include <sys/wait.h>
> @@ -574,6 +575,100 @@ static int test_no_kmem_bypass(const char *root)
>  	return ret;
>  }
>  
> +static int allocate_random_and_wait(const char *cgroup, void *arg)
> +{
> +	size_t size = (size_t)arg;
> +	char *mem;
> +	int fd;
> +	ssize_t n;
> +
> +	mem = malloc(size);
> +	if (!mem)
> +		return -1;
> +
> +	/* Fill with random data from /dev/urandom - incompressible */
> +	fd = open("/dev/urandom", O_RDONLY);
> +	if (fd < 0) {
> +		free(mem);
> +		return -1;
> +	}
> +
> +	for (size_t i = 0; i < size; ) {
> +		n = read(fd, mem + i, size - i);
> +		if (n <= 0)
> +			break;
> +		i += n;
> +	}
> +	close(fd);
> +
> +	/* Touch all pages to ensure they're faulted in */
> +	for (size_t i = 0; i < size; i += 4096)

Nit: I see test_zswapin() is using PAGE_SIZE.  Maybe the above code can also
use it?
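
A minimal sketch of the page-touch loop without the literal 4096, for
illustration only.  It queries the page size at run time via
sysconf(_SC_PAGESIZE) rather than the selftest's PAGE_SIZE macro, and
touch_all_pages() is a hypothetical helper name, not something in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): touch one byte per page so
 * that every page of the buffer is faulted in.  Returns the number of
 * pages touched, or 0 if the page size cannot be determined.
 */
static size_t touch_all_pages(volatile char *mem, size_t size)
{
	long page_size = sysconf(_SC_PAGESIZE);
	size_t touched = 0;

	assert(mem || size == 0);
	if (page_size <= 0)
		return 0;

	for (size_t i = 0; i < size; i += (size_t)page_size) {
		/* volatile read + write back, so the access is not elided */
		mem[i] = mem[i];
		touched++;
	}
	return touched;
}
```

The volatile qualifier is only there to keep the compiler from dropping the
self-assignment; whether the selftest needs that is a separate question.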

> +		mem[i] = mem[i];
> +
> +	/* Keep memory alive for parent to reclaim and check stats */
> +	pause();
> +	free(mem);
> +	return 0;
> +}
> +
> +static long get_zswap_incomp(const char *cgroup)
> +{
> +	return cg_read_key_long(cgroup, "memory.stat", "zswap_incomp ");
> +}
> +
> +/*
> + * Test that incompressible pages (random data) are tracked by zswap_incomp.
> + *
> + * Since incompressible pages stored in zswap are charged at full PAGE_SIZE
> + * (no memory savings), we cannot rely on memory.max pressure to push them
> + * into zswap. Instead, we allocate random data within memory.max, then use
> + * memory.reclaim to proactively push pages into zswap while checking the stat
> + * before the child exits (zswap_incomp is a gauge that decreases on free).
> + */
> +static int test_zswap_incompressible(const char *root)
> +{
> +	int ret = KSFT_FAIL;
> +	char *test_group;
> +	long zswap_incomp;
> +	pid_t child_pid;
> +	int child_status;
> +
> +	test_group = cg_name(root, "zswap_incompressible_test");
> +	if (!test_group)
> +		goto out;
> +	if (cg_create(test_group))
> +		goto out;
> +	if (cg_write(test_group, "memory.max", "32M"))
> +		goto out;
> +
> +	child_pid = cg_run_nowait(test_group, allocate_random_and_wait,
> +				  (void *)MB(4));
> +	if (child_pid < 0)
> +		goto out;
> +
> +	/* Wait for child to finish allocating */
> +	usleep(500000);

It might be better to revisit this in the future to avoid racy test results.
But this seems good enough for now.

> +
> +	/* Proactively reclaim to push random pages into zswap */
> +	cg_write_numeric(test_group, "memory.reclaim", MB(4));
> +
> +	zswap_incomp = get_zswap_incomp(test_group);
> +	if (zswap_incomp <= 0) {
> +		ksft_print_msg("zswap_incomp not increased: %ld\n", zswap_incomp);
> +		goto out_kill;
> +	}
> +
> +	ret = KSFT_PASS;
> +
> +out_kill:
> +	kill(child_pid, SIGTERM);
> +	waitpid(child_pid, &child_status, 0);
> +out:
> +	cg_destroy(test_group);
> +	free(test_group);
> +	return ret;
> +}
> +
>  #define T(x) { x, #x }
>  struct zswap_test {
>  	int (*fn)(const char *root);
> @@ -586,6 +681,7 @@ struct zswap_test {
>  	T(test_zswap_writeback_disabled),
>  	T(test_no_kmem_bypass),
>  	T(test_no_invasive_cgroup_shrink),
> +	T(test_zswap_incompressible),
>  };
>  #undef T
>  
> -- 
> 2.43.0


Thanks,
SJ

[...]



* Re: [PATCH v2 2/2] selftests/cgroup: add test for zswap incompressible pages
  2026-02-07  1:35   ` SeongJae Park
@ 2026-02-08 18:49     ` JP Kobryn
  0 siblings, 0 replies; 11+ messages in thread
From: JP Kobryn @ 2026-02-08 18:49 UTC (permalink / raw)
  To: SeongJae Park, Jiayuan Chen
  Cc: linux-mm, Jiayuan Chen, Tejun Heo, Johannes Weiner,
	Michal Koutný,
	Jonathan Corbet, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Yosry Ahmed, Nhat Pham,
	Chengming Zhou, Shuah Khan, cgroups, linux-doc, linux-kernel,
	linux-kselftest

On 2/6/26 5:35 PM, SeongJae Park wrote:
> On Fri,  6 Feb 2026 15:22:16 +0800 Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
> 
>> From: Jiayuan Chen <jiayuan.chen@shopee.com>
[...]
>> diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
>> index 64ebc3f3f203..8cb8a131357d 100644
>> --- a/tools/testing/selftests/cgroup/test_zswap.c
>> +++ b/tools/testing/selftests/cgroup/test_zswap.c
>> @@ -5,6 +5,7 @@
>>   #include <unistd.h>
>>   #include <stdio.h>
>>   #include <signal.h>
>> +#include <fcntl.h>
>>   #include <sys/sysinfo.h>
>>   #include <string.h>
>>   #include <sys/wait.h>
>> @@ -574,6 +575,100 @@ static int test_no_kmem_bypass(const char *root)
>>   	return ret;
>>   }
>>   
>> +static int allocate_random_and_wait(const char *cgroup, void *arg)
>> +{
>> +	size_t size = (size_t)arg;
>> +	char *mem;
>> +	int fd;
>> +	ssize_t n;
>> +
>> +	mem = malloc(size);
>> +	if (!mem)
>> +		return -1;
>> +
>> +	/* Fill with random data from /dev/urandom - incompressible */
>> +	fd = open("/dev/urandom", O_RDONLY);
>> +	if (fd < 0) {
>> +		free(mem);
>> +		return -1;
>> +	}
>> +
>> +	for (size_t i = 0; i < size; ) {
>> +		n = read(fd, mem + i, size - i);
>> +		if (n <= 0)
>> +			break;
>> +		i += n;
>> +	}
>> +	close(fd);
>> +
>> +	/* Touch all pages to ensure they're faulted in */
>> +	for (size_t i = 0; i < size; i += 4096)
> 
> Nit: I see test_zswapin() is using PAGE_SIZE.  Maybe the above code can also
> use it?
> 
>> +		mem[i] = mem[i];
>> +
>> +	/* Keep memory alive for parent to reclaim and check stats */
>> +	pause();
>> +	free(mem);
>> +	return 0;
>> +}
>> +
>> +static long get_zswap_incomp(const char *cgroup)
>> +{
>> +	return cg_read_key_long(cgroup, "memory.stat", "zswap_incomp ");
>> +}
>> +
>> +/*
>> + * Test that incompressible pages (random data) are tracked by zswap_incomp.
>> + *
>> + * Since incompressible pages stored in zswap are charged at full PAGE_SIZE
>> + * (no memory savings), we cannot rely on memory.max pressure to push them
>> + * into zswap. Instead, we allocate random data within memory.max, then use
>> + * memory.reclaim to proactively push pages into zswap while checking the stat
>> + * before the child exits (zswap_incomp is a gauge that decreases on free).
>> + */
>> +static int test_zswap_incompressible(const char *root)
>> +{
>> +	int ret = KSFT_FAIL;
>> +	char *test_group;
>> +	long zswap_incomp;
>> +	pid_t child_pid;
>> +	int child_status;
>> +
>> +	test_group = cg_name(root, "zswap_incompressible_test");
>> +	if (!test_group)
>> +		goto out;
>> +	if (cg_create(test_group))
>> +		goto out;
>> +	if (cg_write(test_group, "memory.max", "32M"))
>> +		goto out;
>> +
>> +	child_pid = cg_run_nowait(test_group, allocate_random_and_wait,
>> +				  (void *)MB(4));
>> +	if (child_pid < 0)
>> +		goto out;
>> +
>> +	/* Wait for child to finish allocating */
>> +	usleep(500000);
> 
> It might be better to revisit this in the future to avoid racy test results.
> But this seems good enough for now.

How about using some form of synchronization like an eventfd? The parent
can wait here for the child to write the event and avoid the race with
the arbitrary sleep.
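
Roughly what that could look like, as a stand-alone sketch rather than a
change to the selftest.  It uses plain fork() instead of cg_run_nowait(),
memset() as a stand-in for the /dev/urandom fill, and a hypothetical
function name and timeout; how the fd would be handed to the child
callback in the real test is left open:

```c
#include <assert.h>
#include <poll.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: the child populates its buffer, signals readiness
 * through an eventfd and then pauses; the parent waits for that event
 * (with a timeout) instead of sleeping for a fixed interval.
 * Returns 0 if the child reported ready, -1 otherwise.
 */
static int run_child_and_wait_ready(size_t size, int timeout_ms)
{
	struct pollfd pfd;
	uint64_t val = 0;
	int status = 0;
	int ready = 0;
	pid_t pid;
	int efd;

	assert(size > 0);

	efd = eventfd(0, 0);
	if (efd < 0)
		return -1;

	pid = fork();
	if (pid < 0) {
		close(efd);
		return -1;
	}

	if (pid == 0) {
		uint64_t one = 1;
		char *mem = malloc(size);

		if (!mem)
			_exit(1);
		memset(mem, 0xa5, size);	/* stand-in for the random fill */
		if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
			_exit(1);
		pause();	/* keep the memory alive until the parent kills us */
		_exit(0);
	}

	/* poll() with a timeout so a failed child cannot hang the parent */
	pfd.fd = efd;
	pfd.events = POLLIN;
	if (poll(&pfd, 1, timeout_ms) == 1 && (pfd.revents & POLLIN) &&
	    read(efd, &val, sizeof(val)) == (ssize_t)sizeof(val) && val == 1)
		ready = 1;

	/* The reclaim and stat check would go here, while the child is paused. */

	kill(pid, SIGTERM);
	waitpid(pid, &status, 0);
	close(efd);
	return ready ? 0 : -1;
}
```

The timeout matters because an eventfd never reports EOF: if the child died
before writing, a bare read() in the parent would block forever.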



* Re: [PATCH v2 1/2] mm: zswap: add per-memcg stat for incompressible pages
  2026-02-06  7:22 ` [PATCH v2 1/2] " Jiayuan Chen
                     ` (2 preceding siblings ...)
  2026-02-07  1:21   ` SeongJae Park
@ 2026-02-09  2:20   ` Chengming Zhou
  3 siblings, 0 replies; 11+ messages in thread
From: Chengming Zhou @ 2026-02-09  2:20 UTC (permalink / raw)
  To: Jiayuan Chen, linux-mm
  Cc: Jiayuan Chen, Nhat Pham, Tejun Heo, Johannes Weiner,
	Michal Koutný,
	Jonathan Corbet, Michal Hocko, Roman Gushchin, Shakeel Butt,
	Muchun Song, Andrew Morton, Yosry Ahmed, Shuah Khan, cgroups,
	linux-doc, linux-kernel, linux-kselftest

On 2026/2/6 15:22, Jiayuan Chen wrote:
> From: Jiayuan Chen <jiayuan.chen@shopee.com>
> 
> The global zswap_stored_incompressible_pages counter was added in commit
> dca4437a5861 ("mm/zswap: store <PAGE_SIZE compression failed page as-is")
> to track how many pages are stored in raw (uncompressed) form in zswap.
> However, in containerized environments, knowing which cgroup is
> contributing incompressible pages is essential for effective resource
> management [1].
> 
> Add a new memcg stat 'zswap_incomp' to track incompressible pages per
> cgroup. This helps administrators and orchestrators to:
> 
> 1. Identify workloads that produce incompressible data (e.g., encrypted
>     data, already-compressed media, random data) and may not benefit from
>     zswap.
> 
> 2. Make informed decisions about workload placement - moving
>     incompressible workloads to nodes with larger swap backing devices
>     rather than relying on zswap.
> 
> 3. Debug zswap efficiency issues at the cgroup level without needing to
>     correlate global stats with individual cgroups.
> 
> While the compression ratio can be estimated from existing stats
> (zswap / zswapped * PAGE_SIZE), this doesn't distinguish between
> "uniformly poor compression" and "a few completely incompressible pages
> mixed with highly compressible ones". The zswap_incomp stat provides
> direct visibility into the latter case.
> 
> [1]: https://lore.kernel.org/linux-mm/CAF8kJuONDFj4NAksaR4j_WyDbNwNGYLmTe-o76rqU17La=nkOw@mail.gmail.com/
> Acked-by: Nhat Pham <nphamcs@gmail.com>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>

Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev>

Thanks!
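
To make the compression-ratio argument in the commit message concrete, here
is a small worked example.  All numbers are made up, the struct and function
names are hypothetical, and it assumes the formula is read as
zswap bytes / (zswapped pages * PAGE_SIZE) with a 4096-byte page:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical snapshot of the three per-cgroup values discussed above. */
struct zswap_sample {
	unsigned long zswap_bytes;	/* compressed size held in zswap */
	unsigned long zswapped_pages;	/* pages stored in zswap */
	unsigned long incomp_pages;	/* pages stored as-is */
};

/* Ratio of stored bytes to original bytes; lower means better compression. */
static double zswap_ratio(const struct zswap_sample *s, unsigned long page_size)
{
	assert(s->zswapped_pages > 0 && page_size > 0);
	return (double)s->zswap_bytes /
	       ((double)s->zswapped_pages * (double)page_size);
}

/* 1000 pages that each compress to 2304 bytes: uniformly mediocre. */
static struct zswap_sample uniform_cgroup(void)
{
	struct zswap_sample s = { 1000UL * 2304UL, 1000UL, 0UL };
	return s;
}

/* 500 pages stored raw at 4096 bytes plus 500 pages at 512 bytes each. */
static struct zswap_sample mixed_cgroup(void)
{
	struct zswap_sample s = { 500UL * 4096UL + 500UL * 512UL, 1000UL, 500UL };
	return s;
}
```

Both samples hold 2304000 bytes for 1000 pages, so the ratio is 0.5625 in
each case; only the incompressible-page count (0 versus 500) tells the two
workloads apart.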

> ---
>   Documentation/admin-guide/cgroup-v2.rst | 5 +++++
>   include/linux/memcontrol.h              | 1 +
>   mm/memcontrol.c                         | 8 ++++++++
>   3 files changed, 14 insertions(+)
> 
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 7f5b59d95fce..78a329414615 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1737,6 +1737,11 @@ The following nested keys are defined.
>   	  zswpwb
>   		Number of pages written from zswap to swap.
>   
> +	  zswap_incomp
> +		Number of incompressible pages currently stored in zswap
> +		without compression. These pages could not be compressed to
> +		a size smaller than PAGE_SIZE, so they are stored as-is.
> +
>   	  thp_fault_alloc (npn)
>   		Number of transparent hugepages which were allocated to satisfy
>   		a page fault. This counter is not present when CONFIG_TRANSPARENT_HUGEPAGE
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index b6c82c8f73e1..d8ec05dd5d43 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -39,6 +39,7 @@ enum memcg_stat_item {
>   	MEMCG_KMEM,
>   	MEMCG_ZSWAP_B,
>   	MEMCG_ZSWAPPED,
> +	MEMCG_ZSWAP_INCOMP,
>   	MEMCG_NR_STAT,
>   };
>   
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 007413a53b45..a6b6cf5f1aeb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -341,6 +341,7 @@ static const unsigned int memcg_stat_items[] = {
>   	MEMCG_KMEM,
>   	MEMCG_ZSWAP_B,
>   	MEMCG_ZSWAPPED,
> +	MEMCG_ZSWAP_INCOMP,
>   };
>   
>   #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
> @@ -1346,6 +1347,7 @@ static const struct memory_stat memory_stats[] = {
>   #ifdef CONFIG_ZSWAP
>   	{ "zswap",			MEMCG_ZSWAP_B			},
>   	{ "zswapped",			MEMCG_ZSWAPPED			},
> +	{ "zswap_incomp",		MEMCG_ZSWAP_INCOMP		},
>   #endif
>   	{ "file_mapped",		NR_FILE_MAPPED			},
>   	{ "file_dirty",			NR_FILE_DIRTY			},
> @@ -5458,6 +5460,9 @@ void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size)
>   	memcg = obj_cgroup_memcg(objcg);
>   	mod_memcg_state(memcg, MEMCG_ZSWAP_B, size);
>   	mod_memcg_state(memcg, MEMCG_ZSWAPPED, 1);
> +	/* size == PAGE_SIZE means compression failed, page is incompressible */
> +	if (size == PAGE_SIZE)
> +		mod_memcg_state(memcg, MEMCG_ZSWAP_INCOMP, 1);
>   	rcu_read_unlock();
>   }
>   
> @@ -5481,6 +5486,9 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
>   	memcg = obj_cgroup_memcg(objcg);
>   	mod_memcg_state(memcg, MEMCG_ZSWAP_B, -size);
>   	mod_memcg_state(memcg, MEMCG_ZSWAPPED, -1);
> +	/* size == PAGE_SIZE means compression failed, page is incompressible */
> +	if (size == PAGE_SIZE)
> +		mod_memcg_state(memcg, MEMCG_ZSWAP_INCOMP, -1);
>   	rcu_read_unlock();
>   }
>   
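
For anyone who wants to consume the new key from userspace, a minimal
parsing sketch follows.  stat_value() is a hypothetical helper, not the
selftest's cg_read_key_long(), and it assumes memory.stat keeps its usual
"key value" one-pair-per-line layout.  It matches the key only at the start
of a line and only when followed by a space, so "zswap" does not also match
"zswapped" or "zswap_incomp":

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return the value for an exact key in a buffer of
 * "key value\n" lines, or -1 if the key is not present.
 */
static long stat_value(const char *buf, const char *key)
{
	size_t klen = strlen(key);
	const char *line = buf;

	assert(buf && klen > 0);
	while (line && *line) {
		/* exact key match: same prefix and a space right after it */
		if (!strncmp(line, key, klen) && line[klen] == ' ')
			return strtol(line + klen + 1, NULL, 10);
		line = strchr(line, '\n');
		if (line)
			line++;		/* step past the newline to the next line */
	}
	return -1;
}
```

The sketch makes no statement about the unit of the reported value; it only
returns whatever number follows the key.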



end of thread, other threads:[~2026-02-09  2:21 UTC | newest]

Thread overview: 11+ messages
2026-02-06  7:22 [PATCH v2 0/2] mm: zswap: add per-memcg stat for incompressible pages Jiayuan Chen
2026-02-06  7:22 ` [PATCH v2 1/2] " Jiayuan Chen
2026-02-06 15:19   ` Yosry Ahmed
2026-02-06 17:52   ` Shakeel Butt
2026-02-07  1:21   ` SeongJae Park
2026-02-09  2:20   ` Chengming Zhou
2026-02-06  7:22 ` [PATCH v2 2/2] selftests/cgroup: add test for zswap " Jiayuan Chen
2026-02-06 18:13   ` Shakeel Butt
2026-02-06 22:50   ` Nhat Pham
2026-02-07  1:35   ` SeongJae Park
2026-02-08 18:49     ` JP Kobryn
