* [PATCH mm-new v3 1/2] mm: zswap: add per-memcg stat for incompressible pages
From: Jiayuan Chen @ 2026-02-13 7:18 UTC
To: linux-mm
Cc: jiayuan.chen, Jiayuan Chen, Nhat Pham, Shakeel Butt, Yosry Ahmed,
SeongJae Park, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Michal Hocko, Roman Gushchin, Muchun Song,
Andrew Morton, Chengming Zhou, Shuah Khan, cgroups, linux-doc,
linux-kernel, linux-kselftest
From: Jiayuan Chen <jiayuan.chen@shopee.com>
The global zswap_stored_incompressible_pages counter was added in commit
dca4437a5861 ("mm/zswap: store <PAGE_SIZE compression failed page as-is")
to track how many pages are stored in raw (uncompressed) form in zswap.
However, in containerized environments, knowing which cgroup is
contributing incompressible pages is essential for effective resource
management [1].
Add a new memcg stat 'zswap_incomp' to track incompressible pages per
cgroup. This helps administrators and orchestrators to:
1. Identify workloads that produce incompressible data (e.g., encrypted
data, already-compressed media, random data) and may not benefit from
zswap.
2. Make informed decisions about workload placement, such as moving
incompressible workloads to nodes with larger swap backing devices
rather than relying on zswap.
3. Debug zswap efficiency issues at the cgroup level without needing to
correlate global stats with individual cgroups.
While the compression ratio can be estimated from existing stats
(zswap / (zswapped * PAGE_SIZE), with zswap in bytes and zswapped in
pages), this doesn't distinguish between "uniformly poor compression"
and "a few completely incompressible pages mixed with highly
compressible ones". The zswap_incomp stat provides direct visibility
into the latter case.
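
As a rough illustration of the distinction above (not part of this patch), the sketch below shows two cgroups with the same average stored size per page that only differ once the incompressible pages are separated out. The helper zswap_breakdown(), the struct zswap_view and the 4096-byte EX_PAGE_SIZE are hypothetical names and values chosen for the example. The inputs are assumed to be already converted to the units named in the comments; memory.stat may report some of these values in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for the example only: 4 KiB pages. */
#define EX_PAGE_SIZE 4096UL

/* Hypothetical summary of one cgroup's zswap usage. */
struct zswap_view {
	double avg_all;  /* mean stored bytes per page, all pages */
	double avg_rest; /* mean stored bytes per page, incompressible pages excluded */
};

/*
 * zswap_bytes:    compressed size held in zswap, in bytes
 * zswapped_pages: number of pages stored in zswap
 * incomp_pages:   number of those pages stored uncompressed
 */
static struct zswap_view zswap_breakdown(unsigned long zswap_bytes,
					 unsigned long zswapped_pages,
					 unsigned long incomp_pages)
{
	struct zswap_view v = { 0.0, 0.0 };
	unsigned long rest;

	assert(incomp_pages <= zswapped_pages);
	assert(zswap_bytes >= incomp_pages * EX_PAGE_SIZE);

	if (zswapped_pages == 0)
		return v;

	v.avg_all = (double)zswap_bytes / (double)zswapped_pages;

	/* Remove the pages that were stored as-is, then average the rest. */
	rest = zswapped_pages - incomp_pages;
	if (rest)
		v.avg_rest = (double)(zswap_bytes - incomp_pages * EX_PAGE_SIZE) /
			     (double)rest;
	return v;
}
```

With 100 stored pages and 204800 stored bytes, both zswap_breakdown(204800, 100, 0) and zswap_breakdown(204800, 100, 40) give avg_all of 2048 bytes per page, but avg_rest is 2048 in the first case and roughly 683 in the second: the second cgroup compresses well apart from its 40 incompressible pages.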
[1] https://lore.kernel.org/linux-mm/CAF8kJuONDFj4NAksaR4j_WyDbNwNGYLmTe-o76rqU17La=nkOw@mail.gmail.com/
Acked-by: Nhat Pham <nphamcs@gmail.com>
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Yosry Ahmed <yosry.ahmed@linux.dev>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
---
Documentation/admin-guide/cgroup-v2.rst | 5 +++++
include/linux/memcontrol.h | 1 +
mm/memcontrol.c | 6 ++++++
3 files changed, 12 insertions(+)
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 7f5b59d95fce..78a329414615 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1737,6 +1737,11 @@ The following nested keys are defined.
zswpwb
Number of pages written from zswap to swap.
+ zswap_incomp
+ Number of incompressible pages currently stored in zswap
+ without compression. These pages could not be compressed to
+ a size smaller than PAGE_SIZE, so they are stored as-is.
+
thp_fault_alloc (npn)
Number of transparent hugepages which were allocated to satisfy
a page fault. This counter is not present when CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index b6c82c8f73e1..d8ec05dd5d43 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -39,6 +39,7 @@ enum memcg_stat_item {
MEMCG_KMEM,
MEMCG_ZSWAP_B,
MEMCG_ZSWAPPED,
+ MEMCG_ZSWAP_INCOMP,
MEMCG_NR_STAT,
};
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 007413a53b45..4c5672169753 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -341,6 +341,7 @@ static const unsigned int memcg_stat_items[] = {
MEMCG_KMEM,
MEMCG_ZSWAP_B,
MEMCG_ZSWAPPED,
+ MEMCG_ZSWAP_INCOMP,
};
#define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
@@ -1346,6 +1347,7 @@ static const struct memory_stat memory_stats[] = {
#ifdef CONFIG_ZSWAP
{ "zswap", MEMCG_ZSWAP_B },
{ "zswapped", MEMCG_ZSWAPPED },
+ { "zswap_incomp", MEMCG_ZSWAP_INCOMP },
#endif
{ "file_mapped", NR_FILE_MAPPED },
{ "file_dirty", NR_FILE_DIRTY },
@@ -5458,6 +5460,8 @@ void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size)
memcg = obj_cgroup_memcg(objcg);
mod_memcg_state(memcg, MEMCG_ZSWAP_B, size);
mod_memcg_state(memcg, MEMCG_ZSWAPPED, 1);
+ if (size == PAGE_SIZE)
+ mod_memcg_state(memcg, MEMCG_ZSWAP_INCOMP, 1);
rcu_read_unlock();
}
@@ -5481,6 +5485,8 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
memcg = obj_cgroup_memcg(objcg);
mod_memcg_state(memcg, MEMCG_ZSWAP_B, -size);
mod_memcg_state(memcg, MEMCG_ZSWAPPED, -1);
+ if (size == PAGE_SIZE)
+ mod_memcg_state(memcg, MEMCG_ZSWAP_INCOMP, -1);
rcu_read_unlock();
}
--
2.43.0
* [PATCH mm-new v3 2/2] selftests/cgroup: add test for zswap incompressible pages
From: Jiayuan Chen @ 2026-02-13 7:18 UTC
To: linux-mm
Cc: jiayuan.chen, Jiayuan Chen, Shakeel Butt, Nhat Pham,
SeongJae Park, Tejun Heo, Johannes Weiner, Michal Koutný,
Jonathan Corbet, Michal Hocko, Roman Gushchin, Muchun Song,
Andrew Morton, Yosry Ahmed, Chengming Zhou, Shuah Khan, cgroups,
linux-doc, linux-kernel, linux-kselftest
From: Jiayuan Chen <jiayuan.chen@shopee.com>
Add test_zswap_incompressible() to verify that the zswap_incomp memcg
stat correctly tracks incompressible pages.
The test allocates memory filled with random data from /dev/urandom,
which cannot be effectively compressed by zswap. When this data is
swapped out to zswap, it should be stored as-is and tracked by the
zswap_incomp counter.
The test verifies that:
1. Pages are swapped out to zswap (zswpout increases)
2. Incompressible pages are tracked (zswap_incomp increases)
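
To illustrate why random data ends up stored as-is, here is a minimal, self-contained sketch that is not taken from the selftest. It swaps in a toy byte-wise run-length encoder for the real zswap compressor and a deterministic xorshift32 generator for /dev/urandom; rle_size(), fill_pseudo_random(), stored_as_is() and EX_PAGE_SIZE are hypothetical names for this example only. The "stored as-is" decision mirrors the rule described in patch 1/2: a page whose compressed form is not smaller than a page is kept uncompressed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for the example only: 4 KiB pages. */
#define EX_PAGE_SIZE 4096

/*
 * Size in bytes that a toy run-length encoding would need:
 * one (count, value) pair per run, with runs capped at 255 bytes.
 * This stands in for a real compressor and is far weaker than one.
 */
static size_t rle_size(const unsigned char *buf, size_t len)
{
	size_t out = 0, i = 0;

	assert(buf != NULL);
	while (i < len) {
		size_t run = 1;

		while (i + run < len && run < 255 && buf[i + run] == buf[i])
			run++;
		out += 2;
		i += run;
	}
	return out;
}

/* Deterministic xorshift32 filler, used here instead of /dev/urandom. */
static void fill_pseudo_random(unsigned char *buf, size_t len, uint32_t seed)
{
	uint32_t x = seed ? seed : 1;

	for (size_t i = 0; i < len; i++) {
		x ^= x << 13;
		x ^= x >> 17;
		x ^= x << 5;
		buf[i] = (unsigned char)(x >> 24);
	}
}

/* Nonzero if the toy encoder cannot shrink the page below one page. */
static int stored_as_is(const unsigned char *page)
{
	return rle_size(page, EX_PAGE_SIZE) >= EX_PAGE_SIZE;
}
```

A zero-filled page encodes to a few dozen bytes and would be stored compressed, while a page filled by fill_pseudo_random() has almost no repeated bytes, so the encoded form is larger than the page and stored_as_is() reports it as incompressible.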
Test setup and results:
dd if=/dev/zero of=/swapfile bs=1M count=2048
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo Y > /sys/module/zswap/parameters/enabled
./test_zswap
TAP version 13
1..8
ok 1 test_zswap_usage
ok 2 test_swapin_nozswap
ok 3 test_zswapin
ok 4 test_zswap_writeback_enabled
ok 5 test_zswap_writeback_disabled
ok 6 test_no_kmem_bypass
ok 7 test_no_invasive_cgroup_shrink
ok 8 test_zswap_incompressible
Totals: pass:8 fail:0 xfail:0 xpass:0 skip:0 error:0
Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Acked-by: Nhat Pham <nphamcs@gmail.com>
Reviewed-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
---
tools/testing/selftests/cgroup/test_zswap.c | 136 ++++++++++++++++++++
1 file changed, 136 insertions(+)
diff --git a/tools/testing/selftests/cgroup/test_zswap.c b/tools/testing/selftests/cgroup/test_zswap.c
index 64ebc3f3f203..a7bdcdd09d62 100644
--- a/tools/testing/selftests/cgroup/test_zswap.c
+++ b/tools/testing/selftests/cgroup/test_zswap.c
@@ -5,6 +5,8 @@
#include <unistd.h>
#include <stdio.h>
#include <signal.h>
+#include <errno.h>
+#include <fcntl.h>
#include <sys/sysinfo.h>
#include <string.h>
#include <sys/wait.h>
@@ -574,6 +576,139 @@ static int test_no_kmem_bypass(const char *root)
return ret;
}
+struct incomp_child_args {
+ size_t size;
+ int pipefd[2];
+ int madvise_ret;
+ int madvise_errno;
+};
+
+static int allocate_random_and_wait(const char *cgroup, void *arg)
+{
+ struct incomp_child_args *values = arg;
+ size_t size = values->size;
+ char *mem;
+ int fd;
+ ssize_t n;
+
+ close(values->pipefd[0]);
+
+ mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ if (mem == MAP_FAILED)
+ return -1;
+
+ /* Fill with random data from /dev/urandom - incompressible */
+ fd = open("/dev/urandom", O_RDONLY);
+ if (fd < 0) {
+ munmap(mem, size);
+ return -1;
+ }
+
+ for (size_t i = 0; i < size; ) {
+ n = read(fd, mem + i, size - i);
+ if (n <= 0)
+ break;
+ i += n;
+ }
+ close(fd);
+
+ /* Touch all pages to ensure they're faulted in */
+ for (size_t i = 0; i < size; i += PAGE_SIZE)
+ mem[i] = mem[i];
+
+ /* Use MADV_PAGEOUT to push pages into zswap */
+ values->madvise_ret = madvise(mem, size, MADV_PAGEOUT);
+ values->madvise_errno = errno;
+
+ /* Notify parent that allocation and pageout are done */
+ write(values->pipefd[1], "x", 1);
+ close(values->pipefd[1]);
+
+ /* Keep memory alive for parent to check stats */
+ pause();
+ munmap(mem, size);
+ return 0;
+}
+
+static long get_zswap_incomp(const char *cgroup)
+{
+ return cg_read_key_long(cgroup, "memory.stat", "zswap_incomp ");
+}
+
+/*
+ * Test that incompressible pages (random data) are tracked by zswap_incomp.
+ *
+ * The child process allocates random data within memory.max, then uses
+ * MADV_PAGEOUT to push pages into zswap. The parent waits on a pipe for
+ * the child to finish, then checks the zswap_incomp stat before the child
+ * exits (zswap_incomp is a gauge that decreases on free).
+ */
+static int test_zswap_incompressible(const char *root)
+{
+ int ret = KSFT_FAIL;
+ struct incomp_child_args *values;
+ char *test_group;
+ long zswap_incomp;
+ pid_t child_pid;
+ int child_status;
+ char buf;
+
+ values = mmap(0, sizeof(struct incomp_child_args), PROT_READ |
+ PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+ if (values == MAP_FAILED)
+ return KSFT_FAIL;
+
+ if (pipe(values->pipefd)) {
+ munmap(values, sizeof(struct incomp_child_args));
+ return KSFT_FAIL;
+ }
+
+ test_group = cg_name(root, "zswap_incompressible_test");
+ if (!test_group)
+ goto out;
+ if (cg_create(test_group))
+ goto out;
+ if (cg_write(test_group, "memory.max", "32M"))
+ goto out;
+
+ values->size = MB(4);
+ child_pid = cg_run_nowait(test_group, allocate_random_and_wait, values);
+ if (child_pid < 0)
+ goto out;
+
+ close(values->pipefd[1]);
+
+ /* Wait for child to finish allocating and pageout */
+ read(values->pipefd[0], &buf, 1);
+ close(values->pipefd[0]);
+
+ zswap_incomp = get_zswap_incomp(test_group);
+ if (zswap_incomp <= 0) {
+ long zswpout = get_zswpout(test_group);
+ long zswapped = cg_read_key_long(test_group, "memory.stat", "zswapped ");
+ long zswap_b = cg_read_key_long(test_group, "memory.stat", "zswap ");
+
+ ksft_print_msg("zswap_incomp not increased: %ld\n", zswap_incomp);
+ ksft_print_msg("debug: zswpout=%ld zswapped=%ld zswap_b=%ld\n",
+ zswpout, zswapped, zswap_b);
+ ksft_print_msg("debug: madvise ret=%d errno=%d\n",
+ values->madvise_ret, values->madvise_errno);
+ goto out_kill;
+ }
+
+ ret = KSFT_PASS;
+
+out_kill:
+ kill(child_pid, SIGTERM);
+ waitpid(child_pid, &child_status, 0);
+out:
+ cg_destroy(test_group);
+ free(test_group);
+ munmap(values, sizeof(struct incomp_child_args));
+ return ret;
+}
+
#define T(x) { x, #x }
struct zswap_test {
int (*fn)(const char *root);
@@ -586,6 +721,7 @@ struct zswap_test {
T(test_zswap_writeback_disabled),
T(test_no_kmem_bypass),
T(test_no_invasive_cgroup_shrink),
+ T(test_zswap_incompressible),
};
#undef T
--
2.43.0