linux-mm.kvack.org archive mirror
From: Nhat Pham <nphamcs@gmail.com>
To: Jiayuan Chen <jiayuan.chen@linux.dev>
Cc: linux-mm@kvack.org, Jiayuan Chen <jiayuan.chen@shopee.com>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	 Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	 Muchun Song <muchun.song@linux.dev>,
	Yosry Ahmed <yosry.ahmed@linux.dev>,
	 Chengming Zhou <chengming.zhou@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Nick Terrell <terrelln@fb.com>, David Sterba <dsterba@suse.com>,
	cgroups@vger.kernel.org,  linux-kernel@vger.kernel.org,
	Chris Li <chrisl@kernel.org>
Subject: Re: [PATCH v1] mm: zswap: add per-memcg stat for incompressible pages
Date: Thu, 5 Feb 2026 09:31:54 -0800
Message-ID: <CAKEwX=PMQ1aYWr36XKG7oup3diBXb5vjV=fGZeTmYcx+ebmMtQ@mail.gmail.com>
In-Reply-To: <20260205053013.25134-1-jiayuan.chen@linux.dev>

On Wed, Feb 4, 2026 at 9:31 PM Jiayuan Chen <jiayuan.chen@linux.dev> wrote:
>
> From: Jiayuan Chen <jiayuan.chen@shopee.com>
>
> The global zswap_stored_incompressible_pages counter was added in commit
> dca4437a5861 ("mm/zswap: store <PAGE_SIZE compression failed page as-is")
> to track how many pages are stored in raw (uncompressed) form in zswap.
> However, in containerized environments, knowing which cgroup is
> contributing incompressible pages is essential for effective resource
> management.
>
> Add a new memcg stat 'zswpraw' to track incompressible pages per cgroup.
> This helps administrators and orchestrators to:
>
> 1. Identify workloads that produce incompressible data (e.g., encrypted
>    data, already-compressed media, random data) and may not benefit from
>    zswap.
>
> 2. Make informed decisions about workload placement - moving
>    incompressible workloads to nodes with larger swap backing devices
>    rather than relying on zswap.
>
> 3. Debug zswap efficiency issues at the cgroup level without needing to
>    correlate global stats with individual cgroups.
>
> While the compression ratio can be estimated from existing stats
> (zswap / zswapped * PAGE_SIZE), this doesn't distinguish between
> "uniformly poor compression" and "a few completely incompressible pages
> mixed with highly compressible ones". The zswpraw stat provides direct
> visibility into the latter case.

I personally agree. This is especially useful for multi-tenant
setups, where different workloads can have different compressibility
(which can muddy the waters) and might prefer different swapping
treatment (disk swap, zswap only, or zswap + disk swap through the
zswap shrinker). It might also give us data to extend zswap
(compressibility-based rejection, or different compression levels).
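
To make the "uniformly poor" vs. "a few raw pages" distinction from the
changelog concrete, here is a small userspace sketch. It is illustration
only, not part of the patch. The struct, the helper names and the two
sample cgroups are hypothetical; it assumes 4 KiB pages and that zswap,
zswapped and zswpraw are all reported in bytes, as the sample
memory.stat output further down suggests.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumption: 4 KiB pages */

/* Hypothetical snapshot of one cgroup's memory.stat values, in bytes. */
struct zswap_snapshot {
	uint64_t zswap;    /* compressed bytes held in zswap */
	uint64_t zswapped; /* original bytes of the pages stored in zswap */
	uint64_t zswpraw;  /* bytes of pages stored as-is (proposed stat) */
};

/* Aggregate ratio: compressed size over original size (lower is better). */
static double zswap_ratio(const struct zswap_snapshot *s)
{
	return s->zswapped ? (double)s->zswap / (double)s->zswapped : 0.0;
}

/* Share of the zswapped memory that is stored uncompressed. */
static double zswap_raw_share(const struct zswap_snapshot *s)
{
	return s->zswapped ? (double)s->zswpraw / (double)s->zswapped : 0.0;
}

/* Made-up cgroup: 1000 pages that each compress to 2048 bytes. */
static const struct zswap_snapshot uniform = {
	.zswap    = 1000 * 2048ULL,
	.zswapped = 1000 * SKETCH_PAGE_SIZE,
	.zswpraw  = 0,
};

/* Made-up cgroup: 200 raw pages plus 800 pages compressing to 1536 bytes. */
static const struct zswap_snapshot mixed = {
	.zswap    = 200 * SKETCH_PAGE_SIZE + 800 * 1536ULL,
	.zswapped = 1000 * SKETCH_PAGE_SIZE,
	.zswpraw  = 200 * SKETCH_PAGE_SIZE,
};
```

Both made-up cgroups end up with the same aggregate ratio (2048000 /
4096000), so zswap and zswapped alone cannot tell them apart; only the
raw share differs.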

The naming is a bit off, though, but I'm not a native English speaker :)
I think Chris Li pointed out the need for per-memcg counters too:

https://lore.kernel.org/linux-mm/CAF8kJuONDFj4NAksaR4j_WyDbNwNGYLmTe-o76rqU17La=nkOw@mail.gmail.com/

Can you add this to the patch changelog in later versions? :)

+ Chris Li
>
> Changes
> -------
>
> 1. Add zswap_is_raw() helper (include/linux/zswap.h)
>    - Abstract the PAGE_SIZE comparison logic for identifying raw entries
>    - Keep the incompressible check in one place for maintainability
>
> 2. Add MEMCG_ZSWAP_RAW stat definition (include/linux/memcontrol.h,
>    mm/memcontrol.c)
>    - Add MEMCG_ZSWAP_RAW to memcg_stat_item enum
>    - Register in memcg_stat_items[] and memory_stats[] arrays
>    - Export as "zswpraw" in memory.stat
>
> 3. Update statistics accounting (mm/memcontrol.c, mm/zswap.c)
>    - Track MEMCG_ZSWAP_RAW in obj_cgroup_charge/uncharge_zswap()
>    - Use zswap_is_raw() helper in zswap.c for consistency
>
> Test
> ----
>
> I wrote a simple test program[1] that allocates memory and compresses it
> with zstd, so kernel zswap cannot compress further.
>
>   $ cgcreate -g memory:test
>   $ cgexec -g memory:test ./test_zswpraw &
>   $ cat /sys/fs/cgroup/test/memory.stat | grep zswp
>   zswpraw 0
>   zswpin 0
>   zswpout 0
>   zswpwb 0
>
>   $ echo "100M" > /sys/fs/cgroup/test/memory.reclaim
>   $ cat /sys/fs/cgroup/test/memory.stat | grep zswp
>   zswpraw 104800256
>   zswpin 0
>   zswpout 51222
>   zswpwb 0
>
>   $ pkill test_zswpraw
>   $ cat /sys/fs/cgroup/test/memory.stat | grep zswp
>   zswpraw 0
>   zswpin 1
>   zswpout 51222
>   zswpwb 0
>
> [1] https://gist.github.com/mrpre/00432c6154250326994fbeaf62e0e6f1
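
For reading the counter programmatically rather than with grep, a
minimal userspace sketch follows. stat_value() is a hypothetical helper,
not an existing kernel or selftest function; it only assumes the
"name value" per-line layout shown in the output above. With the sample
value, 104800256 bytes / 4096 comes to 25586 pages, assuming 4 KiB pages.

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the value for 'key' in memory.stat-style text ("name value" per
 * line), or -1 if the key is absent. Hypothetical helper, not kernel API.
 */
static long long stat_value(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Match the whole key, followed by the separating space. */
		if (!strncmp(line, key, klen) && line[klen] == ' ') {
			long long val;

			if (sscanf(line + klen + 1, "%lld", &val) == 1)
				return val;
			return -1;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A caller would read /sys/fs/cgroup/<group>/memory.stat into a buffer and
pass it in as 'text'.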

It would be nice if some version of this could be turned into a selftest :)

Instead of using zstd-compressed data, you can read from /dev/urandom.
In my experience that data is usually incompressible.

Feel free to send this as a follow-up patch, but would love to see it :)
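
A rough sketch of the /dev/urandom idea, in case it helps. This is a
hypothetical userspace helper, not code from the patch or from the
existing selftests; the name fill_incompressible() is made up.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fill 'buf' with 'len' bytes from /dev/urandom so that the pages are
 * unlikely to compress. Returns 0 on success, -1 on error.
 * Hypothetical helper for illustration.
 */
static int fill_incompressible(void *buf, size_t len)
{
	char *p = buf;
	int fd = open("/dev/urandom", O_RDONLY);

	if (fd < 0)
		return -1;

	while (len > 0) {
		ssize_t n = read(fd, p, len);

		if (n < 0 && errno == EINTR)
			continue;	/* interrupted, retry the read */
		if (n <= 0) {		/* real error or unexpected EOF */
			close(fd);
			return -1;
		}
		p += n;
		len -= (size_t)n;
	}
	close(fd);
	return 0;
}
```

A test would fill its allocation this way, trigger reclaim through
memory.reclaim as in the commands above, and then check that zswpraw is
non-zero.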

>
> Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
> ---
>  include/linux/memcontrol.h | 1 +
>  include/linux/zswap.h      | 9 +++++++++
>  mm/memcontrol.c            | 6 ++++++
>  mm/zswap.c                 | 6 +++---
>  4 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index b6c82c8f73e1..83d1328f81d1 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -39,6 +39,7 @@ enum memcg_stat_item {
>         MEMCG_KMEM,
>         MEMCG_ZSWAP_B,
>         MEMCG_ZSWAPPED,
> +       MEMCG_ZSWAP_RAW,
>         MEMCG_NR_STAT,
>  };
>
> diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> index 30c193a1207e..94f84b154b71 100644
> --- a/include/linux/zswap.h
> +++ b/include/linux/zswap.h
> @@ -7,6 +7,15 @@
>
>  struct lruvec;
>
> +/*
> + * Check if a zswap entry is stored in raw (uncompressed) form.
> + * This happens when compression doesn't reduce the size.
> + */
> +static inline bool zswap_is_raw(size_t size)
> +{
> +       return size == PAGE_SIZE;
> +}
> +
>  extern atomic_long_t zswap_stored_pages;
>
>  #ifdef CONFIG_ZSWAP
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 007413a53b45..32fb801530a3 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -341,6 +341,7 @@ static const unsigned int memcg_stat_items[] = {
>         MEMCG_KMEM,
>         MEMCG_ZSWAP_B,
>         MEMCG_ZSWAPPED,
> +       MEMCG_ZSWAP_RAW,

Not sure how I feel about the naming, but I don't have a recommendation :)

>  };
>
>  #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
> @@ -1346,6 +1347,7 @@ static const struct memory_stat memory_stats[] = {
>  #ifdef CONFIG_ZSWAP
>         { "zswap",                      MEMCG_ZSWAP_B                   },
>         { "zswapped",                   MEMCG_ZSWAPPED                  },
> +       { "zswpraw",                    MEMCG_ZSWAP_RAW                 },
>  #endif
>         { "file_mapped",                NR_FILE_MAPPED                  },
>         { "file_dirty",                 NR_FILE_DIRTY                   },
> @@ -5458,6 +5460,8 @@ void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size)
>         memcg = obj_cgroup_memcg(objcg);
>         mod_memcg_state(memcg, MEMCG_ZSWAP_B, size);
>         mod_memcg_state(memcg, MEMCG_ZSWAPPED, 1);
> +       if (zswap_is_raw(size))
> +               mod_memcg_state(memcg, MEMCG_ZSWAP_RAW, 1);
>         rcu_read_unlock();
>  }
>
> @@ -5481,6 +5485,8 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
>         memcg = obj_cgroup_memcg(objcg);
>         mod_memcg_state(memcg, MEMCG_ZSWAP_B, -size);
>         mod_memcg_state(memcg, MEMCG_ZSWAPPED, -1);
> +       if (zswap_is_raw(size))
> +               mod_memcg_state(memcg, MEMCG_ZSWAP_RAW, -1);
>         rcu_read_unlock();
>  }
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 3d2d59ac3f9c..54ab4d126f64 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -723,7 +723,7 @@ static void zswap_entry_free(struct zswap_entry *entry)
>                 obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
>                 obj_cgroup_put(entry->objcg);
>         }
> -       if (entry->length == PAGE_SIZE)
> +       if (zswap_is_raw(entry->length))
>                 atomic_long_dec(&zswap_stored_incompressible_pages);
>         zswap_entry_cache_free(entry);
>         atomic_long_dec(&zswap_stored_pages);
> @@ -941,7 +941,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
>         zs_obj_read_sg_begin(pool->zs_pool, entry->handle, input, entry->length);
>
>         /* zswap entries of length PAGE_SIZE are not compressed. */
> -       if (entry->length == PAGE_SIZE) {
> +       if (zswap_is_raw(entry->length)) {
>                 WARN_ON_ONCE(input->length != PAGE_SIZE);
>                 memcpy_from_sglist(kmap_local_folio(folio, 0), input, 0, PAGE_SIZE);
>                 dlen = PAGE_SIZE;
> @@ -1448,7 +1448,7 @@ static bool zswap_store_page(struct page *page,
>                 obj_cgroup_charge_zswap(objcg, entry->length);
>         }
>         atomic_long_inc(&zswap_stored_pages);
> -       if (entry->length == PAGE_SIZE)
> +       if (zswap_is_raw(entry->length))
>                 atomic_long_inc(&zswap_stored_incompressible_pages);
>
>         /*
> --
> 2.43.0
>

Those nits aside, LGTM.
Acked-by: Nhat Pham <nphamcs@gmail.com>

I'll leave the naming suggestion to Yosry and Johannes ;)


Thread overview: 15+ messages
2026-02-05  5:30 Jiayuan Chen
2026-02-05 17:31 ` Nhat Pham [this message]
2026-02-05 17:45   ` Nhat Pham
2026-02-06  2:04     ` Jiayuan Chen
2026-02-05 21:33 ` Shakeel Butt
2026-02-06  2:04   ` Jiayuan Chen
2026-02-06  0:39 ` Yosry Ahmed
2026-02-06  2:05   ` Jiayuan Chen
2026-02-06  2:21 ` SeongJae Park
2026-02-06  2:33   ` Yosry Ahmed
2026-02-06  2:53     ` Jiayuan Chen
2026-02-06  4:12       ` Yosry Ahmed
2026-02-06  2:47   ` Jiayuan Chen
2026-02-06  3:15     ` SeongJae Park
2026-02-06  4:11     ` Yosry Ahmed
