Date: Fri, 06 Feb 2026 02:47:58 +0000
From: "Jiayuan Chen"
Subject: Re: [PATCH v1] mm: zswap: add per-memcg stat for incompressible pages
To: "SeongJae Park"
Cc: "SeongJae Park", linux-mm@kvack.org, "Jiayuan Chen",
    "Johannes Weiner", "Michal Hocko", "Roman Gushchin", "Shakeel Butt",
    "Muchun Song", "Yosry Ahmed", "Nhat Pham", "Chengming Zhou",
    "Andrew Morton", "Nick Terrell", "David Sterba",
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20260206022152.67992-1-sj@kernel.org>
References: <20260206022152.67992-1-sj@kernel.org>

February 6, 2026 at 10:21, "SeongJae Park" wrote:

> On Thu, 5 Feb 2026 13:30:12 +0800 Jiayuan Chen wrote:
>
> > From: Jiayuan Chen
> >
> > The global zswap_stored_incompressible_pages counter was added in commit
> > dca4437a5861 ("mm/zswap: store
> > to track how many pages are stored in raw (uncompressed) form in zswap.
> > However, in containerized environments, knowing which cgroup is
> > contributing incompressible pages is essential for effective resource
> > management.
> >
> > Add a new memcg stat 'zswpraw' to track incompressible pages per cgroup.
> > This helps administrators and orchestrators to:
> >
> > 1. Identify workloads that produce incompressible data (e.g., encrypted
> >    data, already-compressed media, random data) and may not benefit from
> >    zswap.
> >
> > 2. Make informed decisions about workload placement - moving
> >    incompressible workloads to nodes with larger swap backing devices
> >    rather than relying on zswap.
> >
> > 3. Debug zswap efficiency issues at the cgroup level without needing to
> >    correlate global stats with individual cgroups.
> >
> > While the compression ratio can be estimated from existing stats
> > (zswap / zswapped * PAGE_SIZE), this doesn't distinguish between
> > "uniformly poor compression" and "a few completely incompressible pages
> > mixed with highly compressible ones".
> > The zswpraw stat provides direct visibility into the latter case.
> >
> > Changes
> > -------
> >
> > 1. Add zswap_is_raw() helper (include/linux/zswap.h)
> >    - Abstract the PAGE_SIZE comparison logic for identifying raw entries
> >    - Keep the incompressible check in one place for maintainability
> >
> > 2. Add MEMCG_ZSWAP_RAW stat definition (include/linux/memcontrol.h,
> >    mm/memcontrol.c)
> >    - Add MEMCG_ZSWAP_RAW to memcg_stat_item enum
> >    - Register in memcg_stat_items[] and memory_stats[] arrays
> >    - Export as "zswpraw" in memory.stat
> >
> > 3. Update statistics accounting (mm/memcontrol.c, mm/zswap.c)
> >    - Track MEMCG_ZSWAP_RAW in obj_cgroup_charge/uncharge_zswap()
> >    - Use zswap_is_raw() helper in zswap.c for consistency
> >
> > Test
> > ----
> >
> > I wrote a simple test program[1] that allocates memory and compresses it
> > with zstd, so kernel zswap cannot compress further.
> >
> > $ cgcreate -g memory:test
> > $ cgexec -g memory:test ./test_zswpraw &
> > $ cat /sys/fs/cgroup/test/memory.stat | grep zswp
> > zswpraw 0
> > zswpin 0
> > zswpout 0
> > zswpwb 0
> >
> > $ echo "100M" > /sys/fs/cgroup/test/memory.reclaim
> > $ cat /sys/fs/cgroup/test/memory.stat | grep zswp
> > zswpraw 104800256
> > zswpin 0
> > zswpout 51222
> > zswpwb 0
> >
> > $ pkill test_zswpraw
> > $ cat /sys/fs/cgroup/test/memory.stat | grep zswp
> > zswpraw 0
> > zswpin 1
> > zswpout 51222
> > zswpwb 0
> >
> > [1] https://gist.github.com/mrpre/00432c6154250326994fbeaf62e0e6f1
> >
> > Signed-off-by: Jiayuan Chen
> > ---
> >  include/linux/memcontrol.h | 1 +
> >  include/linux/zswap.h      | 9 +++++++++
> >  mm/memcontrol.c            | 6 ++++++
> >  mm/zswap.c                 | 6 +++---
> >  4 files changed, 19 insertions(+), 3 deletions(-)
>
> As others also mentioned, the documentation of the new stat would be needed.
>
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index b6c82c8f73e1..83d1328f81d1 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -39,6 +39,7 @@ enum memcg_stat_item {
> >  	MEMCG_KMEM,
> >  	MEMCG_ZSWAP_B,
> >  	MEMCG_ZSWAPPED,
> > +	MEMCG_ZSWAP_RAW,
> >  	MEMCG_NR_STAT,
> >  };
> >
> > diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> > index 30c193a1207e..94f84b154b71 100644
> > --- a/include/linux/zswap.h
> > +++ b/include/linux/zswap.h
> > @@ -7,6 +7,15 @@
> >
> >  struct lruvec;
> >
> > +/*
> > + * Check if a zswap entry is stored in raw (uncompressed) form.
> > + * This happens when compression doesn't reduce the size.
> > + */
> > +static inline bool zswap_is_raw(size_t size)
> > +{
> > +	return size == PAGE_SIZE;
> > +}
> > +
>
> No strong opinion, but I'm not really sure if the helper is needed, because it
> feels quite simple logic:
>
>     "If an object is compressed and the size is same to the original one, the
>     object is incompressible."
>
> I also feel the function name bit odd, given the type of the parameter.  Based
> on the function name and the comment, I'd expect it to receive a zswap_entry
> object.  I understand it is better to receive a size_t, to be called from
> obj_cgroup_[un]charge_zswap(), though.  Even in the case, I think the name can
> be better (e.g., zswap_compression_failed() or zswap_was_incompressible() ?),
> or at least the coment can be more kindly explain the fact that the parameter
> is the size of object after the compression attempt.
>
> >  extern atomic_long_t zswap_stored_pages;
> >
> >  #ifdef CONFIG_ZSWAP
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 007413a53b45..32fb801530a3 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -341,6 +341,7 @@ static const unsigned int memcg_stat_items[] = {
> >  	MEMCG_KMEM,
> >  	MEMCG_ZSWAP_B,
> >  	MEMCG_ZSWAPPED,
> > +	MEMCG_ZSWAP_RAW,
> >  };
>
> No strong opinion, but I think Shakeel's suggestion of other names is
> reasonable.
>
> >  #define NR_MEMCG_NODE_STAT_ITEMS ARRAY_SIZE(memcg_node_stat_items)
> > @@ -1346,6 +1347,7 @@ static const struct memory_stat memory_stats[] = {
> >  #ifdef CONFIG_ZSWAP
> >  	{ "zswap", MEMCG_ZSWAP_B },
> >  	{ "zswapped", MEMCG_ZSWAPPED },
> > +	{ "zswpraw", MEMCG_ZSWAP_RAW },
>
> Ditto.
>
> >  #endif
> >  	{ "file_mapped", NR_FILE_MAPPED },
> >  	{ "file_dirty", NR_FILE_DIRTY },
> > @@ -5458,6 +5460,8 @@ void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size)
> >  	memcg = obj_cgroup_memcg(objcg);
> >  	mod_memcg_state(memcg, MEMCG_ZSWAP_B, size);
> >  	mod_memcg_state(memcg, MEMCG_ZSWAPPED, 1);
> > +	if (zswap_is_raw(size))
> > +		mod_memcg_state(memcg, MEMCG_ZSWAP_RAW, 1);
>
> I understand the helper function is better to receive size_t rather than
> zswap_entry for this.
>
> >  	rcu_read_unlock();
> >  }
> >
> > @@ -5481,6 +5485,8 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
> >  	memcg = obj_cgroup_memcg(objcg);
> >  	mod_memcg_state(memcg, MEMCG_ZSWAP_B, -size);
> >  	mod_memcg_state(memcg, MEMCG_ZSWAPPED, -1);
> > +	if (zswap_is_raw(size))
> > +		mod_memcg_state(memcg, MEMCG_ZSWAP_RAW, -1);
> >  	rcu_read_unlock();
> >  }
> >
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 3d2d59ac3f9c..54ab4d126f64 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -723,7 +723,7 @@ static void zswap_entry_free(struct zswap_entry *entry)
> >  		obj_cgroup_uncharge_zswap(entry->objcg, entry->length);
> >  		obj_cgroup_put(entry->objcg);
> >  	}
> > -	if (entry->length == PAGE_SIZE)
> > +	if (zswap_is_raw(entry->length))
> >  		atomic_long_dec(&zswap_stored_incompressible_pages);
> >  	zswap_entry_cache_free(entry);
> >  	atomic_long_dec(&zswap_stored_pages);
> > @@ -941,7 +941,7 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
> >  	zs_obj_read_sg_begin(pool->zs_pool, entry->handle, input, entry->length);
> >
> >  	/* zswap entries of length PAGE_SIZE are not compressed. */
> > -	if (entry->length == PAGE_SIZE) {
> > +	if (zswap_is_raw(entry->length)) {
> >  		WARN_ON_ONCE(input->length != PAGE_SIZE);
> >  		memcpy_from_sglist(kmap_local_folio(folio, 0), input, 0, PAGE_SIZE);
> >  		dlen = PAGE_SIZE;
>
> Below this part, I show 'dlen == PAGE_SIZE'.  Should it also be converted to
> use the helper function?

The dlen variable represents the decompressed (plaintext) size.
Since we compress individual pages, the decompressed output should always
be PAGE_SIZE in normal cases.

This check validates whether decompression produced the expected result, not
whether the entry is incompressible.
Using zswap_is_incomp() here would be semantically incorrect - the helper is
meant to check if an entry was stored without compression (i.e., compression
failed to reduce size), while dlen == PAGE_SIZE verifies that the output of
decompression is valid.
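
To make the distinction concrete, here is a minimal userspace sketch (not
kernel code). The names sketch_entry_is_raw() and sketch_decompress_ok() and
the 4 KiB SKETCH_PAGE_SIZE are made up for illustration only; the kernel uses
PAGE_SIZE and the helper from the patch. Both predicates compare against the
page size, but they take different inputs and answer different questions:

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption for this sketch: 4 KiB pages. PAGE_SIZE is arch-dependent. */
#define SKETCH_PAGE_SIZE ((size_t)4096)

/*
 * Store-side question: was the entry kept without compression?
 * The argument is the size of the object after the compression attempt.
 */
static inline bool sketch_entry_is_raw(size_t stored_len)
{
	return stored_len == SKETCH_PAGE_SIZE;
}

/*
 * Load-side question: did decompression yield exactly one page?
 * The argument is the decompressed length, however the entry was stored.
 */
static inline bool sketch_decompress_ok(size_t dlen)
{
	return dlen == SKETCH_PAGE_SIZE;
}
```

An entry that compressed to, say, 1200 bytes is not raw, yet its decompressed
length must still be one page. So the load-side check is true for every
healthy entry, while the store-side check is true only for incompressible
ones.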