From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: Minchan Kim <minchan@kernel.org>,
	Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Jens Axboe <axboe@kernel.dk>, Yosry Ahmed <yosry.ahmed@linux.dev>,
	Nhat Pham <hoangnhat.pham@linux.dev>,
	Nhat Pham <nphamcs@gmail.com>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc
Date: Thu, 26 Feb 2026 11:29:26 -0800	[thread overview]
Message-ID: <20260226192936.3190275-4-joshua.hahnjy@gmail.com> (raw)
In-Reply-To: <20260226192936.3190275-1-joshua.hahnjy@gmail.com>

Introduce an array of struct obj_cgroup pointers to zpdesc to keep track
of compressed objects' memcg ownership.

The 8 bytes required for the new array pointer bring the size of struct
zpdesc up from 56 bytes to 64 bytes. However, in the current
implementation, struct zpdesc lies on top of struct page[1], so the
increase is invisible from the outside: the full 64 bytes of struct
page are available to struct zpdesc anyway.

The newly added obj_cgroup array pointer overlays page->memcg_data,
which causes problems for functions that perform page charging by
checking whether page->memcg_data is zero. To make sure that the
backing zpdesc's obj_cgroup ** is not interpreted as a mem_cgroup *,
follow SLUB's lead and tag the pointer with the MEMCG_DATA_OBJEXTS bit.

Consumers of zsmalloc that do not perform memcg accounting (i.e. zram)
are completely unaffected by this patch, as the array tracking the
obj_cgroup pointers is only allocated in the zswap path.

This patch temporarily increases the memory used by zswap by 8 bytes
per zswap_entry, since the obj_cgroup pointer is duplicated in the
zpdesc and in zswap_entry. The following patches will redirect memory
charging operations to use the zpdesc's obj_cgroup instead and remove
the pointer from zswap_entry, leaving no net memory usage increase for
either zram or zswap.

In this patch, allocate / free the objcg pointer array for the zswap
path, and handle partial object migration and full zpdesc migration.

[1] In the (near) future, struct zpdesc may no longer overlay struct
page as we shift towards using memdescs. When that happens, the size
increase of struct zpdesc will no longer be free. Even so, the
difference can be kept minimal.

All of the changes in this patch are guarded by CONFIG_MEMCG. We could
optionally minimize the impact on zram users further by guarding them
under CONFIG_MEMCG && CONFIG_ZSWAP instead.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 drivers/block/zram/zram_drv.c | 10 ++---
 include/linux/zsmalloc.h      |  2 +-
 mm/zpdesc.h                   | 25 +++++++++++-
 mm/zsmalloc.c                 | 74 +++++++++++++++++++++++++++++------
 mm/zswap.c                    |  2 +-
 5 files changed, 93 insertions(+), 20 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 61d3e2c74901..60ee85679730 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2220,8 +2220,8 @@ static int write_incompressible_page(struct zram *zram, struct page *page,
 	 * like we do for compressible pages.
 	 */
 	handle = zs_malloc(zram->mem_pool, PAGE_SIZE,
-			   GFP_NOIO | __GFP_NOWARN |
-			   __GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page));
+			   GFP_NOIO | __GFP_NOWARN | __GFP_HIGHMEM |
+			   __GFP_MOVABLE, page_to_nid(page), false);
 	if (IS_ERR_VALUE(handle))
 		return PTR_ERR((void *)handle);
 
@@ -2283,8 +2283,8 @@ static int zram_write_page(struct zram *zram, struct page *page, u32 index)
 	}
 
 	handle = zs_malloc(zram->mem_pool, comp_len,
-			   GFP_NOIO | __GFP_NOWARN |
-			   __GFP_HIGHMEM | __GFP_MOVABLE, page_to_nid(page));
+			   GFP_NOIO | __GFP_NOWARN | __GFP_HIGHMEM |
+			   __GFP_MOVABLE, page_to_nid(page), false);
 	if (IS_ERR_VALUE(handle)) {
 		zcomp_stream_put(zstrm);
 		return PTR_ERR((void *)handle);
@@ -2514,7 +2514,7 @@ static int recompress_slot(struct zram *zram, u32 index, struct page *page,
 	handle_new = zs_malloc(zram->mem_pool, comp_len_new,
 			       GFP_NOIO | __GFP_NOWARN |
 			       __GFP_HIGHMEM | __GFP_MOVABLE,
-			       page_to_nid(page));
+			       page_to_nid(page), false);
 	if (IS_ERR_VALUE(handle_new)) {
 		zcomp_stream_put(zstrm);
 		return PTR_ERR((void *)handle_new);
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 478410c880b1..8ef28b964bb0 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -28,7 +28,7 @@ struct zs_pool *zs_create_pool(const char *name);
 void zs_destroy_pool(struct zs_pool *pool);
 
 unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t flags,
-			const int nid);
+			const int nid, bool objcg);
 void zs_free(struct zs_pool *pool, unsigned long obj);
 
 size_t zs_huge_class_size(struct zs_pool *pool);
diff --git a/mm/zpdesc.h b/mm/zpdesc.h
index b8258dc78548..d10a73e4a90e 100644
--- a/mm/zpdesc.h
+++ b/mm/zpdesc.h
@@ -20,10 +20,12 @@
  * @zspage:		Points to the zspage this zpdesc is a part of.
  * @first_obj_offset:	First object offset in zsmalloc pool.
  * @_refcount:		The number of references to this zpdesc.
+ * @objcgs:		Array of obj_cgroup pointers that the stored objects
+ *			belong to. Overlaid on top of page->memcg_data; always
+ *			has its first bit set when it is a valid pointer.
  *
  * This struct overlays struct page for now. Do not modify without a good
- * understanding of the issues. In particular, do not expand into the overlap
- * with memcg_data.
+ * understanding of the issues.
  *
  * Page flags used:
  * * PG_private identifies the first component page.
@@ -47,6 +49,9 @@ struct zpdesc {
 	 */
 	unsigned int first_obj_offset;
 	atomic_t _refcount;
+#ifdef CONFIG_MEMCG
+	unsigned long objcgs;
+#endif
 };
 #define ZPDESC_MATCH(pg, zp) \
 	static_assert(offsetof(struct page, pg) == offsetof(struct zpdesc, zp))
@@ -59,6 +64,9 @@ ZPDESC_MATCH(__folio_index, handle);
 ZPDESC_MATCH(private, zspage);
 ZPDESC_MATCH(page_type, first_obj_offset);
 ZPDESC_MATCH(_refcount, _refcount);
+#ifdef CONFIG_MEMCG
+ZPDESC_MATCH(memcg_data, objcgs);
+#endif
 #undef ZPDESC_MATCH
 static_assert(sizeof(struct zpdesc) <= sizeof(struct page));
 
@@ -171,4 +179,17 @@ static inline bool zpdesc_is_locked(struct zpdesc *zpdesc)
 {
 	return folio_test_locked(zpdesc_folio(zpdesc));
 }
+
+#ifdef CONFIG_MEMCG
+static inline struct obj_cgroup **zpdesc_objcgs(struct zpdesc *zpdesc)
+{
+	return (struct obj_cgroup **)(zpdesc->objcgs & ~OBJEXTS_FLAGS_MASK);
+}
+
+static inline void zpdesc_set_objcgs(struct zpdesc *zpdesc,
+				     struct obj_cgroup **objcgs)
+{
+	zpdesc->objcgs = (unsigned long)objcgs | MEMCG_DATA_OBJEXTS;
+}
+#endif
 #endif
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 7846f31bcc8b..7d56bb700e11 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -39,6 +39,7 @@
 #include <linux/zsmalloc.h>
 #include <linux/fs.h>
 #include <linux/workqueue.h>
+#include <linux/memcontrol.h>
 #include "zpdesc.h"
 
 #define ZSPAGE_MAGIC	0x58
@@ -777,6 +778,10 @@ static void reset_zpdesc(struct zpdesc *zpdesc)
 	ClearPagePrivate(page);
 	zpdesc->zspage = NULL;
 	zpdesc->next = NULL;
+#ifdef CONFIG_MEMCG
+	kfree(zpdesc_objcgs(zpdesc));
+	zpdesc->objcgs = 0;
+#endif
 	/* PageZsmalloc is sticky until the page is freed to the buddy. */
 }
 
@@ -893,6 +898,43 @@ static void init_zspage(struct size_class *class, struct zspage *zspage)
 	set_freeobj(zspage, 0);
 }
 
+#ifdef CONFIG_MEMCG
+static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp,
+				struct zpdesc *zpdescs[])
+{
+	/*
+	 * Add 2 to objcgs_per_zpdesc to account for partial objs that may be
+	 * stored at the beginning or end of the zpdesc.
+	 */
+	int objcgs_per_zpdesc = (PAGE_SIZE / class->size) + 2;
+	int i;
+	struct obj_cgroup **objcgs;
+
+	for (i = 0; i < class->pages_per_zspage; i++) {
+		objcgs = kcalloc(objcgs_per_zpdesc, sizeof(struct obj_cgroup *),
+				 gfp & ~__GFP_HIGHMEM);
+		if (!objcgs) {
+			while (--i >= 0) {
+				kfree(zpdesc_objcgs(zpdescs[i]));
+				zpdescs[i]->objcgs = 0;
+			}
+
+			return false;
+		}
+
+		zpdesc_set_objcgs(zpdescs[i], objcgs);
+	}
+
+	return true;
+}
+#else
+static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp,
+				struct zpdesc *zpdescs[])
+{
+	return true;
+}
+#endif
+
 static void create_page_chain(struct size_class *class, struct zspage *zspage,
 				struct zpdesc *zpdescs[])
 {
@@ -931,7 +973,7 @@ static void create_page_chain(struct size_class *class, struct zspage *zspage,
  */
 static struct zspage *alloc_zspage(struct zs_pool *pool,
 				   struct size_class *class,
-				   gfp_t gfp, const int nid)
+				   gfp_t gfp, const int nid, bool objcg)
 {
 	int i;
 	struct zpdesc *zpdescs[ZS_MAX_PAGES_PER_ZSPAGE];
@@ -952,24 +994,29 @@ static struct zspage *alloc_zspage(struct zs_pool *pool,
 		struct zpdesc *zpdesc;
 
 		zpdesc = alloc_zpdesc(gfp, nid);
-		if (!zpdesc) {
-			while (--i >= 0) {
-				zpdesc_dec_zone_page_state(zpdescs[i]);
-				free_zpdesc(zpdescs[i]);
-			}
-			cache_free_zspage(zspage);
-			return NULL;
-		}
+		if (!zpdesc)
+			goto err;
 		__zpdesc_set_zsmalloc(zpdesc);
 
 		zpdesc_inc_zone_page_state(zpdesc);
 		zpdescs[i] = zpdesc;
 	}
 
+	if (objcg && !alloc_zspage_objcgs(class, gfp, zpdescs))
+		goto err;
+
 	create_page_chain(class, zspage, zpdescs);
 	init_zspage(class, zspage);
 
 	return zspage;
+
+err:
+	while (--i >= 0) {
+		zpdesc_dec_zone_page_state(zpdescs[i]);
+		free_zpdesc(zpdescs[i]);
+	}
+	cache_free_zspage(zspage);
+	return NULL;
 }
 
 static struct zspage *find_get_zspage(struct size_class *class)
@@ -1289,13 +1336,14 @@ static unsigned long obj_malloc(struct zs_pool *pool,
  * @size: size of block to allocate
  * @gfp: gfp flags when allocating object
  * @nid: The preferred node id to allocate new zspage (if needed)
+ * @objcg: Whether the zspage should track per-object memory charging.
  *
  * On success, handle to the allocated object is returned,
  * otherwise an ERR_PTR().
  * Allocation requests with size > ZS_MAX_ALLOC_SIZE will fail.
  */
 unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp,
-			const int nid)
+			const int nid, bool objcg)
 {
 	unsigned long handle;
 	struct size_class *class;
@@ -1330,7 +1378,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp,
 
 	spin_unlock(&class->lock);
 
-	zspage = alloc_zspage(pool, class, gfp, nid);
+	zspage = alloc_zspage(pool, class, gfp, nid, objcg);
 	if (!zspage) {
 		cache_free_handle(handle);
 		return (unsigned long)ERR_PTR(-ENOMEM);
@@ -1672,6 +1720,10 @@ static void replace_sub_page(struct size_class *class, struct zspage *zspage,
 	if (unlikely(ZsHugePage(zspage)))
 		newzpdesc->handle = oldzpdesc->handle;
 	__zpdesc_set_movable(newzpdesc);
+#ifdef CONFIG_MEMCG
+	zpdesc_set_objcgs(newzpdesc, zpdesc_objcgs(oldzpdesc));
+	oldzpdesc->objcgs = 0;
+#endif
 }
 
 static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
diff --git a/mm/zswap.c b/mm/zswap.c
index af3f0fbb0558..dd083110bfa0 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -905,7 +905,7 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	}
 
 	gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
-	handle = zs_malloc(pool->zs_pool, dlen, gfp, page_to_nid(page));
+	handle = zs_malloc(pool->zs_pool, dlen, gfp, page_to_nid(page), true);
 	if (IS_ERR_VALUE(handle)) {
 		alloc_ret = PTR_ERR((void *)handle);
 		goto unlock;
-- 
2.47.3



