linux-mm.kvack.org archive mirror
From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: Minchan Kim <minchan@kernel.org>,
	Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Yosry Ahmed <yosry.ahmed@linux.dev>,
	Nhat Pham <hoangnhat.pham@linux.dev>,
	Nhat Pham <nphamcs@gmail.com>, Harry Yoo <harry.yoo@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	kernel-team@meta.com
Subject: [PATCH 11/11] mm/zsmalloc: Handle charge migration in zpdesc substitution
Date: Wed, 11 Mar 2026 12:51:48 -0700
Message-ID: <20260311195153.4013476-12-joshua.hahnjy@gmail.com>
In-Reply-To: <20260311195153.4013476-1-joshua.hahnjy@gmail.com>

In zsmalloc, there are two types of migrations: migrations of single
compressed objects from one zspage to another, and substitutions of
zpdescs within a zspage.

In both of these migrations, the memcg association of the compressed
objects does not change. However, the physical location of the
compressed objects may change, which alters their lruvec association.

In this patch, handle the substitution of zpdescs within a zspage,
which may change the node of all objects present (wholly or partially)
on the replaced zpdesc.

Take special care to address the partial compressed object at the
beginning of the swapped-out zpdesc. "Ownership" of a spanning object
is associated with the zpdesc it begins on. Thus, when handling the
first compressed object, we must iterate through the (up to 4)
zpdescs in the zspage to locate the replaced zpdesc, then derive the
object's zspage-wide index.
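
The index math above can be sketched in isolation (a minimal model,
assuming a 4 KiB page size; first_obj_idx is a hypothetical helper
mirroring the computation in this patch, not a real zsmalloc function):

```c
#include <assert.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define ZS_PAGE_SIZE 4096u

/*
 * zspage-wide index of the first object that *begins* on the zpdesc at
 * position zpdesc_pos (0..3) within the zspage. first_obj_offset is
 * the in-page offset of that object, as recorded per zpdesc. If the
 * result is idx, a spanning object reaching into this zpdesc from the
 * previous one has index idx - 1.
 */
static unsigned int first_obj_idx(unsigned int zpdesc_pos,
				  unsigned int first_obj_offset,
				  unsigned int obj_size)
{
	unsigned int page_offset = zpdesc_pos * ZS_PAGE_SIZE;

	return (page_offset + first_obj_offset) / obj_size;
}
```

For example, with obj_size = 3000, object 1 begins at byte 3000 of the
zspage (zpdesc 0) and spans into zpdesc 1; the first object beginning
on zpdesc 1 is object 2, at in-page offset 6000 - 4096 = 1904, so
first_obj_idx(1, 1904, 3000) yields 2 and the spanning object is
index 1.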

For the same reason, pool->uncompressed_stat, which can only be
accounted at PAGE_SIZE granularity for the node statistics, is
accounted only for objects beginning on the zpdesc.

Likewise for the spanning object at the end of the replaced zpdesc,
account only the amount that lives on the zpdesc.
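
Under an assumed 4 KiB page size, the proportional accounting for a
partial object works out as follows (partial_share is a hypothetical
helper mirroring the "partial" computation in zs_migrate_lruvec()
below):

```c
#include <assert.h>

#define ZS_PAGE_SIZE 4096

/*
 * PAGE_SIZE-granularity share of pool->uncompressed_stat to transfer
 * for a portion of a compressed object: bytes_on_page bytes out of an
 * obj_size-byte object.
 */
static int partial_share(int bytes_on_page, int obj_size)
{
	return (ZS_PAGE_SIZE * bytes_on_page) / obj_size;
}
```

A whole object transfers a full PAGE_SIZE share; due to integer
division, the two halves of a spanning object may together transfer
slightly less than PAGE_SIZE, a rounding loss the accounting accepts.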

Note that these operations cannot call the existing
zs_{charge,uncharge}_objcg functions introduced earlier in the series,
since we are holding the class spin lock and obj_cgroup_charge() can
sleep.

Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com>
---
 mm/zsmalloc.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 92 insertions(+)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index f3508ff8b3ab..a4c90447d28e 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1932,6 +1932,94 @@ static bool zs_page_isolate(struct page *page, isolate_mode_t mode)
 	return page_zpdesc(page)->zspage;
 }
 
+#ifdef CONFIG_MEMCG
+static void zs_migrate_lruvec(struct zs_pool *pool, struct obj_cgroup *objcg,
+			      int old_nid, int new_nid, int charge,
+			      int obj_size)
+{
+	struct mem_cgroup *memcg;
+	struct lruvec *old_lruvec, *new_lruvec;
+	int partial;
+
+	if (old_nid == new_nid || !objcg)
+		return;
+
+	/* Proportional (partial) uncompressed share for this portion */
+	partial = (PAGE_SIZE * charge) / obj_size;
+
+	rcu_read_lock();
+	memcg = obj_cgroup_memcg(objcg);
+	old_lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(old_nid));
+	new_lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(new_nid));
+
+	mod_memcg_lruvec_state(old_lruvec, pool->compressed_stat, -charge);
+	mod_memcg_lruvec_state(new_lruvec, pool->compressed_stat, charge);
+
+	mod_memcg_lruvec_state(old_lruvec, pool->uncompressed_stat, -partial);
+	mod_memcg_lruvec_state(new_lruvec, pool->uncompressed_stat, partial);
+	rcu_read_unlock();
+}
+
+/*
+ * Transfer per-lruvec and node-level stats when a zspage replaces a zpdesc
+ * with one from a different NUMA node. Must be called while old_zpdesc is
+ * still linked to the zspage. memcg-level charges are unchanged.
+ */
+static void zs_page_migrate_lruvec(struct zs_pool *pool, struct zspage *zspage,
+				   struct zpdesc *old_zpdesc,
+				   struct zpdesc *new_zpdesc,
+				   struct size_class *class)
+{
+	int size = class->size;
+	int old_nid = page_to_nid(zpdesc_page(old_zpdesc));
+	int new_nid = page_to_nid(zpdesc_page(new_zpdesc));
+	unsigned int off, first_obj_offset, page_offset = 0;
+	unsigned int idx;
+	struct zpdesc *cursor = zspage->first_zpdesc;
+
+	if (old_nid == new_nid)
+		return;
+
+	while (cursor != old_zpdesc) {
+		cursor = get_next_zpdesc(cursor);
+		page_offset += PAGE_SIZE;
+	}
+
+	first_obj_offset = get_first_obj_offset(old_zpdesc);
+	idx = (page_offset + first_obj_offset) / size;
+
+	/* Boundary object spanning from the previous zpdesc */
+	if (idx > 0 && zspage->objcgs[idx - 1])
+		zs_migrate_lruvec(pool, zspage->objcgs[idx - 1],
+				  old_nid, new_nid, first_obj_offset, size);
+
+	for (off = first_obj_offset;
+			off < PAGE_SIZE && idx < class->objs_per_zspage;
+			idx++, off += size) {
+		struct obj_cgroup *objcg = zspage->objcgs[idx];
+		int bytes_on_page = min_t(int, size, PAGE_SIZE - off);
+
+		if (!objcg)
+			continue;
+
+		zs_migrate_lruvec(pool, objcg, old_nid, new_nid,
+				  bytes_on_page, size);
+
+		dec_node_page_state(zpdesc_page(old_zpdesc),
+				    pool->uncompressed_stat);
+		inc_node_page_state(zpdesc_page(new_zpdesc),
+				    pool->uncompressed_stat);
+	}
+}
+#else
+static void zs_page_migrate_lruvec(struct zs_pool *pool, struct zspage *zspage,
+				   struct zpdesc *old_zpdesc,
+				   struct zpdesc *new_zpdesc,
+				   struct size_class *class)
+{
+}
+#endif
+
 static int zs_page_migrate(struct page *newpage, struct page *page,
 		enum migrate_mode mode)
 {
@@ -2004,6 +2092,10 @@ static int zs_page_migrate(struct page *newpage, struct page *page,
 	}
 	kunmap_local(s_addr);
 
+	/* Transfer lruvec/node stats while old zpdesc is still linked */
+	if (pool->memcg_aware)
+		zs_page_migrate_lruvec(pool, zspage, zpdesc, newzpdesc, class);
+
 	replace_sub_page(class, zspage, newzpdesc, zpdesc);
 	/*
 	 * Since we complete the data copy and set up new zspage structure,
-- 
2.52.0



Thread overview: 33+ messages
2026-03-11 19:51 [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
2026-03-11 19:51 ` [PATCH 01/11] mm/zsmalloc: Rename zs_object_copy to zs_obj_copy Joshua Hahn
2026-03-11 19:56   ` Yosry Ahmed
2026-03-11 20:00   ` Nhat Pham
2026-03-11 19:51 ` [PATCH 02/11] mm/zsmalloc: Make all obj_idx unsigned ints Joshua Hahn
2026-03-11 19:58   ` Yosry Ahmed
2026-03-11 20:01   ` Nhat Pham
2026-03-11 19:51 ` [PATCH 03/11] mm/zsmalloc: Introduce conditional memcg awareness to zs_pool Joshua Hahn
2026-03-11 20:12   ` Nhat Pham
2026-03-11 20:16   ` Johannes Weiner
2026-03-11 20:19     ` Yosry Ahmed
2026-03-11 20:20     ` Joshua Hahn
2026-03-11 19:51 ` [PATCH 04/11] mm/zsmalloc: Introduce objcgs pointer in struct zspage Joshua Hahn
2026-03-11 20:17   ` Nhat Pham
2026-03-11 20:22     ` Joshua Hahn
2026-03-11 19:51 ` [PATCH 05/11] mm/zsmalloc: Store obj_cgroup pointer in zspage Joshua Hahn
2026-03-11 20:17   ` Yosry Ahmed
2026-03-11 20:24     ` Joshua Hahn
2026-03-11 19:51 ` [PATCH 06/11] mm/zsmalloc, zswap: Redirect zswap_entry->objcg to zspage Joshua Hahn
2026-03-11 19:51 ` [PATCH 07/11] mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc Joshua Hahn
2026-03-12 21:42   ` Johannes Weiner
2026-03-13 15:34     ` Joshua Hahn
2026-03-13 16:49       ` Johannes Weiner
2026-03-11 19:51 ` [PATCH 08/11] mm/memcontrol: Track MEMCG_ZSWAPPED in bytes Joshua Hahn
2026-03-11 20:33   ` Nhat Pham
2026-03-17 19:13     ` Joshua Hahn
2026-03-11 19:51 ` [PATCH 09/11] mm/vmstat, memcontrol: Track ZSWAP_B, ZSWAPPED_B per-memcg-lruvec Joshua Hahn
2026-03-11 19:51 ` [PATCH 10/11] mm/zsmalloc: Handle single object charge migration in migrate_zspage Joshua Hahn
2026-03-12  3:51   ` kernel test robot
2026-03-12  3:51   ` kernel test robot
2026-03-12 16:56     ` Joshua Hahn
2026-03-11 19:51 ` Joshua Hahn [this message]
2026-03-11 19:54 ` [PATCH 00/11] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
