linux-mm.kvack.org archive mirror
From: Nhat Pham <nphamcs@gmail.com>
To: Joshua Hahn <joshua.hahnjy@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Yosry Ahmed <yosry.ahmed@linux.dev>,
	 Nhat Pham <hoangnhat.pham@linux.dev>,
	Chengming Zhou <chengming.zhou@linux.dev>,
	 Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	 Shakeel Butt <shakeel.butt@linux.dev>,
	Muchun Song <muchun.song@linux.dev>,
	 Andrew Morton <akpm@linux-foundation.org>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	 linux-kernel@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH 0/8] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting
Date: Mon, 2 Mar 2026 13:31:32 -0800
Message-ID: <CAKEwX=N-yzg66Ge5YgDNG7nh3ues62fSjmi6oGq1B=gkz6e2Uw@mail.gmail.com>
In-Reply-To: <20260226192936.3190275-1-joshua.hahnjy@gmail.com>

On Thu, Feb 26, 2026 at 11:29 AM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> INTRODUCTION
> ============
> The current design for zswap and zsmalloc leaves a clean divide between
> layers of the memory stack. At the higher level, we have zswap, which
> interacts directly with memory consumers, compression algorithms, and
> handles memory usage accounting via memcg limits. At the lower level,
> we have zsmalloc, which handles the page allocation and migration of
> physical pages.
>
> While this logical separation simplifies the codebase, it leaves
> problems for accounting that requires both memory cgroup awareness and
> physical memory location. To name a few:
>
>  - On tiered systems, it is impossible to understand how much toptier
>    memory a cgroup is using, since zswap has no understanding of where
>    the compressed memory is physically stored.
>    + With SeongJae Park's work to store incompressible pages as-is in
>      zswap [1], the size of compressed memory can become non-trivial,
>      and easily consume a meaningful portion of memory.
>
>  - cgroups that restrict memory nodes have no control over which nodes
>    their zswapped objects live on. This can lead to unexpectedly high
>    fault times for workloads, which must eat the remote access latency
>    cost of retrieving the compressed object from a remote node.
>    + Nhat Pham addressed this issue via a best-effort attempt to place
>      compressed objects on the same node as the original page, but this
>      cannot guarantee complete isolation [2].
>
>  - On the flip side, zsmalloc's ignorance of cgroup also makes its
>    shrinker memcg-unaware, which can lead to ineffective reclaim when
>    pressure is localized to a single cgroup.
>
> Until recently, zpool acted as another layer of indirection between
> zswap and zsmalloc, which made bridging memcg and physical location
> difficult. Now that zsmalloc is the only allocator backend for zswap and
> zram [3], it is possible to move memory-cgroup accounting to the
> zsmalloc layer.
>
> Introduce a new per-zpdesc array of objcg pointers to track
> per-memcg-lruvec memory usage by zswap, while leaving zram users
> unaffected.
>
> This creates one source of truth for NR_ZSWAP, and more accurate
> accounting for NR_ZSWAPPED.
>
> This brings sizeof(struct zpdesc) from 56 bytes to 64 bytes, but this
> increase in size is unseen by the rest of the system because zpdesc
> overlays struct page. Implementation details and care taken to handle
> the page->memcg_data field can be found in patch 3.
>
> In addition, move the accounting of memcg charges to the zsmalloc layer,
> whose only user is zswap at the moment.
>
> PATCH OUTLINE
> =============
> Patches 1 and 2 are small cleanups that make the codebase consistent and
> easier to digest.
>
> Patches 3, 4, and 5 allocate and populate the new zpdesc->objcgs field
> with compressed objects' obj_cgroups. zswap_entry->objcg is removed,
> and its users are redirected to look at the zspage for memcg information.
>
> Patch 6 moves the charging and lifetime management of obj_cgroups to
> the zsmalloc layer, which leaves zswap only as a plumbing layer to hand
> cgroup information to zsmalloc.
>
> Patches 7 and 8 introduce node counters and memcg-lruvec counters for
> zswap. Special care is taken for compressed objects that span multiple
> nodes.
>
> [1] https://lore.kernel.org/linux-mm/20250822190817.49287-1-sj@kernel.org/
> [2] https://lore.kernel.org/linux-mm/20250402204416.3435994-1-nphamcs@gmail.com/#t3
> [3] https://lore.kernel.org/linux-mm/20250829162212.208258-1-hannes@cmpxchg.org/
> [4] https://lore.kernel.org/linux-mm/c8bc2dce-d4ec-c16e-8df4-2624c48cfc06@google.com/
>
> Joshua Hahn (8):
>   mm/zsmalloc: Rename zs_object_copy to zs_obj_copy
>   mm/zsmalloc: Make all obj_idx unsigned ints
>   mm/zsmalloc: Introduce objcgs pointer in struct zpdesc
>   mm/zsmalloc: Store obj_cgroup pointer in zpdesc
>   mm/zsmalloc,zswap: Redirect zswap_entry->obcg to zpdesc
>   mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc
>   mm/memcontrol: Track MEMCG_ZSWAPPED in bytes
>   mm/vmstat, memcontrol: Track ZSWAP_B, ZSWAPPED_B per-memcg-lruvec
>
>  drivers/block/zram/zram_drv.c |  17 +-
>  include/linux/memcontrol.h    |  15 +-
>  include/linux/mmzone.h        |   2 +
>  include/linux/zsmalloc.h      |   6 +-
>  mm/memcontrol.c               |  68 ++------
>  mm/vmstat.c                   |   2 +
>  mm/zpdesc.h                   |  25 ++-
>  mm/zsmalloc.c                 | 282 ++++++++++++++++++++++++++++++++--
>  mm/zswap.c                    |  67 ++++----
>  9 files changed, 345 insertions(+), 139 deletions(-)

I might have missed it, and this might be in one of the later patches,
but could you also add a quick and dirty zswap benchmark to make sure
the performance impact is nil or minimal? IIUC there is a small
amount of extra overhead in certain steps, because we have to go
through zsmalloc to query the objcg. Usemem or a kernel build should
suffice IMHO.
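
To make the extra step concrete, here is a rough userspace model of
the lookup as I understand it from the cover letter. None of these
names or layouts are the kernel's: struct owner stands in for an
obj_cgroup, struct page_desc for a zpdesc with its per-object owner
array, and struct entry for a zswap_entry that no longer carries its
own owner pointer. OBJS_PER_PAGE is an arbitrary illustrative value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins, NOT the kernel's definitions. */
struct owner {                  /* plays the role of an obj_cgroup */
	long charged_bytes;
};

#define OBJS_PER_PAGE 8         /* arbitrary, for illustration only */

struct page_desc {              /* plays the role of a zpdesc */
	struct owner *owners[OBJS_PER_PAGE];   /* one slot per object index */
};

struct entry {                  /* plays the role of a zswap_entry */
	struct page_desc *desc;
	unsigned int obj_idx;
	size_t size;
};

/* Record the owner in the descriptor at store time and charge it. */
static void store_obj(struct entry *e, struct page_desc *d,
		      unsigned int idx, size_t size, struct owner *o)
{
	assert(idx < OBJS_PER_PAGE);
	d->owners[idx] = o;
	o->charged_bytes += (long)size;
	e->desc = d;
	e->obj_idx = idx;
	e->size = size;
}

/* The extra indirection: the owner is found via the descriptor. */
static struct owner *entry_owner(const struct entry *e)
{
	return e->desc->owners[e->obj_idx];
}

/* Uncharge the owner and clear the slot when the object is freed. */
static void free_obj(struct entry *e)
{
	struct owner *o = entry_owner(e);

	if (o)
		o->charged_bytes -= (long)e->size;
	e->desc->owners[e->obj_idx] = NULL;
}
```

In this model the only added cost per query is one dependent load
through the descriptor, which is why a quick benchmark seems like
enough to confirm it.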

To be clear, I don't anticipate any observable performance change, but
it's a good sanity check :) Besides, can't be too careful with stress
testing stuff :P
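
For reference, the kind of quick check I have in mind looks roughly
like the sketch below. This is not usemem itself, just a minimal
page-touching loop with a wall-clock timer; the helper names and the
4096-byte page size are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define PAGE_SZ 4096UL          /* assumed page size */

/*
 * Write one byte in every page of buf, `passes` times over. The
 * running sum is returned so the accesses cannot be optimised away.
 */
static unsigned long touch_pages(unsigned char *buf, size_t len, int passes)
{
	unsigned long sum = 0;

	assert(buf != NULL);
	for (int p = 0; p < passes; p++) {
		for (size_t off = 0; off < len; off += PAGE_SZ) {
			buf[off]++;
			sum += buf[off];
		}
	}
	return sum;
}

/* Time touch_pages() with a monotonic clock; returns elapsed seconds. */
static double timed_touch(unsigned char *buf, size_t len, int passes,
			  unsigned long *sum)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	*sum = touch_pages(buf, len, passes);
	clock_gettime(CLOCK_MONOTONIC, &t1);
	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

To actually exercise zswap, the buffer would need to be larger than
the cgroup's memory.max with zswap enabled, so that the second and
later passes fault pages back in; comparing the elapsed time with and
without the series applied would then give the sanity check.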


