linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nhat Pham <nphamcs@gmail.com>
To: Yosry Ahmed <yosry@kernel.org>
Cc: Joshua Hahn <joshua.hahnjy@gmail.com>,
	Minchan Kim <minchan@kernel.org>,
	 Sergey Senozhatsky <senozhatsky@chromium.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Yosry Ahmed <yosry.ahmed@linux.dev>,
	Nhat Pham <hoangnhat.pham@linux.dev>,
	 Chengming Zhou <chengming.zhou@linux.dev>,
	Michal Hocko <mhocko@kernel.org>,
	 Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeel.butt@linux.dev>,
	 Muchun Song <muchun.song@linux.dev>,
	Andrew Morton <akpm@linux-foundation.org>,
	 cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,  kernel-team@meta.com
Subject: Re: [PATCH 6/8] mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc
Date: Wed, 4 Mar 2026 08:27:59 -0800	[thread overview]
Message-ID: <CAKEwX=ObFWm6cKbi4hL8JLDKui3MsRu-JpEFohBdkqHFY9tVfA@mail.gmail.com> (raw)
In-Reply-To: <CAO9r8zOJ5bkJzptM7+8V2G00dHuycAEAF4CDcaR1YMk0kWyktA@mail.gmail.com>

On Wed, Mar 4, 2026 at 7:47 AM Yosry Ahmed <yosry@kernel.org> wrote:
>
> On Wed, Mar 4, 2026 at 7:11 AM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
> >
> > On Tue, 3 Mar 2026 15:53:31 -0800 Yosry Ahmed <yosry@kernel.org> wrote:
> >
> > > > diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
> > > > index 067215a6ddcc..88c7cd399261 100644
> > > > --- a/mm/zsmalloc.c
> > > > +++ b/mm/zsmalloc.c
> > > > @@ -963,6 +963,44 @@ static bool alloc_zspage_objcgs(struct size_class *class, gfp_t gfp,
> > > >         return true;
> > > >  }
> > > >
> > > > +static void zs_charge_objcg(struct zpdesc *zpdesc, struct obj_cgroup *objcg,
> > > > +                           int size, unsigned long offset)
> > > > +{
> > > > +       struct mem_cgroup *memcg;
> > > > +
> > > > +       if (!cgroup_subsys_on_dfl(memory_cgrp_subsys))
> > > > +               return;
> > > > +
> > > > +       VM_WARN_ON_ONCE(!(current->flags & PF_MEMALLOC));
> > > > +
> > > > +       /* PF_MEMALLOC context, charging must succeed */
> > > > +       if (obj_cgroup_charge(objcg, GFP_KERNEL, size))
> > > > +               VM_WARN_ON_ONCE(1);
> > > > +
> > > > +       rcu_read_lock();
> > > > +       memcg = obj_cgroup_memcg(objcg);
> > > > +       mod_memcg_state(memcg, MEMCG_ZSWAP_B, size);
> > > > +       mod_memcg_state(memcg, MEMCG_ZSWAPPED, 1);
> >
> > Hello Yosry, I hope you are doing well!
> > Thank you for your feedback : -)
> >
> > > Zsmalloc should not be updating zswap stats (e.g. in case zram starts
> > > supporting memcg charging).  How about moving the stat updates to
> > > zswap?
> >
> > Yeah... I think this was also a big point of concern for me. While reading
> > the code, I was really amazed by how clean the logical divide between
> > zsmalloc and zswap / zram were, and I wanted to preserve it as much as
> > possible.
> >
> > There are a few problems, though. Probably the biggest is that migration
> > of zpdescs and compressed objects within them are invisible to zswap.
> > Of course, this is by design, but this leads to two problems.
> >
> > zswap's ignorance of compressed objects' movements across physical nodes
> > makes it impossible to accurately charge and uncharge from the correct
> > memcg-lruvec.
> >
> > Conversely, zsmalloc's ignorance of memcg association makes it impossible
> > to correctly restrict cpusets.mems during migration.
> >
> > So the clean logical divide makes a lot of sense for separating the
> > high-level cgroup association, compression, etc. from the physical
> > location of the memory and migration / zpdesc compaction, but it would
> > appear that this comes at a cost of oversimplifying the logic and missing
> > out on accurate memory charging and a unified source of truth for the
> > counters.
> >
> > The last thing I wanted to note was that I agree that zsmalloc doing
> > explicit zswap stat updates feels a bit awkward. The reason I chose to do
> > this right now is because when enlightening zsmalloc about the compressed
> > objs' objcgs, zswap is the only one that does this memory accounting.
> > So having an objcg is a bit of a proxy to understand that the consumer
> > is zswap (as opposed to zram). Of course, if zram starts to do memcg
> > accounting as well, we'll have to start doing some other checks to
> > see if the compresed object should be accounted as zram or zswap.
> >
> > OK. That's all the defense I have for my design : -) Now for thinking
> > about other designs:
> >
> > I also explored whether it makes sense to make zsmalloc call a hook into
> > zswap code during and after migrations. The problem is that there isn't
> > a good way to do the compressed object --> zswap entry lookup, and this
> > still doesn't solve the issue of zsmalloc migrating compressed objects
> > without checking whether that object can live on another node.
> >
> > Maybe one possible approach is to turn the array of objcgs into an array
> > of backpointers from compressed objects to their corresponding zswap_entries?
> > One concern is that this does add 8 bytes of additional overhead per
> > zswap entry, and I'm not sure that this is acceptable. I'll keep thinking
> > on whether there's a creative way to save some memory here, though...
> >
> > Of course the other concern is what this will look like for zram users.
> > I guess it can be done similarly to what is done here, and only allocate
> > the array of pointers when called in from zswap.
> >
> > Anyways, thank you for bringing this up. What do you think about the
> > options we have here? I hope that I've motivated why we want
> > per-memcg-lruvec accounting as well. Please let me know if there is anything
> > I can provide additional context for : -)
>
> Thanks for the detailed elaboration.
>
> AFAICT the only zswap-specific part is the actual stat indexes, what
> if these are parameterized at the zsmalloc pool level? AFAICT zswap
> and zram will never share a pool.

TBH, if we were to start from scratch, these should be zsmalloc
counters not zswap counters. Only zsmalloc knows about the memory
placement and real memory consumption (i.e taking into account
intra-slot wasted space) - this information is abstracted away from
all of the callers. And if/when zram supports cgroup tracking, memory
used by zswap and memory used by zram is indistinguishable, no?

Anyway, Joshua, do you think this is doable? Seems promising to me,
but idk if it will be clean to implement or not.


  parent reply	other threads:[~2026-03-04 16:28 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-26 19:29 [PATCH 0/8] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Joshua Hahn
2026-02-26 19:29 ` [PATCH 1/8] mm/zsmalloc: Rename zs_object_copy to zs_obj_copy Joshua Hahn
2026-02-26 19:29 ` [PATCH 2/8] mm/zsmalloc: Make all obj_idx unsigned ints Joshua Hahn
2026-02-26 19:29 ` [PATCH 3/8] mm/zsmalloc: Introduce objcgs pointer in struct zpdesc Joshua Hahn
2026-02-26 21:37   ` Shakeel Butt
2026-02-26 21:43     ` Joshua Hahn
2026-03-04 16:58   ` Yosry Ahmed
2026-03-04 18:03     ` Joshua Hahn
2026-02-26 19:29 ` [PATCH 4/8] mm/zsmalloc: Store obj_cgroup pointer in zpdesc Joshua Hahn
2026-02-26 19:29 ` [PATCH 5/8] mm/zsmalloc,zswap: Redirect zswap_entry->obcg to zpdesc Joshua Hahn
2026-02-26 23:13   ` kernel test robot
2026-02-27 19:10     ` Joshua Hahn
2026-02-26 19:29 ` [PATCH 6/8] mm/zsmalloc, zswap: Handle objcg charging and lifetime in zsmalloc Joshua Hahn
2026-03-03 23:53   ` Yosry Ahmed
2026-03-04 15:11     ` Joshua Hahn
2026-03-04 15:46       ` Yosry Ahmed
2026-03-04 16:26         ` Joshua Hahn
2026-03-04 16:27         ` Nhat Pham [this message]
2026-03-04 16:45           ` Yosry Ahmed
2026-03-04 16:49             ` Nhat Pham
2026-02-26 19:29 ` [PATCH 7/8] mm/memcontrol: Track MEMCG_ZSWAPPED in bytes Joshua Hahn
2026-02-26 19:29 ` [PATCH 8/8] mm/vmstat, memcontrol: Track ZSWAP_B, ZSWAPPED_B per-memcg-lruvec Joshua Hahn
2026-02-26 22:40   ` kernel test robot
2026-02-27 19:45     ` Joshua Hahn
2026-02-26 23:02   ` kernel test robot
2026-03-02 21:31 ` [PATCH 0/8] mm/zswap, zsmalloc: Per-memcg-lruvec zswap accounting Nhat Pham
2026-03-03 17:51   ` Joshua Hahn
2026-03-03 18:01     ` Nhat Pham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKEwX=ObFWm6cKbi4hL8JLDKui3MsRu-JpEFohBdkqHFY9tVfA@mail.gmail.com' \
    --to=nphamcs@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=chengming.zhou@linux.dev \
    --cc=hannes@cmpxchg.org \
    --cc=hoangnhat.pham@linux.dev \
    --cc=joshua.hahnjy@gmail.com \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=minchan@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=senozhatsky@chromium.org \
    --cc=shakeel.butt@linux.dev \
    --cc=yosry.ahmed@linux.dev \
    --cc=yosry@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox