From: Nhat Pham <nphamcs@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>,
linux-mm@kvack.org, akpm@linux-foundation.org,
chengming.zhou@linux.dev, sj@kernel.org, kernel-team@meta.com,
linux-kernel@vger.kernel.org, gourry@gourry.net,
willy@infradead.org, ying.huang@linux.alibaba.com,
jonathan.cameron@huawei.com, dan.j.williams@intel.com,
linux-cxl@vger.kernel.org, minchan@kernel.org,
senozhatsky@chromium.org
Subject: Re: [RFC PATCH 0/2] zswap: fix placement inversion in memory tiering systems
Date: Mon, 31 Mar 2025 10:32:01 -0700
Message-ID: <CAKEwX=Nw8PZYKd4TcC2+VW7URzT67aM0wJyYMu5X01ngbFO_Yg@mail.gmail.com>
In-Reply-To: <20250331165306.GC2110528@cmpxchg.org>

On Mon, Mar 31, 2025 at 9:53 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Sat, Mar 29, 2025 at 07:53:23PM +0000, Yosry Ahmed wrote:
> > March 29, 2025 at 1:02 PM, "Nhat Pham" <nphamcs@gmail.com> wrote:
> >
> > > Currently, systems with CXL-based memory tiering can encounter the
> > > following inversion with zswap: the coldest pages demoted to the CXL
> > > tier can return to the high tier when they are zswapped out,
> > > creating memory pressure on the high tier.
> > > This happens because zsmalloc, zswap's backend memory allocator, does
> > > not enforce any memory policy. If the task reclaiming memory follows
> > > the local-first policy for example, the memory requested for zswap can
> > > be served by the upper tier, leading to the aforementioned inversion.
> > > This RFC fixes this inversion by adding a new memory allocation mode
> > > for zswap (exposed through a zswap sysfs knob), intended for
> > > hosts with CXL, where the memory for the compressed object is requested
> > > preferentially from the same node that the original page resides on.
> >
> > I didn't look too closely, but why not just prefer the same node by
> > default? Why is a knob needed?
>
> +1 It should really be the default.
>
> Even on regular NUMA setups this behavior makes more sense. Consider a
> direct reclaimer scanning nodes in order of allocation preference. If
> it ventures into remote nodes, the memory it compresses there should
> stay there. Trying to shift those contents over to the reclaiming
> thread's preferred node further *increases* its local pressure,
> provoking more spills. The remote node is also the most likely to
> refault this data again. This is just bad for everybody.
Makes a lot of sense. I'll include this in v2 of the patch series and
reframe it as a generic NUMA fix (with CXL as one of the
examples/motivations).

Thanks for the comment, Johannes! I'll remove the knob altogether and
make this the default behavior.
>
> > Or maybe if there's a way to tell the "tier" of the node we can
> > prefer to allocate from the same "tier"?
>
> Presumably, other nodes in the same tier would come first in the
> fallback zonelist of that node, so page_to_nid() should just work.
>
> I wouldn't complicate this until somebody has real systems where it
> does the wrong thing.
>
> My vote is to stick with page_to_nid(), but do it unconditionally.
SGTM.
>
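To make the inversion discussed above concrete, here is a minimal userspace sketch in C. It is a toy model, not kernel code: `struct toy_system`, `toy_alloc_on()`, `toy_store_local_first()` and `toy_store_same_node()` are hypothetical names invented for illustration, node 0 is assumed to be the DRAM tier and node 1 the CXL tier, and the fallback order is a simple stand-in for a node's fallback zonelist.

```c
#define NR_NODES 2   /* assumption: node 0 = DRAM (top tier), node 1 = CXL */
#define NO_NODE  (-1)

/* Hypothetical toy model: free pages remaining on each node. */
struct toy_system {
	long free_pages[NR_NODES];
};

/*
 * Try the preferred node first, then the remaining nodes in index order.
 * This loop only stands in for a per-node fallback list; it does not
 * model the real zonelist ordering.
 * Returns the node that served the allocation, or NO_NODE if all are full.
 */
static int toy_alloc_on(struct toy_system *sys, int preferred_nid)
{
	for (int i = 0; i < NR_NODES; i++) {
		int nid = (preferred_nid + i) % NR_NODES;

		if (sys->free_pages[nid] > 0) {
			sys->free_pages[nid]--;
			return nid;
		}
	}
	return NO_NODE;
}

/* Inverted placement: backing memory follows the reclaimer's local node. */
static int toy_store_local_first(struct toy_system *sys, int reclaimer_nid,
				 int page_nid)
{
	(void)page_nid;	/* the page's own node is ignored */
	return toy_alloc_on(sys, reclaimer_nid);
}

/* Agreed direction: backing memory prefers the node the page resides on. */
static int toy_store_same_node(struct toy_system *sys, int reclaimer_nid,
			       int page_nid)
{
	(void)reclaimer_nid;	/* the reclaimer's node no longer matters */
	return toy_alloc_on(sys, page_nid);
}
```

With a reclaimer running on node 0 and a cold page on node 1, `toy_store_local_first()` places the compressed copy on node 0 (the inversion), while `toy_store_same_node()` keeps it on node 1 and only falls back to node 0 when node 1 has no free pages.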
Thread overview: 13+ messages
2025-03-29 11:02 Nhat Pham
2025-03-29 11:02 ` [RFC PATCH 1/2] zsmalloc: let callers select NUMA node to store the compressed objects Nhat Pham
2025-03-31 22:17 ` Dan Williams
2025-03-31 23:03 ` Nhat Pham
2025-03-31 23:22 ` Dan Williams
2025-04-01 1:13 ` Nhat Pham
2025-03-29 11:02 ` [RFC PATCH 2/2] zswap: add sysfs knob for same node mode Nhat Pham
2025-03-29 19:53 ` [RFC PATCH 0/2] zswap: fix placement inversion in memory tiering systems Yosry Ahmed
2025-03-29 22:13 ` Nhat Pham
2025-03-29 22:17 ` Nhat Pham
2025-03-31 16:53 ` Johannes Weiner
2025-03-31 17:32 ` Nhat Pham [this message]
2025-03-31 17:06 ` Gregory Price