linux-mm.kvack.org archive mirror
From: Nhat Pham <nphamcs@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yosry Ahmed <yosry.ahmed@linux.dev>,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	 chengming.zhou@linux.dev, sj@kernel.org, kernel-team@meta.com,
	 linux-kernel@vger.kernel.org, gourry@gourry.net,
	willy@infradead.org,  ying.huang@linux.alibaba.com,
	jonathan.cameron@huawei.com,  dan.j.williams@intel.com,
	linux-cxl@vger.kernel.org, minchan@kernel.org,
	 senozhatsky@chromium.org
Subject: Re: [RFC PATCH 0/2] zswap: fix placement inversion in memory tiering systems
Date: Mon, 31 Mar 2025 10:32:01 -0700
Message-ID: <CAKEwX=Nw8PZYKd4TcC2+VW7URzT67aM0wJyYMu5X01ngbFO_Yg@mail.gmail.com>
In-Reply-To: <20250331165306.GC2110528@cmpxchg.org>

On Mon, Mar 31, 2025 at 9:53 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Sat, Mar 29, 2025 at 07:53:23PM +0000, Yosry Ahmed wrote:
> > March 29, 2025 at 1:02 PM, "Nhat Pham" <nphamcs@gmail.com> wrote:
> >
> > > Currently, systems with CXL-based memory tiering can encounter the
> > > following inversion with zswap: the coldest pages demoted to the CXL
> > > tier can return to the high tier when they are zswapped out,
> > > creating memory pressure on the high tier.
> > > This happens because zsmalloc, zswap's backend memory allocator, does
> > > not enforce any memory policy. If the task reclaiming memory follows
> > > the local-first policy for example, the memory requested for zswap can
> > > be served by the upper tier, leading to the aforementioned inversion.
> > > This RFC fixes this inversion by adding a new memory allocation mode
> > > for zswap (exposed through a zswap sysfs knob), intended for
> > > hosts with CXL, where the memory for the compressed object is requested
> > > preferentially from the same node that the original page resides on.
> >
> > I didn't look too closely, but why not just prefer the same node by
> > default? Why is a knob needed?
>
> +1 It should really be the default.
>
> Even on regular NUMA setups this behavior makes more sense. Consider a
> direct reclaimer scanning nodes in order of allocation preference. If
> it ventures into remote nodes, the memory it compresses there should
> stay there. Trying to shift those contents over to the reclaiming
> thread's preferred node further *increases* its local pressure,
> provoking more spills. The remote node is also the most likely to
> refault this data again. This is just bad for everybody.

Makes a lot of sense. I'll include this in the v2 of the patch series,
and rephrase this as a generic, NUMA system fix (with CXL as one of
the examples/motivations).

Thanks for the comment, Johannes! I'll remove this knob altogether and
make this the default behavior.
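
For illustration only, the placement choice discussed above can be modelled in a few lines of userspace C. This is not zswap or zsmalloc code: NODE_DRAM/NODE_CXL, enum zswap_placement, zswap_target_node() and top_tier_objects() are hypothetical names, and the two-node DRAM/CXL layout is an assumption. PLACE_RECLAIMER_LOCAL stands for the local-first behaviour described in the cover letter, and PLACE_PAGE_NODE for requesting memory from the node the original page resides on, i.e. what page_to_nid() would return.

```c
#include <assert.h>

/*
 * Userspace model only, not kernel code. All names are hypothetical.
 * Node 0 stands in for local DRAM, node 1 for a CXL-attached tier.
 */
enum { NODE_DRAM = 0, NODE_CXL = 1, NR_NODES = 2 };

enum zswap_placement {
	PLACE_RECLAIMER_LOCAL,	/* follow the reclaiming task's local node */
	PLACE_PAGE_NODE,	/* stay on the node of the original page */
};

/* Pick the node the compressed copy of a page is requested from. */
int zswap_target_node(enum zswap_placement policy,
		      int reclaimer_nid, int page_nid)
{
	assert(reclaimer_nid >= 0 && reclaimer_nid < NR_NODES);
	assert(page_nid >= 0 && page_nid < NR_NODES);

	if (policy == PLACE_PAGE_NODE)
		return page_nid;	/* analogous to page_to_nid(page) */
	return reclaimer_nid;		/* local-first: the reclaimer's node */
}

/*
 * Count how many compressed objects land on the top tier (DRAM) when a
 * reclaimer on reclaimer_nid compresses nr pages whose nodes are listed
 * in page_nids[].
 */
int top_tier_objects(enum zswap_placement policy, int reclaimer_nid,
		     const int *page_nids, int nr)
{
	int on_dram = 0;

	for (int i = 0; i < nr; i++)
		if (zswap_target_node(policy, reclaimer_nid,
				      page_nids[i]) == NODE_DRAM)
			on_dram++;
	return on_dram;
}
```

With a reclaimer on node 0 compressing four pages resident on node 1, top_tier_objects() returns 4 under PLACE_RECLAIMER_LOCAL and 0 under PLACE_PAGE_NODE: the first policy moves every cold object back to the top tier, which is the inversion the cover letter describes.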

>
> > Or maybe if there's a way to tell the "tier" of the node we can
> > prefer to allocate from the same "tier"?
>
> Presumably, other nodes in the same tier would come first in the
> fallback zonelist of that node, so page_to_nid() should just work.
>
> I wouldn't complicate this until somebody has real systems where it
> does the wrong thing.
>
> My vote is to stick with page_to_nid(), but do it unconditionally.

SGTM.
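
A similar toy model shows why page_to_nid() alone may be enough, assuming (as Johannes does) that each node's fallback order lists same-tier nodes first. Again this is hypothetical userspace C, not the kernel's zonelist code: the four-node layout, the fallback table, free_slots and alloc_obj_on_node() are made up for illustration, and watermarks, zones and mempolicy are ignored.

```c
#include <assert.h>

/*
 * Userspace model only. Four hypothetical nodes: 0 and 1 are DRAM,
 * 2 and 3 are CXL. fallback[n] is the order in which an allocation
 * preferring node n tries nodes; as assumed in the discussion,
 * same-tier nodes come right after n itself.
 */
#define NR_NODES 4

static const int fallback[NR_NODES][NR_NODES] = {
	{ 0, 1, 2, 3 },
	{ 1, 0, 3, 2 },
	{ 2, 3, 0, 1 },
	{ 3, 2, 1, 0 },
};

/* Free object slots per node (hypothetical capacities). */
int free_slots[NR_NODES];

/*
 * Allocate one compressed object, preferring nid (e.g. page_to_nid(page))
 * and walking that node's fallback list when it is full. Returns the
 * node used, or -1 if every node is full.
 */
int alloc_obj_on_node(int nid)
{
	assert(nid >= 0 && nid < NR_NODES);

	for (int i = 0; i < NR_NODES; i++) {
		int n = fallback[nid][i];

		if (free_slots[n] > 0) {
			free_slots[n]--;
			return n;
		}
	}
	return -1;
}
```

Starting from free_slots = {1, 1, 0, 1}, two allocations preferring CXL node 2 land on node 3 and then node 0: the object only spills to DRAM once both CXL nodes are full, without any explicit notion of tiers in the allocator.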

>



Thread overview: 13+ messages
2025-03-29 11:02 Nhat Pham
2025-03-29 11:02 ` [RFC PATCH 1/2] zsmalloc: let callers select NUMA node to store the compressed objects Nhat Pham
2025-03-31 22:17   ` Dan Williams
2025-03-31 23:03     ` Nhat Pham
2025-03-31 23:22       ` Dan Williams
2025-04-01  1:13         ` Nhat Pham
2025-03-29 11:02 ` [RFC PATCH 2/2] zswap: add sysfs knob for same node mode Nhat Pham
2025-03-29 19:53 ` [RFC PATCH 0/2] zswap: fix placement inversion in memory tiering systems Yosry Ahmed
2025-03-29 22:13   ` Nhat Pham
2025-03-29 22:17     ` Nhat Pham
2025-03-31 16:53   ` Johannes Weiner
2025-03-31 17:32     ` Nhat Pham [this message]
2025-03-31 17:06   ` Gregory Price
