From: Gregory Price <gourry@gourry.net>
To: Yosry Ahmed <yosry.ahmed@linux.dev>
Cc: Nhat Pham <nphamcs@gmail.com>,
	linux-mm@kvack.org, akpm@linux-foundation.org,
	hannes@cmpxchg.org, chengming.zhou@linux.dev, sj@kernel.org,
	kernel-team@meta.com, linux-kernel@vger.kernel.org,
	willy@infradead.org, ying.huang@linux.alibaba.com,
	jonathan.cameron@huawei.com, dan.j.williams@intel.com,
	linux-cxl@vger.kernel.org, minchan@kernel.org,
	senozhatsky@chromium.org
Subject: Re: [RFC PATCH 0/2] zswap: fix placement inversion in memory tiering systems
Date: Mon, 31 Mar 2025 13:06:11 -0400
Message-ID: <Z-rLg7xSgu62qCfs@gourry-fedora-PF4VCD3F>
In-Reply-To: <2759fa95d0071f3c5e33a9c6369f0d0bcecd76b7@linux.dev>

On Sat, Mar 29, 2025 at 07:53:23PM +0000, Yosry Ahmed wrote:
> March 29, 2025 at 1:02 PM, "Nhat Pham" <nphamcs@gmail.com> wrote:
> 
> > Currently, systems with CXL-based memory tiering can encounter the
> > following inversion with zswap: the coldest pages demoted to the CXL
> > tier can return to the high tier when they are zswapped out,
> > creating memory pressure on the high tier.
> > This happens because zsmalloc, zswap's backend memory allocator, does
> > not enforce any memory policy. If the task reclaiming memory follows
> > the local-first policy for example, the memory requested for zswap can
> > be served by the upper tier, leading to the aforementioned inversion.
> > This RFC fixes this inversion by adding a new memory allocation mode
> > for zswap (exposed through a zswap sysfs knob), intended for
> > hosts with CXL, where the memory for the compressed object is requested
> > preferentially from the same node that the original page resides on.
> 
> I didn't look too closely, but why not just prefer the same node by default? Why is a knob needed?
> 

Bit of an open question: does this hurt zswap performance?

And of course the begged question: Does that matter?

Probably the answer is not really and no, but nice to have the knob for
testing.  I imagine we'd drop it with the RFC tag.

> Or maybe if there's a way to tell the "tier" of the node we can prefer to allocate from the same "tier"?

In almost every system, tier=node for any sane situation, though nodes
across sockets can end up lumped into the same tier - which maybe
doesn't matter for zswap but isn't useful for almost anything else.

But maybe there's an argument for adding new tier-policies.

:think: 

enum memtier_policy {
    MEMTIER_SAME_TIER,     // get a different node from same tier
    MEMTIER_DEMOTE_ONE,    // demote one step
    MEMTIER_DEMOTE_FAR,    // demote one step away from swap
    MEMTIER_PROMOTE_ONE,   // promote one step
    MEMTIER_PROMOTE_LOCAL, // promote to local on topology
};

// returns the node selected by the policy, relative to nid
int memtier_get_node(enum memtier_policy policy, int nid);

Might be worth investigating.  Just spitballing here.
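
To make the spitball a bit more concrete, here is a rough userspace
sketch of how such a lookup could behave.  Nothing here is an existing
kernel interface: the node_tier/node_socket tables, the pick_node() and
slowest_tier() helpers, and the five-node topology are all made up for
illustration, and reading MEMTIER_DEMOTE_FAR as "go to the slowest
memory tier" is an assumption on my part.

```c
/* Hypothetical policy names, taken from the proposal in this mail. */
enum memtier_policy {
    MEMTIER_SAME_TIER,
    MEMTIER_DEMOTE_ONE,
    MEMTIER_DEMOTE_FAR,
    MEMTIER_PROMOTE_ONE,
    MEMTIER_PROMOTE_LOCAL,
};

/*
 * Toy topology (an assumption, not read from any real system):
 * tier 0 is the fastest.  Nodes 0-1 are DRAM on sockets 0 and 1,
 * nodes 2-3 are CXL attached to sockets 0 and 1, node 4 is a
 * slower pool hanging off socket 0.
 */
#define NR_NODES 5
static const int node_tier[NR_NODES]   = { 0, 0, 1, 1, 2 };
static const int node_socket[NR_NODES] = { 0, 1, 0, 1, 0 };

/*
 * Find a node in @tier, preferring @socket.  @exclude is skipped
 * (pass -1 to skip nothing).  Returns -1 if the tier has no candidate.
 */
static int pick_node(int tier, int socket, int exclude)
{
    int fallback = -1;

    for (int n = 0; n < NR_NODES; n++) {
        if (node_tier[n] != tier || n == exclude)
            continue;
        if (node_socket[n] == socket)
            return n;           /* same-socket match wins */
        if (fallback < 0)
            fallback = n;       /* remember first cross-socket match */
    }
    return fallback;
}

/* Highest tier index present in the toy topology. */
static int slowest_tier(void)
{
    int max = 0;

    for (int n = 0; n < NR_NODES; n++)
        if (node_tier[n] > max)
            max = node_tier[n];
    return max;
}

/* Returns the node chosen by @policy relative to @nid, or -1 if none. */
int memtier_get_node(enum memtier_policy policy, int nid)
{
    if (nid < 0 || nid >= NR_NODES)
        return -1;

    int tier = node_tier[nid];
    int socket = node_socket[nid];

    switch (policy) {
    case MEMTIER_SAME_TIER:
        return pick_node(tier, socket, nid);
    case MEMTIER_DEMOTE_ONE:
        return pick_node(tier + 1, socket, -1);
    case MEMTIER_DEMOTE_FAR:
        /* Assumed meaning: jump straight to the slowest tier. */
        if (tier == slowest_tier())
            return -1;
        return pick_node(slowest_tier(), socket, -1);
    case MEMTIER_PROMOTE_ONE:
        return pick_node(tier - 1, socket, -1);
    case MEMTIER_PROMOTE_LOCAL:
        if (tier == 0)
            return -1;
        return pick_node(0, socket, -1);
    }
    return -1;
}
```

In this toy layout, MEMTIER_SAME_TIER on node 0 crosses sockets to
node 1, which is exactly the "nodes across sockets lumped into the same
tier" case mentioned above.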

The issue is really fallback allocations.  In most cases, we know what
we'd like to do, but when the system is under pressure the question is
what behavior do we want from these components.  I'd hesitate to make a
strong claim about whether zswap should/should not fall back to a
higher-tier node under system pressure without strong data.
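
For the sake of discussion, the fallback choices could be written down
roughly like this.  It is only a userspace toy: the fallback_mode names,
pick_backing_node(), and the three-node layout are invented for the
example and do not correspond to anything in zswap or zsmalloc today.

```c
/*
 * Toy topology (an assumption): node 0 is DRAM (tier 0),
 * nodes 1 and 2 are CXL (tier 1).  Lower tier number is faster.
 */
#define NR_NODES 3
static const int tier_of[NR_NODES] = { 0, 1, 1 };

/* Hypothetical fallback behaviors for the compressed-object allocation. */
enum fallback_mode {
    FALLBACK_STRICT,     /* fail rather than leave the source node */
    FALLBACK_NO_PROMOTE, /* other nodes are fine, but never a faster tier */
    FALLBACK_ANYWHERE,   /* any node with free memory */
};

/*
 * Pick the node that should back the compressed copy of a page that
 * currently lives on @src_nid.  @free_pages is a per-node count of
 * free pages.  Returns -1 if the chosen mode leaves no usable node.
 */
int pick_backing_node(int src_nid, const long free_pages[NR_NODES],
                      enum fallback_mode mode)
{
    if (src_nid < 0 || src_nid >= NR_NODES)
        return -1;

    /* Preferred case: stay on the node the page came from. */
    if (free_pages[src_nid] > 0)
        return src_nid;

    if (mode == FALLBACK_STRICT)
        return -1;

    for (int n = 0; n < NR_NODES; n++) {
        if (n == src_nid || free_pages[n] <= 0)
            continue;
        /* Skipping faster tiers is what avoids the inversion. */
        if (mode == FALLBACK_NO_PROMOTE && tier_of[n] < tier_of[src_nid])
            continue;
        return n;
    }
    return -1;
}
```

With node 1 full, FALLBACK_ANYWHERE lands on DRAM node 0 (the inversion
again), FALLBACK_NO_PROMOTE lands on node 2, and FALLBACK_STRICT simply
fails.  Which of those is right under pressure is the part that needs
data.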

~Gregory

