linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nhat Pham <nphamcs@gmail.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-mm@kvack.org, akpm@linux-foundation.org,
	hannes@cmpxchg.org,  yosry.ahmed@linux.dev,
	chengming.zhou@linux.dev, sj@kernel.org,  kernel-team@meta.com,
	linux-kernel@vger.kernel.org, gourry@gourry.net,
	 willy@infradead.org, ying.huang@linux.alibaba.com,
	 jonathan.cameron@huawei.com, linux-cxl@vger.kernel.org,
	minchan@kernel.org,  senozhatsky@chromium.org
Subject: Re: [RFC PATCH 1/2] zsmalloc: let callers select NUMA node to store the compressed objects
Date: Mon, 31 Mar 2025 16:03:09 -0700	[thread overview]
Message-ID: <CAKEwX=OEQKdoWbyAO=LKE--ssLzBH0UVhz3EYaCiodpbMtQvKw@mail.gmail.com> (raw)
In-Reply-To: <67eb148e1f818_7baf294b9@dwillia2-mobl3.amr.corp.intel.com.notmuch>

On Mon, Mar 31, 2025 at 3:18 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> Nhat Pham wrote:
> > Curerntly, zsmalloc does not specify any memory policy when it allocates
> > memory for the compressed objects.
> >
> > Let users select the NUMA node for the memory allocation, through the
> > zpool-based API. Direct callers (i.e zram) should not observe any
> > behavioral change.
> >
> > Signed-off-by: Nhat Pham <nphamcs@gmail.com>
> > ---
> >  include/linux/zpool.h |  4 ++--
> >  mm/zpool.c            |  8 +++++---
> >  mm/zsmalloc.c         | 28 +++++++++++++++++++++-------
> >  mm/zswap.c            |  2 +-
> >  4 files changed, 29 insertions(+), 13 deletions(-)
> >
> > diff --git a/include/linux/zpool.h b/include/linux/zpool.h
> > index 52f30e526607..0df8722e13d7 100644
> > --- a/include/linux/zpool.h
> > +++ b/include/linux/zpool.h
> > @@ -22,7 +22,7 @@ const char *zpool_get_type(struct zpool *pool);
> >  void zpool_destroy_pool(struct zpool *pool);
> >
> >  int zpool_malloc(struct zpool *pool, size_t size, gfp_t gfp,
> > -                     unsigned long *handle);
> > +                     unsigned long *handle, int *nid);
>
> I agree with Johannes about the policy knob, so I'll just comment on the
> implementation.
>
> Why not just pass a "const int" for @nid, and use "NUMA_NO_NODE" for the
> "default" case. alloc_pages_node_noprof() is already prepared for a
> NUMA_NO_NODE argument.

Gregory and Johannes gave me that suggestion too! However, my
understanding was that alloc_pages_node(NUMA_NO_NODE, ...) !=
alloc_page(...), and I was trying to preserve the latter since it is
the "original behavior" (primarily for !same_node_mode, but also for
zram, which I tried not to change in this patch).

Please correct me if I'm mistaken, but IIUC:

1. alloc_pages_node(NUMA_NO_NODE, ...) would go to the local/closest node:

/*
 * Allocate pages, preferring the node given as nid. When nid == NUMA_NO_NODE,
 * prefer the current CPU's closest node. Otherwise node must be valid and
 * online.
 */
static inline struct page *alloc_pages_node_noprof(int nid, gfp_t gfp_mask,
   unsigned int order)
{
    if (nid == NUMA_NO_NODE)
        nid = numa_mem_id();


2. whereas, alloc_page(...) (i.e the "original" behavior) would
actually adopt the allocation policy of the task entering reclaim, by
calling:

struct page *alloc_frozen_pages_noprof(gfp_t gfp, unsigned order)
{
    struct mempolicy *pol = &default_policy;

    /*
    * No reference counting needed for current->mempolicy
    * nor system default_policy
    */
    if (!in_interrupt() && !(gfp & __GFP_THISNODE))
        pol = get_task_policy(current);

Now, I think the "original" behavior is dumb/broken, and should be
changed altogether. We should probably always pass the page's node id.
On the zswap side, in the next version I'll remove same_node_mode
sysfs knob and always pass the page's node id to zsmalloc and the page
allocator. That will clean up the zpool path per your (and Johannes'
and Gregory's) suggestion.

That still leaves zram though. zram is more complicated than zswap -
it has multiple allocation paths, so I don't want to touch it quite
yet (and preferably a zram maintainer/developer should do it). :) Or
if zram maintainers are happy with NUMA_NO_NODE, then we can
completely get rid of the pointer arguments etc.


  reply	other threads:[~2025-03-31 23:03 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-29 11:02 [RFC PATCH 0/2] zswap: fix placement inversion in memory tiering systems Nhat Pham
2025-03-29 11:02 ` [RFC PATCH 1/2] zsmalloc: let callers select NUMA node to store the compressed objects Nhat Pham
2025-03-31 22:17   ` Dan Williams
2025-03-31 23:03     ` Nhat Pham [this message]
2025-03-31 23:22       ` Dan Williams
2025-04-01  1:13         ` Nhat Pham
2025-03-29 11:02 ` [RFC PATCH 2/2] zswap: add sysfs knob for same node mode Nhat Pham
2025-03-29 19:53 ` [RFC PATCH 0/2] zswap: fix placement inversion in memory tiering systems Yosry Ahmed
2025-03-29 22:13   ` Nhat Pham
2025-03-29 22:17     ` Nhat Pham
2025-03-31 16:53   ` Johannes Weiner
2025-03-31 17:32     ` Nhat Pham
2025-03-31 17:06   ` Gregory Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKEwX=OEQKdoWbyAO=LKE--ssLzBH0UVhz3EYaCiodpbMtQvKw@mail.gmail.com' \
    --to=nphamcs@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=chengming.zhou@linux.dev \
    --cc=dan.j.williams@intel.com \
    --cc=gourry@gourry.net \
    --cc=hannes@cmpxchg.org \
    --cc=jonathan.cameron@huawei.com \
    --cc=kernel-team@meta.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=senozhatsky@chromium.org \
    --cc=sj@kernel.org \
    --cc=willy@infradead.org \
    --cc=ying.huang@linux.alibaba.com \
    --cc=yosry.ahmed@linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox