linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nhat Pham <nphamcs@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: hannes@cmpxchg.org, cerasuolodomenico@gmail.com,
	yosryahmed@google.com,  sjenning@redhat.com, ddstreet@ieee.org,
	vitaly.wool@konsulko.com,  mhocko@kernel.org,
	roman.gushchin@linux.dev, shakeelb@google.com,
	 muchun.song@linux.dev, chrisl@kernel.org, linux-mm@kvack.org,
	 kernel-team@meta.com, linux-kernel@vger.kernel.org,
	cgroups@vger.kernel.org,  linux-doc@vger.kernel.org,
	linux-kselftest@vger.kernel.org, shuah@kernel.org
Subject: Re: [PATCH v6 6/6] zswap: shrinks zswap pool based on memory pressure
Date: Tue, 28 Nov 2023 15:04:03 -0800	[thread overview]
Message-ID: <CAKEwX=PCqwUCb+Gm8CYLNkVE2s5KJ-OQq4y3hx+xr=d3y3_RTQ@mail.gmail.com> (raw)
In-Reply-To: <20231127130055.30c455906d912e09dcb7e79b@linux-foundation.org>

On Mon, Nov 27, 2023 at 1:00 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 27 Nov 2023 11:37:03 -0800 Nhat Pham <nphamcs@gmail.com> wrote:
>
> > Currently, we only shrink the zswap pool when the user-defined limit is
> > hit. This means that if we set the limit too high, cold data that are
> > unlikely to be used again will reside in the pool, wasting precious
> > memory. It is hard to predict how much zswap space will be needed ahead
> > of time, as this depends on the workload (specifically, on factors such
> > as memory access patterns and compressibility of the memory pages).
> >
> > This patch implements a memcg- and NUMA-aware shrinker for zswap, that
> > is initiated when there is memory pressure. The shrinker does not
> > have any parameter that must be tuned by the user, and can be opted in
> > or out on a per-memcg basis.
> >
> > Furthermore, to make it more robust for many workloads and prevent
> > overshrinking (i.e evicting warm pages that might be refaulted into
> > memory), we build in the following heuristics:
> >
> > * Estimate the number of warm pages residing in zswap, and attempt to
> >   protect this region of the zswap LRU.
> > * Scale the number of freeable objects by an estimate of the memory
> >   saving factor. The better zswap compresses the data, the fewer pages
> >   we will evict to swap (as we will otherwise incur IO for relatively
> >   small memory saving).
> > * During reclaim, if the shrinker encounters a page that is also being
> >   brought into memory, the shrinker will cautiously terminate its
> >   shrinking action, as this is a sign that it is touching the warmer
> >   region of the zswap LRU.
> >
> > As a proof of concept, we ran the following synthetic benchmark:
> > build the linux kernel in a memory-limited cgroup, and allocate some
> > cold data in tmpfs to see if the shrinker could write them out and
> > improved the overall performance. Depending on the amount of cold data
> > generated, we observe from 14% to 35% reduction in kernel CPU time used
> > in the kernel builds.
> >
> > ...
> >
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -22,6 +22,7 @@
> >  #include <linux/mm_types.h>
> >  #include <linux/page-flags.h>
> >  #include <linux/local_lock.h>
> > +#include <linux/zswap.h>
> >  #include <asm/page.h>
> >
> >  /* Free memory management - zoned buddy allocator.  */
> > @@ -641,6 +642,7 @@ struct lruvec {
> >  #ifdef CONFIG_MEMCG
> >       struct pglist_data *pgdat;
> >  #endif
> > +     struct zswap_lruvec_state zswap_lruvec_state;
>
> Normally we'd put this in #ifdef CONFIG_ZSWAP.
>
> > --- a/include/linux/zswap.h
> > +++ b/include/linux/zswap.h
> > @@ -5,20 +5,40 @@
> >  #include <linux/types.h>
> >  #include <linux/mm_types.h>
> >
> > +struct lruvec;
> > +
> >  extern u64 zswap_pool_total_size;
> >  extern atomic_t zswap_stored_pages;
> >
> >  #ifdef CONFIG_ZSWAP
> >
> > +struct zswap_lruvec_state {
> > +     /*
> > +      * Number of pages in zswap that should be protected from the shrinker.
> > +      * This number is an estimate of the following counts:
> > +      *
> > +      * a) Recent page faults.
> > +      * b) Recent insertion to the zswap LRU. This includes new zswap stores,
> > +      *    as well as recent zswap LRU rotations.
> > +      *
> > +      * These pages are likely to be warm, and might incur IO if the are written
> > +      * to swap.
> > +      */
> > +     atomic_long_t nr_zswap_protected;
> > +};
> > +
> >  bool zswap_store(struct folio *folio);
> >  bool zswap_load(struct folio *folio);
> >  void zswap_invalidate(int type, pgoff_t offset);
> >  void zswap_swapon(int type);
> >  void zswap_swapoff(int type);
> >  void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg);
> > -
> > +void zswap_lruvec_state_init(struct lruvec *lruvec);
> > +void zswap_lruvec_swapin(struct page *page);
> >  #else
> >
> > +struct zswap_lruvec_state {};
>
> But instead you made it an empty struct in this case.
>
> That's a bit funky, but I guess OK.  It does send a careful reader of
> struct lruvec over to look at the zswap_lruvec_state definition to
> understand what's going on.

I agree.

Originally, I included the fields in struct lruvec itself, guarded by
a #ifdef CONFIG_ZSWAP as you pointed out here. Yosry gave me this
suggestion to hide the zswap-specific states and details from ordinary
lruvec user, and direct people who care and/or need to know about
details towards the zswap codebase, which is good IMHO.

It is a bit weird I admit, but in this case I think it works out.

>
> >  static inline bool zswap_store(struct folio *folio)
> >  {
> >       return false;
> > @@ -33,7 +53,8 @@ static inline void zswap_invalidate(int type, pgoff_t offset) {}
> >  static inline void zswap_swapon(int type) {}
> >  static inline void zswap_swapoff(int type) {}
> >  static inline void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) {}
> > -
> > +static inline void zswap_lruvec_init(struct lruvec *lruvec) {}
> > +static inline void zswap_lruvec_swapin(struct page *page) {}
>
> Needed this build fix:
>
> --- a/include/linux/zswap.h~zswap-shrinks-zswap-pool-based-on-memory-pressure-fix
> +++ a/include/linux/zswap.h
> @@ -54,6 +54,7 @@ static inline void zswap_swapon(int type
>  static inline void zswap_swapoff(int type) {}
>  static inline void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) {}
>  static inline void zswap_lruvec_init(struct lruvec *lruvec) {}
> +static inline void zswap_lruvec_state_init(struct lruvec *lruvec) {}
>  static inline void zswap_lruvec_swapin(struct page *page) {}
>  #endif
>
> _
>

Yeah that looks like a typo on my part. My bad. v7 includes this fix.


      reply	other threads:[~2023-11-28 23:04 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-27 19:36 [PATCH v6 0/6] workload-specific and memory pressure-driven zswap writeback Nhat Pham
2023-11-27 19:36 ` [PATCH v6 1/6] list_lru: allows explicit memcg and NUMA node selection Nhat Pham
2023-11-27 22:46   ` [PATCH v6 1/6] list_lru: allows explicit memcg and NUMA node selection (fix) Nhat Pham
2023-11-27 23:40     ` Nhat Pham
2023-11-27 19:36 ` [PATCH v6 2/6] memcontrol: allows mem_cgroup_iter() to check for onlineness Nhat Pham
2023-11-27 21:42   ` Andrew Morton
2023-11-27 22:34     ` Nhat Pham
2023-11-28  9:38   ` Michal Hocko
2023-11-28 16:53     ` Nhat Pham
2023-11-28 16:58       ` Nhat Pham
2023-11-29  9:18       ` Michal Hocko
2023-11-30  0:47         ` Nhat Pham
2023-11-27 19:37 ` [PATCH v6 3/6] zswap: make shrinking memcg-aware Nhat Pham
2023-11-27 19:37 ` [PATCH v6 4/6] mm: memcg: add per-memcg zswap writeback stat Nhat Pham
2023-11-27 19:37 ` [PATCH v6 5/6] selftests: cgroup: update per-memcg zswap writeback selftest Nhat Pham
2023-11-27 19:37 ` [PATCH v6 6/6] zswap: shrinks zswap pool based on memory pressure Nhat Pham
2023-11-27 21:00   ` Andrew Morton
2023-11-28 23:04     ` Nhat Pham [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKEwX=PCqwUCb+Gm8CYLNkVE2s5KJ-OQq4y3hx+xr=d3y3_RTQ@mail.gmail.com' \
    --to=nphamcs@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cerasuolodomenico@gmail.com \
    --cc=cgroups@vger.kernel.org \
    --cc=chrisl@kernel.org \
    --cc=ddstreet@ieee.org \
    --cc=hannes@cmpxchg.org \
    --cc=kernel-team@meta.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=sjenning@redhat.com \
    --cc=vitaly.wool@konsulko.com \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox