linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yosry Ahmed <yosryahmed@google.com>
To: Nhat Pham <nphamcs@gmail.com>
Cc: akpm@linux-foundation.org, tj@kernel.org,
	lizefan.x@bytedance.com,  hannes@cmpxchg.org,
	cerasuolodomenico@gmail.com, sjenning@redhat.com,
	 ddstreet@ieee.org, vitaly.wool@konsulko.com, mhocko@kernel.org,
	 roman.gushchin@linux.dev, shakeelb@google.com,
	muchun.song@linux.dev,  hughd@google.com, corbet@lwn.net,
	konrad.wilk@oracle.com,  senozhatsky@chromium.org,
	rppt@kernel.org, linux-mm@kvack.org,  kernel-team@meta.com,
	linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
	 david@ixit.cz
Subject: Re: [RFC PATCH v3] zswap: memcontrol: implement zswap writeback disabling
Date: Thu, 2 Nov 2023 18:13:04 -0700	[thread overview]
Message-ID: <CAJD7tkapfHVvF1KCm4iUDWGP0n5zV3GBcNAGDiYNHrpuODPPqQ@mail.gmail.com> (raw)
In-Reply-To: <20231102234236.1784543-1-nphamcs@gmail.com>

On Thu, Nov 2, 2023 at 4:42 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> During our experiment with zswap, we sometimes observe swap IOs due to
> occasional zswap store failures and writebacks-to-swap. These swapping
> IOs prevent many users who cannot tolerate swapping from adopting zswap
> to save memory and improve performance where possible.
>
> This patch adds the option to disable this behavior entirely: do not
> writeback to backing swapping device when a zswap store attempt fail,
> and do not write pages in the zswap pool back to the backing swap
> device (both when the pool is full, and when the new zswap shrinker is
> called).
>
> This new behavior can be opted-in/out on a per-cgroup basis via a new
> cgroup file. By default, writebacks to swap device is enabled, which is
> the previous behavior.
>
> Note that this is subtly different from setting memory.swap.max to 0, as
> it still allows for pages to be stored in the zswap pool (which itself
> consumes swap space in its current form).
>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 11 +++++++
>  Documentation/admin-guide/mm/zswap.rst  |  6 ++++
>  include/linux/memcontrol.h              | 12 ++++++++
>  include/linux/zswap.h                   |  6 ++++
>  mm/memcontrol.c                         | 38 +++++++++++++++++++++++++
>  mm/page_io.c                            |  6 ++++
>  mm/shmem.c                              |  3 +-
>  mm/zswap.c                              | 14 +++++++++
>  8 files changed, 94 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 606b2e0eac4b..18c4171392ea 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1672,6 +1672,17 @@ PAGE_SIZE multiple when read back.
>         limit, it will refuse to take any more stores before existing
>         entries fault back in or are written out to disk.
>
> +  memory.zswap.writeback
> +       A read-write single value file which exists on non-root
> +       cgroups.  The default value is "1".
> +
> +       When this is set to 0, all swapping attempts to swapping devices
> +       are disabled. This included both zswap writebacks, and swapping due
> +       to zswap store failure.
> +
> +       Note that this is subtly different from setting memory.swap.max to
> +       0, as it still allows for pages to be written to the zswap pool.
> +
>    memory.pressure
>         A read-only nested-keyed file.
>
> diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
> index 522ae22ccb84..b987e58edb70 100644
> --- a/Documentation/admin-guide/mm/zswap.rst
> +++ b/Documentation/admin-guide/mm/zswap.rst
> @@ -153,6 +153,12 @@ attribute, e. g.::
>
>  Setting this parameter to 100 will disable the hysteresis.
>
> +Some users cannot tolerate the swapping that comes with zswap store failures
> +and zswap writebacks. Swapping can be disabled entirely (without disabling
> +zswap itself) on a cgroup-basis as follows:
> +
> +       echo 0 > /sys/fs/cgroup/<cgroup-name>/memory.zswap.writeback
> +
>  When there is a sizable amount of cold memory residing in the zswap pool, it
>  can be advantageous to proactively write these cold pages to swap and reclaim
>  the memory for other use cases. By default, the zswap shrinker is disabled.
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 95f6c9e60ed1..e51eafdf2a15 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -219,6 +219,12 @@ struct mem_cgroup {
>
>  #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
>         unsigned long zswap_max;
> +
> +       /*
> +        * Prevent pages from this memcg from being written back from zswap to
> +        * swap, and from being swapped out on zswap store failures.
> +        */
> +       bool zswap_writeback;
>  #endif
>
>         unsigned long soft_limit;
> @@ -1931,6 +1937,7 @@ static inline void count_objcg_event(struct obj_cgroup *objcg,
>  bool obj_cgroup_may_zswap(struct obj_cgroup *objcg);
>  void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size);
>  void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size);
> +bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg);
>  #else
>  static inline bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
>  {
> @@ -1944,6 +1951,11 @@ static inline void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg,
>                                              size_t size)
>  {
>  }
> +static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
> +{
> +       /* if zswap is disabled, do not block pages going to the swapping device */
> +       return true;
> +}
>  #endif
>
>  #endif /* _LINUX_MEMCONTROL_H */
> diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> index cbd373ba88d2..b4997e27a74b 100644
> --- a/include/linux/zswap.h
> +++ b/include/linux/zswap.h
> @@ -35,6 +35,7 @@ void zswap_swapoff(int type);
>  void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg);
>  void zswap_lruvec_state_init(struct lruvec *lruvec);
>  void zswap_lruvec_swapin(struct page *page);
> +bool is_zswap_enabled(void);
>  #else
>
>  struct zswap_lruvec_state {};
> @@ -55,6 +56,11 @@ static inline void zswap_swapoff(int type) {}
>  static inline void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) {}
>  static inline void zswap_lruvec_init(struct lruvec *lruvec) {}
>  static inline void zswap_lruvec_swapin(struct page *page) {}
> +
> +static inline bool is_zswap_enabled(void)
> +{
> +       return false;
> +}
>  #endif
>
>  #endif /* _LINUX_ZSWAP_H */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index e43b5aba8efc..8a6aadcc103c 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5545,6 +5545,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
>         WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX);
>  #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
>         memcg->zswap_max = PAGE_COUNTER_MAX;
> +       WRITE_ONCE(memcg->zswap_writeback, true);

Generally LGTM, just one question.

Would it be more convenient if the initial value is inherited from the
parent (the root starts with true)?

I can see this being useful if we want to set it to false on the
entire machine or one a parent cgroup, we can set it before creating
any children instead of setting it to 0 every time we create a new
cgroup.


  reply	other threads:[~2023-11-03  1:13 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-02 23:42 Nhat Pham
2023-11-03  1:13 ` Yosry Ahmed [this message]
2023-11-03 19:24   ` Nhat Pham
2023-11-09  2:41     ` Chris Li
2023-11-09  3:08       ` Nhat Pham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJD7tkapfHVvF1KCm4iUDWGP0n5zV3GBcNAGDiYNHrpuODPPqQ@mail.gmail.com \
    --to=yosryahmed@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=cerasuolodomenico@gmail.com \
    --cc=corbet@lwn.net \
    --cc=david@ixit.cz \
    --cc=ddstreet@ieee.org \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=kernel-team@meta.com \
    --cc=konrad.wilk@oracle.com \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=mhocko@kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=nphamcs@gmail.com \
    --cc=roman.gushchin@linux.dev \
    --cc=rppt@kernel.org \
    --cc=senozhatsky@chromium.org \
    --cc=shakeelb@google.com \
    --cc=sjenning@redhat.com \
    --cc=tj@kernel.org \
    --cc=vitaly.wool@konsulko.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox