From: Chris Li <chrisl@kernel.org>
To: Nhat Pham <nphamcs@gmail.com>
Cc: akpm@linux-foundation.org, tj@kernel.org,
lizefan.x@bytedance.com, hannes@cmpxchg.org,
cerasuolodomenico@gmail.com, yosryahmed@google.com,
sjenning@redhat.com, ddstreet@ieee.org,
vitaly.wool@konsulko.com, mhocko@kernel.org,
roman.gushchin@linux.dev, shakeelb@google.com,
muchun.song@linux.dev, hughd@google.com, corbet@lwn.net,
konrad.wilk@oracle.com, senozhatsky@chromium.org,
rppt@kernel.org, linux-mm@kvack.org, kernel-team@meta.com,
linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
david@ixit.cz
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
Date: Thu, 7 Dec 2023 16:19:25 -0800
Message-ID: <CAF8kJuPEKWbr_1a-OzqrYKSPmuty==KhC2vbTPAmm9xcJHo4cg@mail.gmail.com>
In-Reply-To: <20231207192406.3809579-1-nphamcs@gmail.com>

Hi Nhat,

On Thu, Dec 7, 2023 at 11:24 AM Nhat Pham <nphamcs@gmail.com> wrote:
>
> During our experiment with zswap, we sometimes observe swap IOs due to
> occasional zswap store failures and writebacks-to-swap. These swapping
> IOs prevent many users who cannot tolerate swapping from adopting zswap
> to save memory and improve performance where possible.
>
> This patch adds the option to disable this behavior entirely: do not
> write back to the backing swap device when a zswap store attempt fails,
> and do not write pages in the zswap pool back to the backing swap
> device (both when the pool is full, and when the new zswap shrinker is
> called).
>
> This new behavior can be opted-in/out on a per-cgroup basis via a new
> cgroup file. By default, writeback to the swap device is enabled, which is
> the previous behavior. Initially, writeback is enabled for the root
> cgroup, and a newly created cgroup will inherit the current setting of
> its parent.
>
> Note that this is subtly different from setting memory.swap.max to 0, as
> it still allows for pages to be stored in the zswap pool (which itself
> consumes swap space in its current form).
>
> This patch should be applied on top of the zswap shrinker series:
>
> https://lore.kernel.org/linux-mm/20231130194023.4102148-1-nphamcs@gmail.com/
>
> as it also disables the zswap shrinker, a major source of zswap
> writebacks.

I am wondering about the status of the "memory.swap.tiers" proof-of-concept
patch. Are we still on board to have these two patches merged together
somehow, so that "memory.swap.tiers" == "all" and
"memory.swap.tiers" == "zswap" cover the memory.zswap.writeback == 1 and
memory.zswap.writeback == 0 cases?

Thanks

Chris

>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
> Reviewed-by: Yosry Ahmed <yosryahmed@google.com>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 12 ++++++++
> Documentation/admin-guide/mm/zswap.rst | 6 ++++
> include/linux/memcontrol.h | 12 ++++++++
> include/linux/zswap.h | 6 ++++
> mm/memcontrol.c | 38 +++++++++++++++++++++++++
> mm/page_io.c | 6 ++++
> mm/shmem.c | 3 +-
> mm/zswap.c | 13 +++++++--
> 8 files changed, 92 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 3f85254f3cef..2b4ac43efdc8 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1679,6 +1679,18 @@ PAGE_SIZE multiple when read back.
> limit, it will refuse to take any more stores before existing
> entries fault back in or are written out to disk.
>
> + memory.zswap.writeback
> + A read-write single value file. The default value is "1". The
> + initial value of the root cgroup is 1, and when a new cgroup is
> + created, it inherits the current value of its parent.
> +
> + When this is set to 0, all swapping attempts to swapping devices
> + are disabled. This includes both zswap writebacks, and swapping due
> + to zswap store failure.
> +
> + Note that this is subtly different from setting memory.swap.max to
> + 0, as it still allows for pages to be written to the zswap pool.
> +
> memory.pressure
> A read-only nested-keyed file.
>
> diff --git a/Documentation/admin-guide/mm/zswap.rst b/Documentation/admin-guide/mm/zswap.rst
> index 62fc244ec702..cfa653130346 100644
> --- a/Documentation/admin-guide/mm/zswap.rst
> +++ b/Documentation/admin-guide/mm/zswap.rst
> @@ -153,6 +153,12 @@ attribute, e. g.::
>
> Setting this parameter to 100 will disable the hysteresis.
>
> +Some users cannot tolerate the swapping that comes with zswap store failures
> +and zswap writebacks. Swapping can be disabled entirely (without disabling
> +zswap itself) on a cgroup-basis as follows:
> +
> + echo 0 > /sys/fs/cgroup/<cgroup-name>/memory.zswap.writeback
> +
> When there is a sizable amount of cold memory residing in the zswap pool, it
> can be advantageous to proactively write these cold pages to swap and reclaim
> the memory for other use cases. By default, the zswap shrinker is disabled.
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 43b77363ab8e..5de775e6cdd9 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -219,6 +219,12 @@ struct mem_cgroup {
>
> #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
> unsigned long zswap_max;
> +
> + /*
> + * Prevent pages from this memcg from being written back from zswap to
> + * swap, and from being swapped out on zswap store failures.
> + */
> + bool zswap_writeback;
> #endif
>
> unsigned long soft_limit;
> @@ -1941,6 +1947,7 @@ static inline void count_objcg_event(struct obj_cgroup *objcg,
> bool obj_cgroup_may_zswap(struct obj_cgroup *objcg);
> void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size);
> void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size);
> +bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg);
> #else
> static inline bool obj_cgroup_may_zswap(struct obj_cgroup *objcg)
> {
> @@ -1954,6 +1961,11 @@ static inline void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg,
> size_t size)
> {
> }
> +static inline bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
> +{
> + /* if zswap is disabled, do not block pages going to the swapping device */
> + return true;
> +}
> #endif
>
> #endif /* _LINUX_MEMCONTROL_H */
> diff --git a/include/linux/zswap.h b/include/linux/zswap.h
> index 08c240e16a01..a78ceaf3a65e 100644
> --- a/include/linux/zswap.h
> +++ b/include/linux/zswap.h
> @@ -35,6 +35,7 @@ void zswap_swapoff(int type);
> void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg);
> void zswap_lruvec_state_init(struct lruvec *lruvec);
> void zswap_page_swapin(struct page *page);
> +bool is_zswap_enabled(void);
> #else
>
> struct zswap_lruvec_state {};
> @@ -55,6 +56,11 @@ static inline void zswap_swapoff(int type) {}
> static inline void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg) {}
> static inline void zswap_lruvec_state_init(struct lruvec *lruvec) {}
> static inline void zswap_page_swapin(struct page *page) {}
> +
> +static inline bool is_zswap_enabled(void)
> +{
> + return false;
> +}
> #endif
>
> #endif /* _LINUX_ZSWAP_H */
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index d7bc47316acb..ae8c62c7aa53 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -5538,6 +5538,8 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
> WRITE_ONCE(memcg->soft_limit, PAGE_COUNTER_MAX);
> #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP)
> memcg->zswap_max = PAGE_COUNTER_MAX;
> + WRITE_ONCE(memcg->zswap_writeback,
> + !parent || READ_ONCE(parent->zswap_writeback));
> #endif
> page_counter_set_high(&memcg->swap, PAGE_COUNTER_MAX);
> if (parent) {
> @@ -8174,6 +8176,12 @@ void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size)
> rcu_read_unlock();
> }
>
> +bool mem_cgroup_zswap_writeback_enabled(struct mem_cgroup *memcg)
> +{
> + /* if zswap is disabled, do not block pages going to the swapping device */
> + return !is_zswap_enabled() || !memcg || READ_ONCE(memcg->zswap_writeback);
> +}
> +
> static u64 zswap_current_read(struct cgroup_subsys_state *css,
> struct cftype *cft)
> {
> @@ -8206,6 +8214,31 @@ static ssize_t zswap_max_write(struct kernfs_open_file *of,
> return nbytes;
> }
>
> +static int zswap_writeback_show(struct seq_file *m, void *v)
> +{
> + struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
> +
> + seq_printf(m, "%d\n", READ_ONCE(memcg->zswap_writeback));
> + return 0;
> +}
> +
> +static ssize_t zswap_writeback_write(struct kernfs_open_file *of,
> + char *buf, size_t nbytes, loff_t off)
> +{
> + struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
> + int zswap_writeback;
> + ssize_t parse_ret = kstrtoint(strstrip(buf), 0, &zswap_writeback);
> +
> + if (parse_ret)
> + return parse_ret;
> +
> + if (zswap_writeback != 0 && zswap_writeback != 1)
> + return -EINVAL;
> +
> + WRITE_ONCE(memcg->zswap_writeback, zswap_writeback);
> + return nbytes;
> +}
> +
> static struct cftype zswap_files[] = {
> {
> .name = "zswap.current",
> @@ -8218,6 +8251,11 @@ static struct cftype zswap_files[] = {
> .seq_show = zswap_max_show,
> .write = zswap_max_write,
> },
> + {
> + .name = "zswap.writeback",
> + .seq_show = zswap_writeback_show,
> + .write = zswap_writeback_write,
> + },
> { } /* terminate */
> };
> #endif /* CONFIG_MEMCG_KMEM && CONFIG_ZSWAP */
> diff --git a/mm/page_io.c b/mm/page_io.c
> index cb559ae324c6..5e606f1aa2f6 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -201,6 +201,12 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
> folio_end_writeback(folio);
> return 0;
> }
> +
> + if (!mem_cgroup_zswap_writeback_enabled(folio_memcg(folio))) {
> + folio_mark_dirty(folio);
> + return AOP_WRITEPAGE_ACTIVATE;
> + }
> +
> __swap_writepage(&folio->page, wbc);
> return 0;
> }
> diff --git a/mm/shmem.c b/mm/shmem.c
> index c62f904ba1ca..dd084fbafcf1 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1514,8 +1514,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>
> mutex_unlock(&shmem_swaplist_mutex);
> BUG_ON(folio_mapped(folio));
> - swap_writepage(&folio->page, wbc);
> - return 0;
> + return swap_writepage(&folio->page, wbc);
> }
>
> mutex_unlock(&shmem_swaplist_mutex);
> diff --git a/mm/zswap.c b/mm/zswap.c
> index daaa949837f2..7ee54a3d8281 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -153,6 +153,11 @@ static bool zswap_shrinker_enabled = IS_ENABLED(
> CONFIG_ZSWAP_SHRINKER_DEFAULT_ON);
> module_param_named(shrinker_enabled, zswap_shrinker_enabled, bool, 0644);
>
> +bool is_zswap_enabled(void)
> +{
> + return zswap_enabled;
> +}
> +
> /*********************************
> * data structures
> **********************************/
> @@ -596,7 +601,8 @@ static unsigned long zswap_shrinker_scan(struct shrinker *shrinker,
> struct zswap_pool *pool = shrinker->private_data;
> bool encountered_page_in_swapcache = false;
>
> - if (!zswap_shrinker_enabled) {
> + if (!zswap_shrinker_enabled ||
> + !mem_cgroup_zswap_writeback_enabled(sc->memcg)) {
> sc->nr_scanned = 0;
> return SHRINK_STOP;
> }
> @@ -637,7 +643,7 @@ static unsigned long zswap_shrinker_count(struct shrinker *shrinker,
> struct lruvec *lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(sc->nid));
> unsigned long nr_backing, nr_stored, nr_freeable, nr_protected;
>
> - if (!zswap_shrinker_enabled)
> + if (!zswap_shrinker_enabled || !mem_cgroup_zswap_writeback_enabled(memcg))
> return 0;
>
> #ifdef CONFIG_MEMCG_KMEM
> @@ -956,6 +962,9 @@ static int shrink_memcg(struct mem_cgroup *memcg)
> struct zswap_pool *pool;
> int nid, shrunk = 0;
>
> + if (!mem_cgroup_zswap_writeback_enabled(memcg))
> + return -EINVAL;
> +
> /*
> * Skip zombies because their LRUs are reparented and we would be
> * reclaiming from the parent instead of the dead memcg.
>
> base-commit: cdcab2d34f129f593c0afbb2493bcaf41f4acd61
> --
> 2.34.1
>
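
For readers who want to check the semantics of the quoted patch outside the
kernel, here is a minimal userspace sketch of the three pieces of logic it
adds: inheritance at cgroup creation, validation of values written to
memory.zswap.writeback, and the check that gates writeback. This is not
kernel code. The names model_memcg, model_memcg_init, model_writeback_write
and model_writeback_enabled are hypothetical, and strtol() only approximates
kstrtoint(), so the exact error codes may differ from the kernel's.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct mem_cgroup; not kernel code. */
struct model_memcg {
	bool zswap_writeback;
};

/* Models the css_alloc hunk: the root starts at 1, a child copies its parent. */
static void model_memcg_init(struct model_memcg *memcg,
			     const struct model_memcg *parent)
{
	memcg->zswap_writeback = !parent || parent->zswap_writeback;
}

/*
 * Models zswap_writeback_write(): only 0 and 1 are accepted.
 * Returns 0 on success or a negative errno; the stored value is left
 * unchanged on error. strtol() is an approximation of kstrtoint().
 */
static int model_writeback_write(struct model_memcg *memcg, const char *buf)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(buf, &end, 0);
	if (errno == ERANGE)
		return -ERANGE;
	if (end == buf)
		return -EINVAL;	/* no digits at all */

	/* Allow trailing whitespace such as the newline added by echo. */
	while (isspace((unsigned char)*end))
		end++;
	if (*end != '\0')
		return -EINVAL;

	if (val != 0 && val != 1)
		return -EINVAL;

	memcg->zswap_writeback = (val == 1);
	return 0;
}

/*
 * Models mem_cgroup_zswap_writeback_enabled(): if zswap is disabled or
 * there is no memcg, pages are never blocked from reaching the swap device.
 */
static bool model_writeback_enabled(bool zswap_enabled,
				    const struct model_memcg *memcg)
{
	return !zswap_enabled || !memcg || memcg->zswap_writeback;
}
```

One property this sketch makes visible: the value is copied only when the
child is created, so a later write to the parent does not change the setting
of existing children.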