linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nhat Pham <nphamcs@gmail.com>
To: Ivan Shapovalov <intelfx@intelfx.name>
Cc: linux-kernel@vger.kernel.org, "Mike Yuan" <me@yhndnzj.com>,
	"Tejun Heo" <tj@kernel.org>, "Zefan Li" <lizefan.x@bytedance.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Yosry Ahmed" <yosryahmed@google.com>,
	"Chengming Zhou" <chengming.zhou@linux.dev>,
	"Michal Hocko" <mhocko@kernel.org>,
	"Roman Gushchin" <roman.gushchin@linux.dev>,
	"Shakeel Butt" <shakeel.butt@linux.dev>,
	"Muchun Song" <muchun.song@linux.dev>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Chris Li" <chrisl@kernel.org>,
	cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH] zswap: improve memory.zswap.writeback inheritance
Date: Thu, 26 Sep 2024 19:28:08 -0700	[thread overview]
Message-ID: <CAKEwX=O=Qu4LZt79==FztxFjgBu2+q7C6EDji-ZmW5Ga38_dSg@mail.gmail.com> (raw)
In-Reply-To: <20240926225531.700742-1-intelfx@intelfx.name>

On Thu, Sep 26, 2024 at 3:55 PM Ivan Shapovalov <intelfx@intelfx.name> wrote:
>
> Improve the inheritance behavior of the `memory.zswap.writeback` cgroup
> attribute introduced during the 6.11 cycle. Specifically, in 6.11 we
> walk the parent cgroups until we find a _disabled_ writeback, which does
> not allow the user to selectively enable zswap writeback while having it
> disabled by default.
>
> Instead, introduce a third value for the `memory.zswap.writeback` cgroup
> attribute meaning "follow the parent", and use it as the default value
> for all cgroups. Additionally, introduce a `zswap.writeback_disable`
> module parameter which is used as the "parent" of the root cgroup,
> thus making it the global default.
>
> Reads from `memory.zswap.writeback` cgroup attribute will be coerced to
> 0 or 1 to maintain backwards compatilibity. However, it is possible to
> write -1 to that attribute to make the cgroup follow the parent again.
>
> Fixes: e39925734909 ("mm/memcontrol: respect zswap.writeback setting from parent cg too")
> Fixes: 501a06fe8e4c ("zswap: memcontrol: implement zswap writeback disabling")
> Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 17 +++++++++++------
>  Documentation/admin-guide/mm/zswap.rst  |  9 ++++++++-
>  include/linux/memcontrol.h              |  3 ++-
>  include/linux/zswap.h                   |  6 ++++++
>  mm/memcontrol.c                         | 24 +++++++++++++++++-------
>  mm/zswap.c                              |  9 +++++++++
>  6 files changed, 53 insertions(+), 15 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 95c18bc17083..eea580490679 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1717,10 +1717,12 @@ The following nested keys are defined.
>         entries fault back in or are written out to disk.
>
>    memory.zswap.writeback
> -       A read-write single value file. The default value is "1".
> -       Note that this setting is hierarchical, i.e. the writeback would be
> -       implicitly disabled for child cgroups if the upper hierarchy
> -       does so.
> +       A read-write single value file. The default is to follow the parent
> +       cgroup configuration, and the root cgroup follows the global
> +       ``zswap.writeback_enabled`` module parameter (which is 1 by default).
> +       Thus, this setting is hierarchical, i.e. the writeback setting for
> +       a child cgroup can be implicitly controlled at runtime by changing
> +       any parent value or the global module parameter.
>
>         When this is set to 0, all swapping attempts to swapping devices
>         are disabled. This included both zswap writebacks, and swapping due
> @@ -1729,8 +1731,11 @@ The following nested keys are defined.
>         reclaim inefficiency after disabling writeback (because the same
>         pages might be rejected again and again).
>
> -       Note that this is subtly different from setting memory.swap.max to
> -       0, as it still allows for pages to be written to the zswap pool.
> +       Note that this is different from setting memory.swap.max to 0,
> +       as it still allows for pages to be written to the zswap pool.
> +
> +       This can also be set to -1, which would make the cgroup (and its
> +       future children) follow the parent/global value again.
>

API-design-wise, this seems a bit confusing... Using the value -1 to
indicate the cgroup should follow ancestor is not quite semantically
meaningful. What does "follow the parent" here mean? Reading your
code, it seems to be "find the first non negative from this memcg to
root and just use it", but it's neither intuitive, nor deducable from
the documentation.

There are also ill-defined/poorly-defined behavior. What if we set -1
to all cgroups? Should writeback be enabled or disabled?

The implementation also contradicts the "writeback setting for a child
cgroup can be implicitly controlled at runtime by chaging any parent
value or the global module parameter" part. What happens if we have
the following sequence of zswap.writeback values, from root to current
memcg:

root.memcg.zswap.writeback = 0, 1, -1, -1, -1 = cur_memcg.zswap.writeback

Should we disable or enable cur_memcg's zswap.writeback? It's disabled
at root, but the first non-negative ancestral value is 1, so this
cgroup will follow it (even though that ancestor itself has zswap
writeback disabled!!!)

I think we should separate the values itself from the decision to
check ancestors. The former should be indicated per-memcg, and the
latter decision should only come into play when the memcg itself does
prevent zswap.writeback. It might be less confusing to make the latter
decision globally - perhaps through a mount option, analogous to
memory_recursiveprot?


  parent reply	other threads:[~2024-09-27  2:28 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-26 22:55 Ivan Shapovalov
2024-09-26 23:12 ` Nhat Pham
2024-09-26 23:40   ` Ivan Shapovalov
2024-09-27  1:52     ` Nhat Pham
2024-09-27  2:28 ` Nhat Pham [this message]
2024-09-27 15:01   ` Michal Koutný
2024-09-27 16:40     ` Nhat Pham

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAKEwX=O=Qu4LZt79==FztxFjgBu2+q7C6EDji-ZmW5Ga38_dSg@mail.gmail.com' \
    --to=nphamcs@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=chengming.zhou@linux.dev \
    --cc=chrisl@kernel.org \
    --cc=corbet@lwn.net \
    --cc=hannes@cmpxchg.org \
    --cc=intelfx@intelfx.name \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=me@yhndnzj.com \
    --cc=mhocko@kernel.org \
    --cc=mkoutny@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeel.butt@linux.dev \
    --cc=tj@kernel.org \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox