From: Shakeel Butt <shakeelb@google.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
mm-commits@vger.kernel.org, Tejun Heo <tj@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <guro@fb.com>, Dennis Zhou <dennis@kernel.org>,
Chris Down <chris@chrisdown.name>,
cgroups mailinglist <cgroups@vger.kernel.org>,
Linux MM <linux-mm@kvack.org>
Subject: Re: + mm-consider-subtrees-in-memoryevents.patch added to -mm tree
Date: Fri, 17 May 2019 06:00:11 -0700 [thread overview]
Message-ID: <CALvZod6c9OCy9p79hTgByjn+_BmnQ6p216kD9dgEhCSNFzpeKw@mail.gmail.com> (raw)
In-Reply-To: <20190213124729.GI4525@dhcp22.suse.cz>
On Wed, Feb 13, 2019 at 4:47 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 12-02-19 14:45:42, Andrew Morton wrote:
> [...]
> > From: Chris Down <chris@chrisdown.name>
> > Subject: mm, memcg: consider subtrees in memory.events
> >
> > memory.stat and other files already consider subtrees in their output, and
> > we should too in order to not present an inconsistent interface.
> >
> > The current situation is fairly confusing, because people interacting with
> > cgroups expect hierarchical behaviour in the vein of memory.stat,
> > cgroup.events, and other files. For example, this causes confusion when
> > debugging reclaim events under low, as currently these always read "0" at
> > non-leaf memcg nodes, which frequently causes people to misdiagnose breach
> > behaviour. The same confusion applies to other counters in this file when
> > debugging issues.
> >
> > Aggregation is done at write time instead of at read-time since these
> > counters aren't hot (unlike memory.stat which is per-page, so it does it
> > at read time), and it makes sense to bundle this with the file
I think the above statement (memory.stat read-time aggregation) need
to be fixed due to the latest changes.
> > notifications.
> >
> > After this patch, events are propagated up the hierarchy:
> >
> > [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
> > low 0
> > high 0
> > max 0
> > oom 0
> > oom_kill 0
> > [root@ktst ~]# systemd-run -p MemoryMax=1 true
> > Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service
> > [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
> > low 0
> > high 0
> > max 7
> > oom 1
> > oom_kill 1
> >
> > As this is a change in behaviour, this can be reverted to the old
> > behaviour by mounting with the `memory_localevents' flag set. However, we
> > use the new behaviour by default as there's a lack of evidence that there
> > are any current users of memory.events that would find this change
> > undesirable.
> >
> > Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.name
> > Signed-off-by: Chris Down <chris@chrisdown.name>
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
However can we please have memory.events.local merged along with this one?
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Tejun Heo <tj@kernel.org>
> > Cc: Roman Gushchin <guro@fb.com>
> > Cc: Dennis Zhou <dennis@kernel.org>
> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
> FTR: As I've already said here [1] I can live with this change as long
> as there is a larger consensus among cgroup v2 users. So let's give this
> some more time before merging to see whether there is such a consensus.
>
> [1] http://lkml.kernel.org/r/20190201102515.GK11599@dhcp22.suse.cz
>
> > ---
> >
> > Documentation/admin-guide/cgroup-v2.rst | 9 +++++++++
> > include/linux/cgroup-defs.h | 5 +++++
> > include/linux/memcontrol.h | 10 ++++++++--
> > kernel/cgroup/cgroup.c | 16 ++++++++++++++--
> > 4 files changed, 36 insertions(+), 4 deletions(-)
> >
> > --- a/Documentation/admin-guide/cgroup-v2.rst~mm-consider-subtrees-in-memoryevents
> > +++ a/Documentation/admin-guide/cgroup-v2.rst
> > @@ -177,6 +177,15 @@ cgroup v2 currently supports the followi
> > ignored on non-init namespace mounts. Please refer to the
> > Delegation section for details.
> >
> > + memory_localevents
> > +
> > + Only populate memory.events with data for the current cgroup,
> > + and not any subtrees. This is legacy behaviour, the default
> > + behaviour without this option is to include subtree counts.
> > + This option is system wide and can only be set on mount or
> > + modified through remount from the init namespace. The mount
> > + option is ignored on non-init namespace mounts.
> > +
> >
> > Organizing Processes and Threads
> > --------------------------------
> > --- a/include/linux/cgroup-defs.h~mm-consider-subtrees-in-memoryevents
> > +++ a/include/linux/cgroup-defs.h
> > @@ -83,6 +83,11 @@ enum {
> > * Enable cpuset controller in v1 cgroup to use v2 behavior.
> > */
> > CGRP_ROOT_CPUSET_V2_MODE = (1 << 4),
> > +
> > + /*
> > + * Enable legacy local memory.events.
> > + */
> > + CGRP_ROOT_MEMORY_LOCAL_EVENTS = (1 << 5),
> > };
> >
> > /* cftype->flags */
> > --- a/include/linux/memcontrol.h~mm-consider-subtrees-in-memoryevents
> > +++ a/include/linux/memcontrol.h
> > @@ -789,8 +789,14 @@ static inline void count_memcg_event_mm(
> > static inline void memcg_memory_event(struct mem_cgroup *memcg,
> > enum memcg_memory_event event)
> > {
> > - atomic_long_inc(&memcg->memory_events[event]);
> > - cgroup_file_notify(&memcg->events_file);
> > + do {
> > + atomic_long_inc(&memcg->memory_events[event]);
> > + cgroup_file_notify(&memcg->events_file);
> > +
> > + if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> > + break;
> > + } while ((memcg = parent_mem_cgroup(memcg)) &&
> > + !mem_cgroup_is_root(memcg));
> > }
> >
> > static inline void memcg_memory_event_mm(struct mm_struct *mm,
> > --- a/kernel/cgroup/cgroup.c~mm-consider-subtrees-in-memoryevents
> > +++ a/kernel/cgroup/cgroup.c
> > @@ -1775,11 +1775,13 @@ int cgroup_show_path(struct seq_file *sf
> >
> > enum cgroup2_param {
> > Opt_nsdelegate,
> > + Opt_memory_localevents,
> > nr__cgroup2_params
> > };
> >
> > static const struct fs_parameter_spec cgroup2_param_specs[] = {
> > - fsparam_flag ("nsdelegate", Opt_nsdelegate),
> > + fsparam_flag("nsdelegate", Opt_nsdelegate),
> > + fsparam_flag("memory_localevents", Opt_memory_localevents),
> > {}
> > };
> >
> > @@ -1802,6 +1804,9 @@ static int cgroup2_parse_param(struct fs
> > case Opt_nsdelegate:
> > ctx->flags |= CGRP_ROOT_NS_DELEGATE;
> > return 0;
> > + case Opt_memory_localevents:
> > + ctx->flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;
> > + return 0;
> > }
> > return -EINVAL;
> > }
> > @@ -1813,6 +1818,11 @@ static void apply_cgroup_root_flags(unsi
> > cgrp_dfl_root.flags |= CGRP_ROOT_NS_DELEGATE;
> > else
> > cgrp_dfl_root.flags &= ~CGRP_ROOT_NS_DELEGATE;
> > +
> > + if (root_flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> > + cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;
> > + else
> > + cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_LOCAL_EVENTS;
> > }
> > }
> >
> > @@ -1820,6 +1830,8 @@ static int cgroup_show_options(struct se
> > {
> > if (cgrp_dfl_root.flags & CGRP_ROOT_NS_DELEGATE)
> > seq_puts(seq, ",nsdelegate");
> > + if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> > + seq_puts(seq, ",memory_localevents");
> > return 0;
> > }
> >
> > @@ -6207,7 +6219,7 @@ static struct kobj_attribute cgroup_dele
> > static ssize_t features_show(struct kobject *kobj, struct kobj_attribute *attr,
> > char *buf)
> > {
> > - return snprintf(buf, PAGE_SIZE, "nsdelegate\n");
> > + return snprintf(buf, PAGE_SIZE, "nsdelegate\nmemory_localevents\n");
> > }
> > static struct kobj_attribute cgroup_features_attr = __ATTR_RO(features);
> >
> > _
> >
> > Patches currently in -mm which might be from chris@chrisdown.name are
> >
> > mm-create-mem_cgroup_from_seq.patch
> > mm-extract-memcg-maxable-seq_file-logic-to-seq_show_memcg_tunable.patch
> > mm-proportional-memorylowmin-reclaim.patch
> > mm-proportional-memorylowmin-reclaim-fix.patch
> > mm-memcontrol-expose-thp-events-on-a-per-memcg-basis.patch
> > mm-memcontrol-expose-thp-events-on-a-per-memcg-basis-fix-2.patch
> > mm-make-memoryemin-the-baseline-for-utilisation-determination.patch
> > mm-rename-ambiguously-named-memorystat-counters-and-functions.patch
> > mm-consider-subtrees-in-memoryevents.patch
>
> --
> Michal Hocko
> SUSE Labs
next prev parent reply other threads:[~2019-05-17 13:00 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <20190212224542.ZW63a%akpm@linux-foundation.org>
2019-02-13 12:47 ` Michal Hocko
2019-05-16 17:56 ` Johannes Weiner
2019-05-16 18:10 ` Michal Hocko
2019-05-16 19:39 ` Johannes Weiner
2019-05-17 12:33 ` Michal Hocko
2019-05-17 13:00 ` Shakeel Butt
2019-05-22 5:30 ` Suren Baghdasaryan
2019-05-18 1:33 ` Johannes Weiner
2019-05-22 2:23 ` Andrew Morton
2019-05-22 15:44 ` Johannes Weiner
2019-05-17 13:00 ` Shakeel Butt [this message]
2019-05-17 19:04 ` Johannes Weiner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CALvZod6c9OCy9p79hTgByjn+_BmnQ6p216kD9dgEhCSNFzpeKw@mail.gmail.com \
--to=shakeelb@google.com \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=chris@chrisdown.name \
--cc=dennis@kernel.org \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=mm-commits@vger.kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox