linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Michal Hocko <mhocko@kernel.org>
To: akpm@linux-foundation.org
Cc: mm-commits@vger.kernel.org, tj@kernel.org, hannes@cmpxchg.org,
	guro@fb.com, dennis@kernel.org, chris@chrisdown.name,
	cgroups mailinglist <cgroups@vger.kernel.org>,
	linux-mm@kvack.org
Subject: Re: + mm-consider-subtrees-in-memoryevents.patch added to -mm tree
Date: Wed, 13 Feb 2019 13:47:29 +0100	[thread overview]
Message-ID: <20190213124729.GI4525@dhcp22.suse.cz> (raw)
In-Reply-To: <20190212224542.ZW63a%akpm@linux-foundation.org>

On Tue 12-02-19 14:45:42, Andrew Morton wrote:
[...]
> From: Chris Down <chris@chrisdown.name>
> Subject: mm, memcg: consider subtrees in memory.events
> 
> memory.stat and other files already consider subtrees in their output, and
> we should too in order to not present an inconsistent interface.
> 
> The current situation is fairly confusing, because people interacting with
> cgroups expect hierarchical behaviour in the vein of memory.stat,
> cgroup.events, and other files.  For example, this causes confusion when
> debugging reclaim events under low, as currently these always read "0" at
> non-leaf memcg nodes, which frequently causes people to misdiagnose breach
> behaviour.  The same confusion applies to other counters in this file when
> debugging issues.
> 
> Aggregation is done at write time instead of at read-time since these
> counters aren't hot (unlike memory.stat which is per-page, so it does it
> at read time), and it makes sense to bundle this with the file
> notifications.
> 
> After this patch, events are propagated up the hierarchy:
> 
>     [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
>     low 0
>     high 0
>     max 0
>     oom 0
>     oom_kill 0
>     [root@ktst ~]# systemd-run -p MemoryMax=1 true
>     Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service
>     [root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
>     low 0
>     high 0
>     max 7
>     oom 1
>     oom_kill 1
> 
> As this is a change in behaviour, this can be reverted to the old
> behaviour by mounting with the `memory_localevents' flag set.  However, we
> use the new behaviour by default as there's a lack of evidence that there
> are any current users of memory.events that would find this change
> undesirable.
> 
> Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.name
> Signed-off-by: Chris Down <chris@chrisdown.name>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Roman Gushchin <guro@fb.com>
> Cc: Dennis Zhou <dennis@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

FTR: As I've already said here [1] I can live with this change as long
as there is a larger consensus among cgroup v2 users. So let's give this
some more time before merging to see whether there is such a consensus.

[1] http://lkml.kernel.org/r/20190201102515.GK11599@dhcp22.suse.cz

> ---
> 
>  Documentation/admin-guide/cgroup-v2.rst |    9 +++++++++
>  include/linux/cgroup-defs.h             |    5 +++++
>  include/linux/memcontrol.h              |   10 ++++++++--
>  kernel/cgroup/cgroup.c                  |   16 ++++++++++++++--
>  4 files changed, 36 insertions(+), 4 deletions(-)
> 
> --- a/Documentation/admin-guide/cgroup-v2.rst~mm-consider-subtrees-in-memoryevents
> +++ a/Documentation/admin-guide/cgroup-v2.rst
> @@ -177,6 +177,15 @@ cgroup v2 currently supports the followi
>  	ignored on non-init namespace mounts.  Please refer to the
>  	Delegation section for details.
>  
> +  memory_localevents
> +
> +        Only populate memory.events with data for the current cgroup,
> +        and not any subtrees. This is legacy behaviour, the default
> +        behaviour without this option is to include subtree counts.
> +        This option is system wide and can only be set on mount or
> +        modified through remount from the init namespace. The mount
> +        option is ignored on non-init namespace mounts.
> +
>  
>  Organizing Processes and Threads
>  --------------------------------
> --- a/include/linux/cgroup-defs.h~mm-consider-subtrees-in-memoryevents
> +++ a/include/linux/cgroup-defs.h
> @@ -83,6 +83,11 @@ enum {
>  	 * Enable cpuset controller in v1 cgroup to use v2 behavior.
>  	 */
>  	CGRP_ROOT_CPUSET_V2_MODE = (1 << 4),
> +
> +	/*
> +	 * Enable legacy local memory.events.
> +	 */
> +	CGRP_ROOT_MEMORY_LOCAL_EVENTS = (1 << 5),
>  };
>  
>  /* cftype->flags */
> --- a/include/linux/memcontrol.h~mm-consider-subtrees-in-memoryevents
> +++ a/include/linux/memcontrol.h
> @@ -789,8 +789,14 @@ static inline void count_memcg_event_mm(
>  static inline void memcg_memory_event(struct mem_cgroup *memcg,
>  				      enum memcg_memory_event event)
>  {
> -	atomic_long_inc(&memcg->memory_events[event]);
> -	cgroup_file_notify(&memcg->events_file);
> +	do {
> +		atomic_long_inc(&memcg->memory_events[event]);
> +		cgroup_file_notify(&memcg->events_file);
> +
> +		if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> +			break;
> +	} while ((memcg = parent_mem_cgroup(memcg)) &&
> +		 !mem_cgroup_is_root(memcg));
>  }
>  
>  static inline void memcg_memory_event_mm(struct mm_struct *mm,
> --- a/kernel/cgroup/cgroup.c~mm-consider-subtrees-in-memoryevents
> +++ a/kernel/cgroup/cgroup.c
> @@ -1775,11 +1775,13 @@ int cgroup_show_path(struct seq_file *sf
>  
>  enum cgroup2_param {
>  	Opt_nsdelegate,
> +	Opt_memory_localevents,
>  	nr__cgroup2_params
>  };
>  
>  static const struct fs_parameter_spec cgroup2_param_specs[] = {
> -	fsparam_flag  ("nsdelegate",		Opt_nsdelegate),
> +	fsparam_flag("nsdelegate",		Opt_nsdelegate),
> +	fsparam_flag("memory_localevents",	Opt_memory_localevents),
>  	{}
>  };
>  
> @@ -1802,6 +1804,9 @@ static int cgroup2_parse_param(struct fs
>  	case Opt_nsdelegate:
>  		ctx->flags |= CGRP_ROOT_NS_DELEGATE;
>  		return 0;
> +	case Opt_memory_localevents:
> +		ctx->flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;
> +		return 0;
>  	}
>  	return -EINVAL;
>  }
> @@ -1813,6 +1818,11 @@ static void apply_cgroup_root_flags(unsi
>  			cgrp_dfl_root.flags |= CGRP_ROOT_NS_DELEGATE;
>  		else
>  			cgrp_dfl_root.flags &= ~CGRP_ROOT_NS_DELEGATE;
> +
> +		if (root_flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> +			cgrp_dfl_root.flags |= CGRP_ROOT_MEMORY_LOCAL_EVENTS;
> +		else
> +			cgrp_dfl_root.flags &= ~CGRP_ROOT_MEMORY_LOCAL_EVENTS;
>  	}
>  }
>  
> @@ -1820,6 +1830,8 @@ static int cgroup_show_options(struct se
>  {
>  	if (cgrp_dfl_root.flags & CGRP_ROOT_NS_DELEGATE)
>  		seq_puts(seq, ",nsdelegate");
> +	if (cgrp_dfl_root.flags & CGRP_ROOT_MEMORY_LOCAL_EVENTS)
> +		seq_puts(seq, ",memory_localevents");
>  	return 0;
>  }
>  
> @@ -6207,7 +6219,7 @@ static struct kobj_attribute cgroup_dele
>  static ssize_t features_show(struct kobject *kobj, struct kobj_attribute *attr,
>  			     char *buf)
>  {
> -	return snprintf(buf, PAGE_SIZE, "nsdelegate\n");
> +	return snprintf(buf, PAGE_SIZE, "nsdelegate\nmemory_localevents\n");
>  }
>  static struct kobj_attribute cgroup_features_attr = __ATTR_RO(features);
>  
> _
> 
> Patches currently in -mm which might be from chris@chrisdown.name are
> 
> mm-create-mem_cgroup_from_seq.patch
> mm-extract-memcg-maxable-seq_file-logic-to-seq_show_memcg_tunable.patch
> mm-proportional-memorylowmin-reclaim.patch
> mm-proportional-memorylowmin-reclaim-fix.patch
> mm-memcontrol-expose-thp-events-on-a-per-memcg-basis.patch
> mm-memcontrol-expose-thp-events-on-a-per-memcg-basis-fix-2.patch
> mm-make-memoryemin-the-baseline-for-utilisation-determination.patch
> mm-rename-ambiguously-named-memorystat-counters-and-functions.patch
> mm-consider-subtrees-in-memoryevents.patch

-- 
Michal Hocko
SUSE Labs


       reply	other threads:[~2019-02-13 12:47 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20190212224542.ZW63a%akpm@linux-foundation.org>
2019-02-13 12:47 ` Michal Hocko [this message]
2019-05-16 17:56   ` Johannes Weiner
2019-05-16 18:10     ` Michal Hocko
2019-05-16 19:39       ` Johannes Weiner
2019-05-17 12:33         ` Michal Hocko
2019-05-17 13:00           ` Shakeel Butt
2019-05-22  5:30             ` Suren Baghdasaryan
2019-05-18  1:33           ` Johannes Weiner
2019-05-22  2:23             ` Andrew Morton
2019-05-22 15:44               ` Johannes Weiner
2019-05-17 13:00   ` Shakeel Butt
2019-05-17 19:04     ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190213124729.GI4525@dhcp22.suse.cz \
    --to=mhocko@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=chris@chrisdown.name \
    --cc=dennis@kernel.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-mm@kvack.org \
    --cc=mm-commits@vger.kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox