linux-mm.kvack.org archive mirror
From: Vladimir Davydov <vdavydov@virtuozzo.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 7/8] mm: memcontrol: account "kmem" consumers in cgroup2 memory controller
Date: Wed, 9 Dec 2015 14:30:38 +0300
Message-ID: <20151209113037.GS11488@esperanza>
In-Reply-To: <1449599665-18047-8-git-send-email-hannes@cmpxchg.org>

On Tue, Dec 08, 2015 at 01:34:24PM -0500, Johannes Weiner wrote:
> The original cgroup memory controller has an extension to account slab
> memory (and other "kernel memory" consumers) in a separate "kmem"
> counter, once the user set an explicit limit on that "kmem" pool.
> 
> However, this includes various consumers whose sizes are directly
> linked to userspace activity. Accounting them as an optional "kmem"
> extension is problematic for several reasons:
> 
> 1. It leaves the main memory interface with incomplete semantics. A
>    user who puts their workload into a cgroup and configures a memory
>    limit does not expect us to leave holes in the containment as big
>    as the dentry and inode cache, or the kernel stack pages.
> 
> 2. If the limit set on this random historical subgroup of consumers is
>    reached, subsequent allocations will fail even when the main memory
>    pool available to the cgroup is not yet exhausted and/or has
>    reclaimable memory in it.
> 
> 3. Calling it 'kernel memory' is misleading. The dentry and inode
>    caches are no more 'kernel' (or no less 'user') memory than the
>    page cache itself. Treating these consumers as different classes is
>    a historical implementation detail that should not leak to users.
> 
> So, in addition to page cache, anonymous memory, and network socket
> memory, account the following memory consumers per default in the
> cgroup2 memory controller:
> 
>      - threadinfo
>      - task_struct
>      - task_delay_info
>      - pid
>      - cred
>      - mm_struct
>      - vm_area_struct and vm_region (nommu)
>      - anon_vma and anon_vma_chain
>      - signal_struct
>      - sighand_struct
>      - fs_struct
>      - files_struct
>      - fdtable and fdtable->full_fds_bits
>      - dentry and external_name
>      - inode for all filesystems.
> 
> This should give us reasonable memory isolation for most common
> workloads out of the box.
> 
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>

The patch looks good to me, but I think we still need to add a boot-time
knob to disable kmem accounting, as we do for sockets:

From: Vladimir Davydov <vdavydov@virtuozzo.com>
Subject: [PATCH] mm: memcontrol: allow to disable kmem accounting for cgroup2

Kmem accounting might incur overhead that some users can't put up with.
Besides, the implementation is still considered unstable. So let's
provide a way to disable it for those users who aren't happy with it.

To disable kmem accounting for cgroup2, pass cgroup.memory=nokmem at
boot time.

Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index c1bda3bbb7db..1b7a85dc6013 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -602,6 +602,7 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 	cgroup.memory=	[KNL] Pass options to the cgroup memory controller.
 			Format: <string>
 			nosocket -- Disable socket memory accounting.
+			nokmem -- Disable kernel memory accounting.
 
 	checkreqprot	[SELINUX] Set initial checkreqprot flag value.
 			Format: { "0" | "1" }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6faea81e66d7..6a5572241dc6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -83,6 +83,9 @@ struct mem_cgroup *root_mem_cgroup __read_mostly;
 /* Socket memory accounting disabled? */
 static bool cgroup_memory_nosocket;
 
+/* Kernel memory accounting disabled? */
+static bool cgroup_memory_nokmem;
+
 /* Whether the swap controller is active */
 #ifdef CONFIG_MEMCG_SWAP
 int do_swap_account __read_mostly;
@@ -2898,8 +2901,8 @@ static int memcg_propagate_kmem(struct mem_cgroup *memcg)
 	 * onlined after this point, because it has at least one child
 	 * already.
 	 */
-	if (cgroup_subsys_on_dfl(memory_cgrp_subsys) ||
-	    memcg_kmem_online(parent))
+	if (memcg_kmem_online(parent) ||
+	    (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nokmem))
 		ret = memcg_online_kmem(memcg);
 	mutex_unlock(&memcg_limit_mutex);
 	return ret;
@@ -5587,6 +5590,8 @@ static int __init cgroup_memory(char *s)
 			continue;
 		if (!strcmp(token, "nosocket"))
 			cgroup_memory_nosocket = true;
+		if (!strcmp(token, "nokmem"))
+			cgroup_memory_nokmem = true;
 	}
 	return 0;
 }
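
For illustration only, the behaviour of the two mm/memcontrol.c hunks above
can be modelled in userspace roughly as follows. This is a sketch, not the
kernel code: struct cgroup_memory_opts, parse_cgroup_memory() and
should_online_kmem() are hypothetical names, the two bool parameters stand in
for memcg_kmem_online(parent) and cgroup_subsys_on_dfl(memory_cgrp_subsys),
and the tokenizer is written with strchr() so that it builds with plain ISO C.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the static flags in mm/memcontrol.c. */
struct cgroup_memory_opts {
	bool nosocket;	/* models cgroup_memory_nosocket */
	bool nokmem;	/* models cgroup_memory_nokmem */
};

/*
 * Parse a comma-separated option string such as "nosocket,nokmem".
 * Empty tokens are skipped and unknown tokens are silently ignored,
 * as in the loop the patch extends. The buffer is modified in place.
 */
static void parse_cgroup_memory(char *s, struct cgroup_memory_opts *opts)
{
	while (s != NULL) {
		char *token = s;
		char *comma = strchr(s, ',');

		if (comma != NULL) {
			*comma = '\0';
			s = comma + 1;
		} else {
			s = NULL;
		}

		if (*token == '\0')
			continue;
		if (strcmp(token, "nosocket") == 0)
			opts->nosocket = true;
		if (strcmp(token, "nokmem") == 0)
			opts->nokmem = true;
	}
}

/*
 * Models the reworked condition in memcg_propagate_kmem(): a child goes
 * kmem-online if its parent already is, or if the controller sits on the
 * default (cgroup2) hierarchy and "nokmem" was not given at boot.
 */
static bool should_online_kmem(bool parent_kmem_online, bool on_dfl,
			       const struct cgroup_memory_opts *opts)
{
	return parent_kmem_online || (on_dfl && !opts->nokmem);
}
```

With "nosocket,nokmem" parsed, should_online_kmem(false, true, &opts) is
false, so a cgroup2 child of a non-kmem parent stays unaccounted, while a
parent that is already kmem-online still propagates to its children
regardless of the boot option.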

