From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-wm0-f69.google.com (mail-wm0-f69.google.com [74.125.82.69]) by kanga.kvack.org (Postfix) with ESMTP id 0D31A6B0388 for ; Thu, 23 Feb 2017 10:31:10 -0500 (EST) Received: by mail-wm0-f69.google.com with SMTP id v77so1070274wmv.5 for ; Thu, 23 Feb 2017 07:31:10 -0800 (PST) Received: from mx2.suse.de (mx2.suse.de. [195.135.220.15]) by mx.google.com with ESMTPS id 17si7352303wmu.159.2017.02.23.07.31.08 for (version=TLS1 cipher=AES128-SHA bits=128/128); Thu, 23 Feb 2017 07:31:08 -0800 (PST) Date: Thu, 23 Feb 2017 16:31:07 +0100 From: Michal Hocko Subject: Re: [PATCH v2 2/2] mm/cgroup: delay soft limit data allocation Message-ID: <20170223153107.GD29056@dhcp22.suse.cz> References: <1487856999-16581-1-git-send-email-ldufour@linux.vnet.ibm.com> <1487856999-16581-3-git-send-email-ldufour@linux.vnet.ibm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1487856999-16581-3-git-send-email-ldufour@linux.vnet.ibm.com> Sender: owner-linux-mm@kvack.org List-ID: To: Laurent Dufour Cc: Johannes Weiner , Vladimir Davydov , Balbir Singh , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org On Thu 23-02-17 14:36:39, Laurent Dufour wrote: > Until a soft limit is set to a cgroup, the soft limit data are useless > so delay this allocation when a limit is set. Hmm, I am still undecided whether this is actually worth it. On one hand distribution kernels tend to have quite large NUMA_SHIFT (e.g. SLES has NUMA_SHIFT=10 and then we will save 8kB+12kB which is not hell of a lot but always good if we can save that, especially for a rarely used feature. The code grown on the other hand (it was in __init section previously) which is a minus, on the other hand. What do you think Johannes? This would be a useful info in the changelog, btw. > Suggested-by: Michal Hocko > Signed-off-by: Laurent Dufour The patch looks good to me so feel free to add Reviewed-by: Michal Hocko > --- > mm/memcontrol.c | 67 ++++++++++++++++++++++++++++++++++++++++++++------------- > 1 file changed, 52 insertions(+), 15 deletions(-) > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index a9f10fde44a6..c639c898809d 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -142,7 +142,7 @@ struct mem_cgroup_tree { > struct mem_cgroup_tree_per_node *rb_tree_per_node[MAX_NUMNODES]; > }; > > -static struct mem_cgroup_tree soft_limit_tree __read_mostly; > +static struct mem_cgroup_tree *soft_limit_tree __read_mostly; > > /* for OOM */ > struct mem_cgroup_eventfd_list { > @@ -381,10 +381,52 @@ mem_cgroup_page_nodeinfo(struct mem_cgroup *memcg, struct page *page) > return memcg->nodeinfo[nid]; > } > > +static bool soft_limit_initialize(void) > +{ > + static DEFINE_MUTEX(soft_limit_mutex); > + struct mem_cgroup_tree *tree; > + bool ret = true; > + int node; > + > + mutex_lock(&soft_limit_mutex); > + if (soft_limit_tree) > + goto bail; > + > + tree = kmalloc(sizeof(*soft_limit_tree), GFP_KERNEL); > + if (!tree) { > + ret = false; > + goto bail; > + } > + for_each_node(node) { > + struct mem_cgroup_tree_per_node *rtpn; > + > + rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL, > + node_online(node) ? node : NUMA_NO_NODE); > + if (!rtpn) > + goto cleanup; > + > + rtpn->rb_root = RB_ROOT; > + spin_lock_init(&rtpn->lock); > + tree->rb_tree_per_node[node] = rtpn; > + } > + WRITE_ONCE(soft_limit_tree, tree); > +bail: > + mutex_unlock(&soft_limit_mutex); > + return ret; > +cleanup: > + for_each_node(node) > + kfree(tree->rb_tree_per_node[node]); > + kfree(tree); > + ret = false; > + goto bail; > +} > + > static struct mem_cgroup_tree_per_node * > soft_limit_tree_node(int nid) > { > - return soft_limit_tree.rb_tree_per_node[nid]; > + if (!soft_limit_tree) > + return NULL; > + return soft_limit_tree->rb_tree_per_node[nid]; > } > > static struct mem_cgroup_tree_per_node * > @@ -392,7 +434,9 @@ soft_limit_tree_from_page(struct page *page) > { > int nid = page_to_nid(page); > > - return soft_limit_tree.rb_tree_per_node[nid]; > + if (!soft_limit_tree) > + return NULL; > + return soft_limit_tree->rb_tree_per_node[nid]; > } > > static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz, > @@ -3003,6 +3047,10 @@ static ssize_t mem_cgroup_write(struct kernfs_open_file *of, > } > break; > case RES_SOFT_LIMIT: > + if (!soft_limit_initialize()) { > + ret = -ENOMEM; > + break; > + } > memcg->soft_limit = nr_pages; > ret = 0; > break; > @@ -5777,7 +5825,7 @@ __setup("cgroup.memory=", cgroup_memory); > */ > static int __init mem_cgroup_init(void) > { > - int cpu, node; > + int cpu; > > #ifndef CONFIG_SLOB > /* > @@ -5797,17 +5845,6 @@ static int __init mem_cgroup_init(void) > INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work, > drain_local_stock); > > - for_each_node(node) { > - struct mem_cgroup_tree_per_node *rtpn; > - > - rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL, > - node_online(node) ? node : NUMA_NO_NODE); > - > - rtpn->rb_root = RB_ROOT; > - spin_lock_init(&rtpn->lock); > - soft_limit_tree.rb_tree_per_node[node] = rtpn; > - } > - > return 0; > } > subsys_initcall(mem_cgroup_init); > -- > 2.7.4 -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org