From: Balbir Singh <bsingharora@gmail.com>
To: Vladimir Davydov <vdavydov@tarantool.org>
Cc: mpe@ellerman.id.au, hannes@cmpxchg.org, mhocko@kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org,
Tejun Heo <tj@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RESEND] [PATCH v1 1/3] Add basic infrastructure for memcg hotplug support
Date: Thu, 17 Nov 2016 11:28:12 +1100
Message-ID: <3accc533-8dda-a69c-fabc-23eb388cf11b@gmail.com>
In-Reply-To: <20161116090129.GA18225@esperanza>
On 16/11/16 20:01, Vladimir Davydov wrote:
> Hello,
>
> On Wed, Nov 16, 2016 at 10:44:59AM +1100, Balbir Singh wrote:
>> The lack of hotplug support makes us allocate all memory
>> upfront for per-node data structures. With a large number
>> of cgroups this can be a significant overhead. PPC64 actually
>> limits n_possible nodes to n_online to avoid some of this overhead.
>>
>> This patch adds the basic notifiers to listen to hotplug
>> events and allocates and frees those structures per cgroup.
>> We walk every cgroup per event; it's a trade-off between
>> allocating upfront versus allocating on demand and freeing
>> on offline.
>>
>> Cc: Tejun Heo <tj@kernel.org>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
>>
>> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
>> ---
>> mm/memcontrol.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
>> 1 file changed, 60 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 91dfc7c..5585fce 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -63,6 +63,7 @@
>> #include <linux/lockdep.h>
>> #include <linux/file.h>
>> #include <linux/tracehook.h>
>> +#include <linux/memory.h>
>> #include "internal.h"
>> #include <net/sock.h>
>> #include <net/ip.h>
>> @@ -1342,6 +1343,10 @@ int mem_cgroup_select_victim_node(struct mem_cgroup *memcg)
>> {
>> return 0;
>> }
>> +
>> +static void mem_cgroup_may_update_nodemask(struct mem_cgroup *memcg)
>> +{
>> +}
>> #endif
>>
>> static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
>> @@ -4115,14 +4120,7 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
>> {
>> struct mem_cgroup_per_node *pn;
>> int tmp = node;
>> - /*
>> - * This routine is called against possible nodes.
>> - * But it's BUG to call kmalloc() against offline node.
>> - *
>> - * TODO: this routine can waste much memory for nodes which will
>> - * never be onlined. It's better to use memory hotplug callback
>> - * function.
>> - */
>> +
>> if (!node_state(node, N_NORMAL_MEMORY))
>> tmp = -1;
>> pn = kzalloc_node(sizeof(*pn), GFP_KERNEL, tmp);
>> @@ -5773,6 +5771,59 @@ static int __init cgroup_memory(char *s)
>> }
>> __setup("cgroup.memory=", cgroup_memory);
>>
>> +static void memcg_node_offline(int node)
>> +{
>> + struct mem_cgroup *memcg;
>> +
>> + if (node < 0)
>> + return;
>
> Is this possible?
Yes, please see node_states_check_changes_online/offline().
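For context, status_change_nid stays at -1 unless the operation actually
changes the node's memory state. Simplified (not the verbatim upstream
code), the online side does roughly:

	/*
	 * Simplified sketch of node_states_check_changes_online(), not the
	 * verbatim upstream code: status_change_nid starts out as -1 and is
	 * only set to a real node id when onlining this block gives the node
	 * memory for the first time, hence the node < 0 check in the callback.
	 */
	static void check_changes_online_sketch(struct zone *zone,
						struct memory_notify *arg)
	{
		int nid = zone_to_nid(zone);

		arg->status_change_nid = -1;
		if (!node_state(nid, N_MEMORY))
			arg->status_change_nid = nid;
	}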
>
>> +
>> + for_each_mem_cgroup(memcg) {
>> + free_mem_cgroup_per_node_info(memcg, node);
>> + mem_cgroup_may_update_nodemask(memcg);
>
> If memcg->numainfo_events is 0, mem_cgroup_may_update_nodemask() won't
> update memcg->scan_nodes. Is it OK?
>
>> + }
>
> What if a memory cgroup is created or destroyed while you're walking the
> tree? Should we probably use get_online_mems() in mem_cgroup_alloc() to
> avoid that?
>
The iterator internally takes rcu_read_lock(), so cgroups being added or
removed while we walk the tree should not cause any bad side-effects.
I suspect you are also suggesting using get_online_mems() around each
call to for_each_online_node. My understanding so far is:

1. invalidate_reclaim_iterators - should be safe (no bad side-effects)
2. mem_cgroup_free - should be safe as well
3. mem_cgroup_alloc - needs protection
4. mem_cgroup_init - needs protection
5. mem_cgroup_remove_from_trees - should be safe
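
For (3), roughly what I have in mind for mem_cgroup_alloc(): an untested
sketch with only the per-node loop shown; id allocation, per-cpu stats and
the rest of the initialisation are elided:

	static struct mem_cgroup *mem_cgroup_alloc(void)
	{
		struct mem_cgroup *memcg;
		int node;

		memcg = kzalloc(sizeof(*memcg), GFP_KERNEL);
		if (!memcg)
			return NULL;

		/*
		 * Pin the current set of online nodes so the memory hotplug
		 * notifier cannot add or remove per-node info underneath us.
		 */
		get_online_mems();
		for_each_online_node(node) {
			if (alloc_mem_cgroup_per_node_info(memcg, node))
				goto fail;
		}
		put_online_mems();

		return memcg;
	fail:
		put_online_mems();
		__mem_cgroup_free(memcg); /* frees any per-node info set up so far */
		return NULL;
	}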
>> +}
>> +
>> +static void memcg_node_online(int node)
>> +{
>> + struct mem_cgroup *memcg;
>> +
>> + if (node < 0)
>> + return;
>> +
>> + for_each_mem_cgroup(memcg) {
>> + alloc_mem_cgroup_per_node_info(memcg, node);
>> + mem_cgroup_may_update_nodemask(memcg);
>> + }
>> +}
>> +
>> +static int memcg_memory_hotplug_callback(struct notifier_block *self,
>> + unsigned long action, void *arg)
>> +{
>> + struct memory_notify *marg = arg;
>> + int node = marg->status_change_nid;
>> +
>> + switch (action) {
>> + case MEM_GOING_OFFLINE:
>> + case MEM_CANCEL_ONLINE:
>> + memcg_node_offline(node);
>
> Judging by __offline_pages(), the MEM_GOING_OFFLINE event is emitted
> before migrating pages off the node. So, I guess freeing per-node info
> here isn't quite correct, as pages still need it to be moved from the
> node's LRU lists. Better move it to MEM_OFFLINE?
>
Good point, will redo
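Something like this, I think (untested): keep the allocation at
MEM_GOING_ONLINE / MEM_CANCEL_OFFLINE, but defer the free until the node
is fully offline:

	switch (action) {
	case MEM_GOING_ONLINE:
	case MEM_CANCEL_OFFLINE:
		memcg_node_online(node);
		break;
	case MEM_OFFLINE:		/* moved here from MEM_GOING_OFFLINE */
	case MEM_CANCEL_ONLINE:
		memcg_node_offline(node);
		break;
	default:
		break;
	}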
>> + break;
>> + case MEM_GOING_ONLINE:
>> + case MEM_CANCEL_OFFLINE:
>> + memcg_node_online(node);
>> + break;
>> + case MEM_ONLINE:
>> + case MEM_OFFLINE:
>> + break;
>> + }
>> + return NOTIFY_OK;
>> +}
>> +
>> +static struct notifier_block memcg_memory_hotplug_nb __meminitdata = {
>> + .notifier_call = memcg_memory_hotplug_callback,
>> + .priority = IPC_CALLBACK_PRI,
>
> I wonder why you chose this priority?
>
I just chose the lowest priority.
>> +};
>> +
>> /*
>> * subsys_initcall() for memory controller.
>> *
>> @@ -5797,6 +5848,7 @@ static int __init mem_cgroup_init(void)
>> #endif
>>
>> hotcpu_notifier(memcg_cpu_hotplug_callback, 0);
>> + register_hotmemory_notifier(&memcg_memory_hotplug_nb);
>>
>> for_each_possible_cpu(cpu)
>> INIT_WORK(&per_cpu_ptr(&memcg_stock, cpu)->work,
>
> I guess, we should modify mem_cgroup_alloc/free() in the scope of this
> patch, otherwise it doesn't make much sense IMHO. May be, it's even
> worth merging patches 1 and 2 altogether.
>
Thanks for the review; I'll revisit the organization of the patches.
Balbir Singh
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>