From: Andrew Morton <akpm@linux-foundation.org>
To: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: vdavydov.dev@gmail.com, shakeelb@google.com,
viro@zeniv.linux.org.uk, hannes@cmpxchg.org, mhocko@kernel.org,
tglx@linutronix.de, pombredanne@nexb.com,
stummala@codeaurora.org, gregkh@linuxfoundation.org,
sfr@canb.auug.org.au, guro@fb.com, mka@chromium.org,
penguin-kernel@I-love.SAKURA.ne.jp, chris@chris-wilson.co.uk,
longman@redhat.com, minchan@kernel.org, ying.huang@intel.com,
mgorman@techsingularity.net, jbacik@fb.com, linux@roeck-us.net,
linux-kernel@vger.kernel.org, linux-mm@kvack.org,
willy@infradead.org, lirongqing@baidu.com,
aryabinin@virtuozzo.com
Subject: Re: [PATCH v8 05/17] mm: Assign memcg-aware shrinkers bitmap to memcg
Date: Tue, 3 Jul 2018 13:50:00 -0700
Message-ID: <20180703135000.b2322ae0e514f028e7941d3c@linux-foundation.org>
In-Reply-To: <153063056619.1818.12550500883688681076.stgit@localhost.localdomain>

On Tue, 03 Jul 2018 18:09:26 +0300 Kirill Tkhai <ktkhai@virtuozzo.com> wrote:
> Imagine a big node with many cpus, memory cgroups and containers.
> Let we have 200 containers, every container has 10 mounts,
> and 10 cgroups. All container tasks don't touch foreign
> containers mounts. If there is intensive pages write,
> and global reclaim happens, a writing task has to iterate
> over all memcgs to shrink slab, before it's able to go
> to shrink_page_list().
>
> Iteration over all the memcg slabs is very expensive:
> the task has to visit 200 * 10 = 2000 shrinkers
> for every memcg, and since there are 2000 memcgs,
> the total calls are 2000 * 2000 = 4000000.
>
> So, the shrinker makes 4 million do_shrink_slab() calls
> just to try to isolate SWAP_CLUSTER_MAX pages in one
> of the actively writing memcg via shrink_page_list().
> I've observed a node spending almost 100% in kernel,
> making useless iteration over already shrinked slab.
>
> This patch adds bitmap of memcg-aware shrinkers to memcg.
> The size of the bitmap depends on bitmap_nr_ids, and during
> memcg life it's maintained to be enough to fit bitmap_nr_ids
> shrinkers. Every bit in the map is related to corresponding
> shrinker id.
>
> Next patches will maintain set bit only for really charged
> memcg. This will allow shrink_slab() to increase its
> performance in significant way. See the last patch for
> the numbers.
>
> ...
>
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -182,6 +182,11 @@ static int prealloc_memcg_shrinker(struct shrinker *shrinker)
>  	if (id < 0)
>  		goto unlock;
>
> +	if (memcg_expand_shrinker_maps(id)) {
> +		idr_remove(&shrinker_idr, id);
> +		goto unlock;
> +	}
> +
>  	if (id >= shrinker_nr_max)
>  		shrinker_nr_max = id + 1;
>  	shrinker->id = id;

This function ends up being a rather sad little thing.

: static int prealloc_memcg_shrinker(struct shrinker *shrinker)
: {
: 	int id, ret = -ENOMEM;
:
: 	down_write(&shrinker_rwsem);
: 	id = idr_alloc(&shrinker_idr, shrinker, 0, 0, GFP_KERNEL);
: 	if (id < 0)
: 		goto unlock;
:
: 	if (memcg_expand_shrinker_maps(id)) {
: 		idr_remove(&shrinker_idr, id);
: 		goto unlock;
: 	}
:
: 	if (id >= shrinker_nr_max)
: 		shrinker_nr_max = id + 1;
: 	shrinker->id = id;
: 	ret = 0;
: unlock:
: 	up_write(&shrinker_rwsem);
: 	return ret;
: }

- there's no need to call memcg_expand_shrinker_maps() unless id >=
  shrinker_nr_max, so why not move the call under that test and avoid
  calling memcg_expand_shrinker_maps() in most cases?

- why aren't we decreasing shrinker_nr_max in
  unregister_memcg_shrinker()?  That's easy to do, avoids pointless
  work in shrink_slab_memcg() and avoids memory waste in future
  prealloc_memcg_shrinker() calls.

  It should be possible to find the highest ID in an IDR tree with a
  straightforward descent of the underlying radix tree, but I doubt
  that has been wired up.  Otherwise a simple loop in
  unregister_memcg_shrinker() would be needed.
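
To make the two suggestions above concrete, here is a rough userspace
model, not kernel code.  It assumes a fixed-size array in place of
shrinker_idr, and expand_maps(), prealloc_id(), unregister_id(),
id_in_use[] and expand_calls are all hypothetical names invented for
the illustration.  expand_maps() stands in for
memcg_expand_shrinker_maps() and only counts how often it gets called.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SHRINKERS 64

/* Toy stand-ins for shrinker_idr and shrinker_nr_max (assumptions). */
static bool id_in_use[MAX_SHRINKERS];
static int shrinker_nr_max;
static int expand_calls;	/* how many times the maps were "expanded" */

/* Hypothetical stand-in for memcg_expand_shrinker_maps(); never fails. */
static int expand_maps(int new_id)
{
	(void)new_id;
	expand_calls++;
	return 0;
}

/*
 * Hand out the lowest free id.  The maps are expanded only when the
 * new id does not already fit below shrinker_nr_max.
 */
static int prealloc_id(void)
{
	int id;

	for (id = 0; id < MAX_SHRINKERS; id++)
		if (!id_in_use[id])
			break;
	if (id == MAX_SHRINKERS)
		return -1;

	if (id >= shrinker_nr_max) {
		if (expand_maps(id))
			return -1;
		shrinker_nr_max = id + 1;
	}
	id_in_use[id] = true;
	return id;
}

/*
 * Release an id, then walk shrinker_nr_max down past any unused ids
 * at the top: the "simple loop" variant.
 */
static void unregister_id(int id)
{
	assert(id >= 0 && id < MAX_SHRINKERS);
	id_in_use[id] = false;
	while (shrinker_nr_max > 0 && !id_in_use[shrinker_nr_max - 1])
		shrinker_nr_max--;
}
```

In this model an id that fills a hole below shrinker_nr_max costs no
expand call at all.  One consequence of lowering shrinker_nr_max is
that a later allocation will ask for expansion again, so the expand
helper would need to return early when the maps are already large
enough.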