From: Dave Chinner <david@fromorbit.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yang Shi <shy828301@gmail.com>,
guro@fb.com, ktkhai@virtuozzo.com, shakeelb@google.com,
mhocko@suse.com, akpm@linux-foundation.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation
Date: Wed, 16 Dec 2020 08:59:38 +1100 [thread overview]
Message-ID: <20201215215938.GQ3913616@dread.disaster.area> (raw)
In-Reply-To: <20201215135348.GC379720@cmpxchg.org>
On Tue, Dec 15, 2020 at 02:53:48PM +0100, Johannes Weiner wrote:
> On Tue, Dec 15, 2020 at 01:09:57PM +1100, Dave Chinner wrote:
> > On Mon, Dec 14, 2020 at 02:37:15PM -0800, Yang Shi wrote:
> > > Since memcg_shrinker_map_size just can be changd under holding shrinker_rwsem
> > > exclusively, the read side can be protected by holding read lock, so it sounds
> > > superfluous to have a dedicated mutex.
> >
> > I'm not sure this is a good idea. This couples the shrinker
> > infrastructure to internal details of how cgroups are initialised
> > and managed. Sure, certain operations might be done in certain
> > shrinker lock contexts, but that doesn't mean we should share global
> > locks across otherwise independent subsystems....
>
> They're not independent subsystems. Most of the memory controller is
> an extension of core VM operations that is fairly difficult to
> understand outside the context of those operations. Then there are a
> limited number of entry points from the cgroup interface. We used to
> have our own locks for core VM structures (private page lock e.g.) to
> coordinate VM and cgroup, and that was mostly unintelligble.
Yes, but OTOH you can CONFIG_MEMCG=n and the shrinker infrastructure
and shrinkers all still functions correctly. Ergo, the shrinker
infrastructure is independent of memcgs. Yes, it may have functions
to iterate and manipulate memcgs, but it is not dependent on memcgs
existing for correct behaviour and functionality.
Yet.
> We have since established that those two components coordinate with
> native VM locking and lifetime management. If you need to lock the
> page, you lock the page - instead of having all VM paths that already
> hold the page lock acquire a nested lock to exclude one cgroup path.
>
> In this case, we have auxiliary shrinker data, subject to shrinker
> lifetime and exclusion rules. It's much easier to understand that
> cgroup creation needs a stable shrinker list (shrinker_rwsem) to
> manage this data, than having an aliased lock that is private to the
> memcg callbacks and obscures this real interdependency.
Ok, so the way to do this is to move all the stuff that needs to be
done under a "subsystem global" lock to the one file, not turn a
static lock into a globally visible lock and spray it around random
source files. There's already way too many static globals to manage
separate shrinker and memcg state..
I certainly agree that shrinkers and memcg need to be more closely
integrated. I've only been saying that for ... well, since memcgs
essentially duplicated the top level shrinker path so the shrinker
map could be introduced to avoid calling shrinkers that have no work
to do for memcgs. The shrinker map should be generic functionality
for all shrinker invocations because even a non-memcg machine can
have thousands of registered shrinkers that are mostly idle all the
time.
IOWs, I think the shrinker map management is not really memcg
specific - it's just allocation and assignment of a structure, and
the only memcg bit is the map is being stored in a memcg structure.
Therefore, if we are looking towards tighter integration then we
should acutally move the map management to the shrinker code, not
split the shrinker infrastructure management across different files.
There's already a heap of code in vmscan.c under #ifdef
CONFIG_MEMCG, like the prealloc_shrinker() code path:
prealloc_shrinker() vmscan.c
if (MEMCG_AWARE) vmscan.c
prealloc_memcg_shrinker vmscan.c
#ifdef CONFIG_MEMCG vmscan.c
down_write(shrinker_rwsem) vmscan.c
if (id > shrinker_id_max) vmscan.c
memcg_expand_shrinker_maps memcontrol.c
for_each_memcg memcontrol.c
reallocate shrinker map memcontrol.c
replace shrinker map memcontrol.c
shrinker_id_max = id vmscan.c
down_write(shrinker_rwsem) vmscan.c
#endif
And, really, there's very little code in memcg_expand_shrinker_maps()
here - the only memcg part is the memcg iteration loop, and we
already have them in vmscan.c (e.g. shrink_node_memcgs(),
age_active_anon(), drop_slab_node()) so there's precedence for
moving this memcg iteration for shrinker map management all into
vmscan.c.
Doing so would formalise the shrinker maps as first class shrinker
infrastructure rather than being tacked on to the side of the memcg
infrastructure. At this point it makes total sense to serialise map
manipulations under the shrinker_rwsem.
IOWs, I'm not disagreeing with the direction this patch takes us in,
I'm disagreeing with the implementation as published in the patch
because it doesn't move us closer to a clean, concise single
shrinker infrastructure implementation.
That is, for the medium term, I think we should be getting rid of
the "legacy" non-memcg shrinker path and everything runs under
memcgs. With this patchset moving all the deferred counts to be
memcg aware, the only reason for keeping the non-memcg path around
goes away. If sc->memcg is null, then after this patch set we can
simply use the root memcg and just use it's per-node accounting
rather than having a separate construct for non-memcg aware per-node
accounting.
Hence if SHRINKER_MEMCG_AWARE is set, it simply means we should run
the shrinker if sc->memcg is set. There is no difference in setup
of shrinkers, the duplicate non-memcg/memcg paths go away, and a
heap of code drops out of the shrinker infrastructure. It becomes
much simpler overall.
It also means we have a path for further integrating memcg aware
shrinkers into the shrinker infrastructure because we can always
rely on the shrinker infrastructure being memcg aware. And with that
in mind, I think we should probably also be moving the shrinker code
out of vmscan.c into it's own file as it's really completely
separate infrastructure from the vast majority of page reclaim
infrastructure in vmscan.c...
That's the view I'm looking at this patchset from. Not just as a
standalone bug fix, but also from the perspective of what the
architectural change implies and the directions for tighter
integration it opens up for us.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2020-12-15 21:59 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-14 22:37 [RFC v2 PATCH 0/9] Make shrinker's nr_deferred memcg aware Yang Shi
2020-12-14 22:37 ` [v2 PATCH 1/9] mm: vmscan: use nid from shrink_control for tracepoint Yang Shi
2020-12-14 22:37 ` [v2 PATCH 2/9] mm: memcontrol: use shrinker_rwsem to protect shrinker_maps allocation Yang Shi
2020-12-15 2:09 ` Dave Chinner
2020-12-15 13:53 ` Johannes Weiner
2020-12-15 21:59 ` Dave Chinner [this message]
2020-12-16 13:17 ` Kirill Tkhai
2020-12-16 19:12 ` Johannes Weiner
2020-12-16 21:56 ` Yang Shi
2020-12-16 19:39 ` Roman Gushchin
2020-12-15 14:07 ` Johannes Weiner
2020-12-15 20:32 ` Yang Shi
2020-12-14 22:37 ` [v2 PATCH 3/9] mm: vmscan: guarantee shrinker_slab_memcg() sees valid shrinker_maps for online memcg Yang Shi
2020-12-15 2:04 ` Dave Chinner
[not found] ` <20201215123802.GA379720@cmpxchg.org>
2020-12-15 12:58 ` Kirill Tkhai
2020-12-15 17:14 ` Johannes Weiner
[not found] ` <CAHbLzkrzv48S3ks-x8M=2sHxRS_+c-hLXdt4ScaWD6mC4ZFe8w@mail.gmail.com>
2020-12-28 20:03 ` Yang Shi
2020-12-14 22:37 ` [v2 PATCH 4/9] mm: vmscan: use a new flag to indicate shrinker is registered Yang Shi
2020-12-14 22:37 ` [v2 PATCH 5/9] mm: memcontrol: add per memcg shrinker nr_deferred Yang Shi
2020-12-15 2:22 ` Dave Chinner
2020-12-15 14:45 ` Johannes Weiner
2020-12-15 21:57 ` Yang Shi
2020-12-14 22:37 ` [v2 PATCH 6/9] mm: vmscan: use per memcg nr_deferred of shrinker Yang Shi
2020-12-15 2:46 ` Dave Chinner
2020-12-15 22:27 ` Yang Shi
2020-12-15 23:48 ` Dave Chinner
2020-12-14 22:37 ` [v2 PATCH 7/9] mm: vmscan: don't need allocate shrinker->nr_deferred for memcg aware shrinkers Yang Shi
2020-12-15 3:05 ` Dave Chinner
2020-12-15 23:07 ` Yang Shi
2020-12-18 0:56 ` Yang Shi
2020-12-18 1:09 ` Dave Chinner
2020-12-14 22:37 ` [v2 PATCH 8/9] mm: memcontrol: reparent nr_deferred when memcg offline Yang Shi
2020-12-15 3:07 ` Dave Chinner
2020-12-15 23:10 ` Yang Shi
2020-12-14 22:37 ` [v2 PATCH 9/9] mm: vmscan: shrink deferred objects proportional to priority Yang Shi
2020-12-15 3:23 ` Dave Chinner
2020-12-15 23:59 ` Yang Shi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201215215938.GQ3913616@dread.disaster.area \
--to=david@fromorbit.com \
--cc=akpm@linux-foundation.org \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=ktkhai@virtuozzo.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=shakeelb@google.com \
--cc=shy828301@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox