linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	Roman Gushchin <roman.gushchin@linux.dev>
Subject: Re: [PATCH 02/10] mm: shrinker: Add a .to_text() method for shrinkers
Date: Wed, 28 Aug 2024 13:32:21 +1000	[thread overview]
Message-ID: <Zs6aRZrjqPXQue6r@dread.disaster.area> (raw)
In-Reply-To: <20240824191020.3170516-3-kent.overstreet@linux.dev>

On Sat, Aug 24, 2024 at 03:10:09PM -0400, Kent Overstreet wrote:
> This adds a new callback method to shrinkers which they can use to
> describe anything relevant to memory reclaim about their internal state,
> for example object dirtyness.
....

> +	if (!mutex_trylock(&shrinker_mutex)) {
> +		seq_buf_puts(out, "(couldn't take shrinker lock)");
> +		return;
> +	}

Please don't use the shrinker_mutex like this. There can be tens of
thousands of entries in the shrinker list (because memcgs) and
holding the shrinker_mutex for long running traversals like this is
known to cause latency problems for memcg reaping. If we are at
ENOMEM, the last thing we want to be doing is preventing memcgs from
being reaped.

> +	list_for_each_entry(shrinker, &shrinker_list, list) {
> +		struct shrink_control sc = { .gfp_mask = GFP_KERNEL, };

This iteration and counting setup is neither node or memcg aware.
For node aware shrinkers, this will only count the items freeable
on node 0, and ignore all the other memory in the system. For memcg
systems, it will also only scan the root memcg and so miss counting
any memory in memcg owned caches.

IOWs, the shrinker iteration mechanism needs to iterate both by NUMA
node and by memcg. On large machines with multiple nodes and hosting
thousands of memcgs, a total shrinker state iteration is has to walk
a -lot- of structures.

And example of this is drop_slab() - called from
/proc/sys/vm/drop_caches(). It does this to iterate all the
shrinkers for all the nodes and memcgs in the system:

static unsigned long drop_slab_node(int nid)
{
        unsigned long freed = 0;
        struct mem_cgroup *memcg = NULL;

        memcg = mem_cgroup_iter(NULL, NULL, NULL);
        do {
                freed += shrink_slab(GFP_KERNEL, nid, memcg, 0);
        } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)) != NULL);

        return freed;
}

void drop_slab(void)
{
        int nid;
        int shift = 0;
        unsigned long freed;

        do {
                freed = 0;
                for_each_online_node(nid) {
                        if (fatal_signal_pending(current))
                                return;

                        freed += drop_slab_node(nid);
                }
        } while ((freed >> shift++) > 1);
}

Hence any iteration for finding the 10 largest shrinkable caches in
the system needs to do something similar. Only, it needs to iterate
memcgs first and then aggregate object counts across all nodes for
shrinkers that are NUMA aware.

Because it needs direct access to the shrinkers, it will need to use
the RCU lock + refcount method of traversal because that's the only
safe way to go from memcg to shrinker instance. IOWs, it
needs to mirror the code in shrink_slab/shrink_slab_memcg to obtain
a safe reference to the relevant shrinker so it can call
->count_objects() and store a refcounted pointer to the shrinker(s)
that will get printed out after the scan is done....

Once the shrinker iteration is sorted out, I'll look further at the
rest of the code in this patch...

-Dave.
-- 
Dave Chinner
david@fromorbit.com


  parent reply	other threads:[~2024-08-28  3:32 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-24 19:10 [PATCH 00/10] shrinker debugging, .to_text() report (resend) Kent Overstreet
2024-08-24 19:10 ` [PATCH 01/10] seq_buf: seq_buf_human_readable_u64() Kent Overstreet
2024-08-24 19:10 ` [PATCH 02/10] mm: shrinker: Add a .to_text() method for shrinkers Kent Overstreet
2024-08-26  3:01   ` Qi Zheng
2024-08-26  3:04     ` Kent Overstreet
2024-08-26  3:10       ` Qi Zheng
2024-08-26  3:32         ` Kent Overstreet
2024-08-28  3:32   ` Dave Chinner [this message]
2024-08-24 19:10 ` [PATCH 03/10] mm: shrinker: Add new stats for .to_text() Kent Overstreet
2024-08-28  2:36   ` Dave Chinner
2024-08-28  2:42     ` Kent Overstreet
2024-08-24 19:10 ` [PATCH 04/10] mm: Centralize & improve oom reporting in show_mem.c Kent Overstreet
2024-08-24 19:10 ` [PATCH 05/10] mm: shrinker: Add shrinker_to_text() to debugfs interface Kent Overstreet
2024-08-24 19:10 ` [PATCH 06/10] bcachefs: shrinker.to_text() methods Kent Overstreet
2024-08-24 19:10 ` [PATCH 07/10] percpu: per_cpu_sum() Kent Overstreet
2024-08-24 19:10 ` [PATCH 08/10] fs: Add super_block->s_inodes_nr Kent Overstreet
2024-08-24 19:10 ` [PATCH 09/10] fs/dcache: Add per-sb accounting for nr dentries Kent Overstreet
2024-08-24 19:10 ` [PATCH 10/10] fs: super_cache_to_text() Kent Overstreet
2024-08-28  3:43   ` Dave Chinner
2024-08-28  1:51 ` [PATCH 00/10] shrinker debugging, .to_text() report (resend) Dave Chinner
2024-08-28  2:23   ` Kent Overstreet
     [not found] <20240824180454.3160385-1-kent.overstreet@linux.dev>
2024-08-24 18:04 ` [PATCH 02/10] mm: shrinker: Add a .to_text() method for shrinkers Kent Overstreet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zs6aRZrjqPXQue6r@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=kent.overstreet@linux.dev \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=roman.gushchin@linux.dev \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox