From: Yang Shi <shy828301@gmail.com>
To: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Linux MM <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Dave Chinner <dchinner@redhat.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Shakeel Butt <shakeelb@google.com>
Subject: Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
Date: Wed, 20 Apr 2022 15:24:49 -0700
Message-ID: <CAHbLzkrOS12pi8WEXyUgYEQ4gy0S9iVrEeBp-2Ypyn=1bthZRA@mail.gmail.com>
In-Reply-To: <20220416002756.4087977-1-roman.gushchin@linux.dev>
On Fri, Apr 15, 2022 at 5:28 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> There are 50+ different shrinkers in the kernel, many with their own bells and
> whistles. Under memory pressure the kernel applies some pressure to each of
> them in the order in which they were created/registered in the system. Some
> of them contain only a few objects, some can be quite large. Some are
> effective at reclaiming memory, some are not.
>
> The only existing debugging mechanism is a couple of tracepoints in
> do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They don't
> cover everything, though: shrinkers which report 0 objects will never show up,
> and there is no support for memcg-aware shrinkers. Shrinkers are identified by
> their scan function, which is not always enough (e.g. it's hard to guess which
> superblock's shrinker it is when all you have is "super_cache_scan"). They are
> a passive mechanism: there is no way to call into the counting and scanning of
> an individual shrinker and profile it.
>
> To provide better visibility and debugging options for memory shrinkers,
> this patchset introduces a /sys/kernel/shrinker interface, to some extent
> similar to /sys/kernel/slab.
>
> For each shrinker registered in the system a directory is created. The
> directory contains "count" and "scan" files, which allow triggering the
> count_objects() and scan_objects() callbacks. For memcg-aware and numa-aware
> shrinkers, count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> and scan_memcg_node are additionally provided. They allow getting per-memcg
> and/or per-node object counts and shrinking only a specific memcg/node.
>
> To make debugging more pleasant, the patchset also names all shrinkers,
> so that sysfs entries can have more meaningful names.
>
> Usage examples:

Thanks, Roman. A follow-up question: why do we have to implement this
in the kernel if we just count the objects? It seems userspace tools could
achieve it too, for example drgn :-). Actually, a few months ago I wrote a
drgn script to debug a problem; it iterates a specific memcg's lru_list
and counts the objects by their state.

>
> 1) List registered shrinkers:
> $ cd /sys/kernel/shrinker/
> $ ls
> dqcache-16 sb-cgroup2-30 sb-hugetlbfs-33 sb-proc-41 sb-selinuxfs-22 sb-tmpfs-40 sb-zsmalloc-19
> kfree_rcu-0 sb-configfs-23 sb-iomem-12 sb-proc-44 sb-sockfs-8 sb-tmpfs-42 shadow-18
> sb-aio-20 sb-dax-11 sb-mqueue-21 sb-proc-45 sb-sysfs-26 sb-tmpfs-43 thp_deferred_split-10
> sb-anon_inodefs-15 sb-debugfs-7 sb-nsfs-4 sb-proc-47 sb-tmpfs-1 sb-tmpfs-46 thp_zero-9
> sb-bdev-3 sb-devpts-28 sb-pipefs-14 sb-pstore-31 sb-tmpfs-27 sb-tmpfs-49 xfs_buf-37
> sb-bpf-32 sb-devtmpfs-5 sb-proc-25 sb-rootfs-2 sb-tmpfs-29 sb-tracefs-13 xfs_inodegc-38
> sb-btrfs-24 sb-hugetlbfs-17 sb-proc-39 sb-securityfs-6 sb-tmpfs-35 sb-xfs-36 zspool-34
>
> 2) Get information about a specific shrinker:
> $ cd sb-btrfs-24/
> $ ls
> count count_memcg count_memcg_node count_node scan scan_memcg scan_memcg_node scan_node
>
> 3) Count objects on the system/root cgroup level
> $ cat count
> 212
>
> 4) Count objects on the system/root cgroup level per numa node (on a 2-node machine)
> $ cat count_node
> 209 3
>
> 5) Count objects for each memcg (output format: cgroup inode, count)
> $ cat count_memcg
> 1 212
> 20 96
> 53 817
> 2297 2
> 218 13
> 581 30
> 911 124
> <CUT>
>
> 6) Same but with a per-node output
> $ cat count_memcg_node
> 1 209 3
> 20 96 0
> 53 810 7
> 2297 2 0
> 218 13 0
> 581 30 0
> 911 124 0
> <CUT>
>
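The count_memcg and count_memcg_node outputs above print one row per cgroup: the cgroup inode number followed by either a single count or one count per NUMA node. Below is a minimal userspace sketch in C that parses one such row, assuming only that whitespace-separated format. parse_memcg_row() is a hypothetical helper and is not part of the patchset. Rows that do not start with a number, such as a "..." truncation marker, are rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse one row of count_memcg or count_memcg_node
 * output (e.g. "53 810 7") into the cgroup inode number and up to
 * max_counts counts. Returns the number of counts parsed, or -1 if the
 * row is malformed, has no counts, or has more than max_counts counts.
 */
static int parse_memcg_row(const char *line, unsigned long *ino,
			   unsigned long *counts, int max_counts)
{
	char *end;
	int n = 0;

	assert(max_counts > 0);

	/* First field: cgroup inode number. */
	errno = 0;
	*ino = strtoul(line, &end, 10);
	if (end == line || errno)
		return -1;

	/* Remaining fields: one total count, or one count per node. */
	for (;;) {
		const char *p = end;
		unsigned long v;

		while (*p == ' ' || *p == '\t')
			p++;
		if (*p == '\0' || *p == '\n')
			break;
		if (n == max_counts)
			return -1;
		errno = 0;
		v = strtoul(p, &end, 10);
		if (end == p || errno)
			return -1;
		counts[n++] = v;
	}
	return n > 0 ? n : -1;
}
```

For example, the row "53 810 7" parses to inode 53 with counts 810 and 7. Their sum, 817, matches the value that count_memcg reports for the same cgroup.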
> 7) Don't display cgroups with fewer than 500 attached objects
> $ echo 500 > count_memcg
> $ cat count_memcg
> 53 817
> 1868 886
> 2396 799
> 2462 861
>
> 8) Don't display cgroups with fewer than 500 attached objects (summed over all nodes)
> $ echo "500" > count_memcg_node
> $ cat count_memcg_node
> 53 810 7
> 1868 886 0
> 2396 799 0
> 2462 861 0
>
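Examples 7 and 8 suggest that the threshold applies to a cgroup's total object count, summed over all nodes in the per-node variant (810 + 7 = 817 for cgroup 53). Below is a small C sketch of that visibility rule. It assumes that cgroups whose total falls below the threshold are hidden, so rows with a total of at least the threshold are shown. memcg_row_visible() is hypothetical: it mirrors the behaviour described in the examples, not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch of the threshold rule in examples 7 and 8: a cgroup
 * row is shown only if its total object count (summed over all NUMA nodes
 * for count_memcg_node) is at least the threshold written to the file.
 */
static bool memcg_row_visible(const unsigned long *node_counts,
			      size_t nr_nodes, unsigned long threshold)
{
	unsigned long total = 0;
	size_t i;

	assert(node_counts != NULL || nr_nodes == 0);
	for (i = 0; i < nr_nodes; i++)
		total += node_counts[i];
	return total >= threshold;
}
```

For count_memcg, the same check applies with nr_nodes == 1.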
> 9) Scan system/root shrinker
> $ cat count
> 212
> $ echo 100 > scan
> $ cat scan
> 97
> $ cat count
> 115
>
> 10) Scan individual memcg
> $ echo "1868 500" > scan_memcg
> $ cat scan_memcg
> 193
>
> 11) Scan individual node
> $ echo "1 200" > scan_node
> $ cat scan_node
> 2
>
> 12) Scan individual memcg and node
> $ echo "1868 0 500" > scan_memcg_node
> $ cat scan_memcg_node
> 435
>
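Examples 10 to 12 show the input formats for the scan files. scan_memcg takes "<cgroup inode> <nr_to_scan>", scan_node takes "<node id> <nr_to_scan>", and scan_memcg_node takes "<cgroup inode> <node id> <nr_to_scan>". This field order is inferred from the examples only. Below is a C sketch that builds a scan_memcg_node request string. format_scan_memcg_node() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build a scan_memcg_node request of the form
 * "<cgroup inode> <node id> <nr_to_scan>\n", the field order used in
 * example 12 ("1868 0 500"). Returns snprintf()'s result, so a value
 * >= buflen means the request did not fit into buf.
 */
static int format_scan_memcg_node(char *buf, size_t buflen,
				  unsigned long ino, int nid,
				  unsigned long nr_to_scan)
{
	assert(nid >= 0);
	return snprintf(buf, buflen, "%lu %d %lu\n", ino, nid, nr_to_scan);
}
```

Per example 9, reading a scan file back after a write reports the result of that scan. In that example, a request for 100 objects returned 97, which matches the drop in count from 212 to 115.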
> If the output doesn't fit into a single page, "...\n" is printed at the end of
> the output.
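Because a truncated listing ends with "...\n", a consumer reading one of the count_* files can detect that it did not receive the full output. Below is a minimal C check, assuming the entire file content has already been read into a buffer. shrinker_output_truncated() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if buf (the full content read from a count_* file) ends
 * with the "...\n" marker that, per the cover letter, is printed when the
 * output does not fit into a single page.
 */
static bool shrinker_output_truncated(const char *buf, size_t len)
{
	static const char marker[] = "...\n";
	const size_t mlen = sizeof(marker) - 1;

	assert(buf != NULL || len == 0);
	return len >= mlen && memcmp(buf + len - mlen, marker, mlen) == 0;
}
```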
>
>
> Roman Gushchin (5):
> mm: introduce sysfs interface for debugging kernel shrinker
> mm: memcontrol: introduce mem_cgroup_ino() and
> mem_cgroup_get_from_ino()
> mm: introduce memcg interfaces for shrinker sysfs
> mm: introduce numa interfaces for shrinker sysfs
> mm: provide shrinkers with names
>
> arch/x86/kvm/mmu/mmu.c | 2 +-
> drivers/android/binder_alloc.c | 2 +-
> drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 3 +-
> drivers/gpu/drm/msm/msm_gem_shrinker.c | 2 +-
> .../gpu/drm/panfrost/panfrost_gem_shrinker.c | 2 +-
> drivers/gpu/drm/ttm/ttm_pool.c | 2 +-
> drivers/md/bcache/btree.c | 2 +-
> drivers/md/dm-bufio.c | 2 +-
> drivers/md/dm-zoned-metadata.c | 2 +-
> drivers/md/raid5.c | 2 +-
> drivers/misc/vmw_balloon.c | 2 +-
> drivers/virtio/virtio_balloon.c | 2 +-
> drivers/xen/xenbus/xenbus_probe_backend.c | 2 +-
> fs/erofs/utils.c | 2 +-
> fs/ext4/extents_status.c | 3 +-
> fs/f2fs/super.c | 2 +-
> fs/gfs2/glock.c | 2 +-
> fs/gfs2/main.c | 2 +-
> fs/jbd2/journal.c | 2 +-
> fs/mbcache.c | 2 +-
> fs/nfs/nfs42xattr.c | 7 +-
> fs/nfs/super.c | 2 +-
> fs/nfsd/filecache.c | 2 +-
> fs/nfsd/nfscache.c | 2 +-
> fs/quota/dquot.c | 2 +-
> fs/super.c | 2 +-
> fs/ubifs/super.c | 2 +-
> fs/xfs/xfs_buf.c | 2 +-
> fs/xfs/xfs_icache.c | 2 +-
> fs/xfs/xfs_qm.c | 2 +-
> include/linux/memcontrol.h | 9 +
> include/linux/shrinker.h | 25 +-
> kernel/rcu/tree.c | 2 +-
> lib/Kconfig.debug | 9 +
> mm/Makefile | 1 +
> mm/huge_memory.c | 4 +-
> mm/memcontrol.c | 23 +
> mm/shrinker_debug.c | 792 ++++++++++++++++++
> mm/vmscan.c | 66 +-
> mm/workingset.c | 2 +-
> mm/zsmalloc.c | 2 +-
> net/sunrpc/auth.c | 2 +-
> 42 files changed, 957 insertions(+), 47 deletions(-)
> create mode 100644 mm/shrinker_debug.c
>
> --
> 2.35.1
>