From: Yang Shi <shy828301@gmail.com>
To: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Linux MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	 Dave Chinner <dchinner@redhat.com>,
	 Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeelb@google.com>
Subject: Re: [PATCH rfc 0/5] mm: introduce shrinker sysfs interface
Date: Wed, 20 Apr 2022 15:24:49 -0700
Message-ID: <CAHbLzkrOS12pi8WEXyUgYEQ4gy0S9iVrEeBp-2Ypyn=1bthZRA@mail.gmail.com>
In-Reply-To: <20220416002756.4087977-1-roman.gushchin@linux.dev>

On Fri, Apr 15, 2022 at 5:28 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> There are 50+ different shrinkers in the kernel, many with their own bells and
> whistles. Under memory pressure the kernel applies some pressure to each of
> them in the order in which they were created/registered in the system. Some
> of them contain only a few objects, some can be quite large. Some are
> effective at reclaiming memory, some are not.
>
> The only existing debugging mechanism is a couple of tracepoints in
> do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They don't
> cover everything, though: shrinkers which report 0 objects never show up, and
> there is no support for memcg-aware shrinkers. Shrinkers are identified by their
> scan function, which is not always enough (e.g. it is hard to guess which super
> block's shrinker it is from "super_cache_scan" alone). They are also a passive
> mechanism: there is no way to call into counting and scanning of an individual
> shrinker and profile it.
>
> To provide better visibility and debugging options for memory shrinkers,
> this patchset introduces a /sys/kernel/shrinker interface, to some extent
> similar to /sys/kernel/slab.
>
> For each shrinker registered in the system a folder is created. The folder
> contains "count" and "scan" files, which allow triggering the count_objects()
> and scan_objects() callbacks. For memcg-aware and numa-aware shrinkers
> count_memcg, scan_memcg, count_node, scan_node, count_memcg_node
> and scan_memcg_node are additionally provided. They allow getting the per-memcg
> and/or per-node object count and shrinking only a specific memcg/node.
>
> To make debugging more pleasant, the patchset also names all shrinkers,
> so that sysfs entries can have more meaningful names.
>
> Usage examples:

Thanks, Roman. A follow-up question: why do we have to implement this
in the kernel if we just count the objects? It seems userspace tools,
for example drgn, could achieve that too :-). I actually wrote a drgn
script a few months ago to debug a problem; it iterates a specific
memcg's lru_list and counts the objects by their state.

>
> 1) List registered shrinkers:
>   $ cd /sys/kernel/shrinker/
>   $ ls
>     dqcache-16          sb-cgroup2-30    sb-hugetlbfs-33  sb-proc-41       sb-selinuxfs-22  sb-tmpfs-40    sb-zsmalloc-19
>     kfree_rcu-0         sb-configfs-23   sb-iomem-12      sb-proc-44       sb-sockfs-8      sb-tmpfs-42    shadow-18
>     sb-aio-20           sb-dax-11        sb-mqueue-21     sb-proc-45       sb-sysfs-26      sb-tmpfs-43    thp_deferred_split-10
>     sb-anon_inodefs-15  sb-debugfs-7     sb-nsfs-4        sb-proc-47       sb-tmpfs-1       sb-tmpfs-46    thp_zero-9
>     sb-bdev-3           sb-devpts-28     sb-pipefs-14     sb-pstore-31     sb-tmpfs-27      sb-tmpfs-49    xfs_buf-37
>     sb-bpf-32           sb-devtmpfs-5    sb-proc-25       sb-rootfs-2      sb-tmpfs-29      sb-tracefs-13  xfs_inodegc-38
>     sb-btrfs-24         sb-hugetlbfs-17  sb-proc-39       sb-securityfs-6  sb-tmpfs-35      sb-xfs-36      zspool-34
>
> 2) Get information about a specific shrinker:
>   $ cd sb-btrfs-24/
>   $ ls
>     count  count_memcg  count_memcg_node  count_node  scan  scan_memcg  scan_memcg_node  scan_node
>
> 3) Count objects on the system/root cgroup level
>   $ cat count
>     212
>
> 4) Count objects on the system/root cgroup level per numa node (on a 2-node machine)
>   $ cat count_node
>     209 3
>
> 5) Count objects for each memcg (output format: cgroup inode, count)
>   $ cat count_memcg
>     1 212
>     20 96
>     53 817
>     2297 2
>     218 13
>     581 30
>     911 124
>     <CUT>
>
> 6) Same but with a per-node output
>   $ cat count_memcg_node
>     1 209 3
>     20 96 0
>     53 810 7
>     2297 2 0
>     218 13 0
>     581 30 0
>     911 124 0
>     <CUT>
>
> 7) Don't display cgroups with less than 500 attached objects
>   $ echo 500 > count_memcg
>   $ cat count_memcg
>     53 817
>     1868 886
>     2396 799
>     2462 861
>
> 8) Don't display cgroups with less than 500 attached objects (sum over all nodes)
>   $ echo "500" > count_memcg_node
>   $ cat count_memcg_node
>     53 810 7
>     1868 886 0
>     2396 799 0
>     2462 861 0
>
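For anyone post-processing these files from userspace, the formats shown
in examples 5-8 could be parsed roughly as below. This is only a sketch:
parse_counts() and at_least() are hypothetical helper names, the sample
text is copied from the quoted output, and the threshold semantics (sum
over all nodes) is taken from the description of example 8.

```python
from typing import Dict, List


def parse_counts(text: str) -> Dict[int, List[int]]:
    """Map cgroup inode -> list of counts (one value, or one per NUMA node)."""
    counts: Dict[int, List[int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that does not start with a numeric inode.
        if not fields or not fields[0].isdigit():
            continue
        counts[int(fields[0])] = [int(f) for f in fields[1:]]
    return counts


def at_least(counts: Dict[int, List[int]], threshold: int) -> Dict[int, List[int]]:
    """Keep cgroups whose total count, summed over all nodes, is >= threshold."""
    return {ino: per_node for ino, per_node in counts.items()
            if sum(per_node) >= threshold}


# Sample copied from the count_memcg_node output quoted above.
SAMPLE = """\
1 209 3
20 96 0
53 810 7
2297 2 0
"""

if __name__ == "__main__":
    print(at_least(parse_counts(SAMPLE), 500))
```

The same parser handles the two-column count_memcg output, since a
single count is just a one-element list.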
> 9) Scan system/root shrinker
>   $ cat count
>     212
>   $ echo 100 > scan
>   $ cat scan
>     97
>   $ cat count
>     115
>
> 10) Scan individual memcg
>   $ echo "1868 500" > scan_memcg
>   $ cat scan_memcg
>     193
>
> 11) Scan individual node
>   $ echo "1 200" > scan_node
>   $ cat scan_node
>     2
>
> 12) Scan individual memcg and node
>   $ echo "1868 0 500" > scan_memcg_node
>   $ cat scan_memcg_node
>     435
>
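Reading examples 9-12 together, the string written to each scan file
appears to be "[cgroup inode] [node] <number of objects>", with the
optional fields present only for the matching file. The field order is
inferred from the examples, not from the patches themselves, and
scan_request() is a hypothetical helper, so treat this as a sketch:

```python
from typing import Optional, Tuple


def scan_request(nr_to_scan: int,
                 memcg_ino: Optional[int] = None,
                 node: Optional[int] = None) -> Tuple[str, str]:
    """Return (file name, string to write) for one scan request.

    Assumed field order, inferred from the quoted examples:
    cgroup inode (if any), then node (if any), then the object count.
    """
    name = "scan"
    fields = []
    if memcg_ino is not None:
        name += "_memcg"
        fields.append(memcg_ino)
    if node is not None:
        name += "_node"
        fields.append(node)
    fields.append(nr_to_scan)
    return name, " ".join(str(f) for f in fields)


if __name__ == "__main__":
    # Reproduces the strings used in examples 9-12.
    for args in [(100, None, None), (500, 1868, None),
                 (200, None, 1), (500, 1868, 0)]:
        print(scan_request(*args))
```

Under that reading, example 11 asks node 1 for 200 objects and gets 2
back, which fits count_node reporting only 3 objects on that node.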
> If the output doesn't fit into a single page, "...\n" is printed at the end of
> output.
>
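A userspace consumer would need to notice that trailing marker before
trusting the totals. A minimal sketch, assuming the marker is a final
line containing exactly "..." as described above (split_truncation() is
a hypothetical name):

```python
from typing import List, Tuple


def split_truncation(text: str) -> Tuple[List[str], bool]:
    """Return (data lines, truncated flag) for one file's contents."""
    lines = text.splitlines()
    # The output is considered truncated if its last line is the "..." marker.
    truncated = bool(lines) and lines[-1].strip() == "..."
    if truncated:
        lines = lines[:-1]
    return lines, truncated


if __name__ == "__main__":
    print(split_truncation("1 212\n20 96\n...\n"))
```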
>
> Roman Gushchin (5):
>   mm: introduce sysfs interface for debugging kernel shrinker
>   mm: memcontrol: introduce mem_cgroup_ino() and
>     mem_cgroup_get_from_ino()
>   mm: introduce memcg interfaces for shrinker sysfs
>   mm: introduce numa interfaces for shrinker sysfs
>   mm: provide shrinkers with names
>
>  arch/x86/kvm/mmu/mmu.c                        |   2 +-
>  drivers/android/binder_alloc.c                |   2 +-
>  drivers/gpu/drm/i915/gem/i915_gem_shrinker.c  |   3 +-
>  drivers/gpu/drm/msm/msm_gem_shrinker.c        |   2 +-
>  .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |   2 +-
>  drivers/gpu/drm/ttm/ttm_pool.c                |   2 +-
>  drivers/md/bcache/btree.c                     |   2 +-
>  drivers/md/dm-bufio.c                         |   2 +-
>  drivers/md/dm-zoned-metadata.c                |   2 +-
>  drivers/md/raid5.c                            |   2 +-
>  drivers/misc/vmw_balloon.c                    |   2 +-
>  drivers/virtio/virtio_balloon.c               |   2 +-
>  drivers/xen/xenbus/xenbus_probe_backend.c     |   2 +-
>  fs/erofs/utils.c                              |   2 +-
>  fs/ext4/extents_status.c                      |   3 +-
>  fs/f2fs/super.c                               |   2 +-
>  fs/gfs2/glock.c                               |   2 +-
>  fs/gfs2/main.c                                |   2 +-
>  fs/jbd2/journal.c                             |   2 +-
>  fs/mbcache.c                                  |   2 +-
>  fs/nfs/nfs42xattr.c                           |   7 +-
>  fs/nfs/super.c                                |   2 +-
>  fs/nfsd/filecache.c                           |   2 +-
>  fs/nfsd/nfscache.c                            |   2 +-
>  fs/quota/dquot.c                              |   2 +-
>  fs/super.c                                    |   2 +-
>  fs/ubifs/super.c                              |   2 +-
>  fs/xfs/xfs_buf.c                              |   2 +-
>  fs/xfs/xfs_icache.c                           |   2 +-
>  fs/xfs/xfs_qm.c                               |   2 +-
>  include/linux/memcontrol.h                    |   9 +
>  include/linux/shrinker.h                      |  25 +-
>  kernel/rcu/tree.c                             |   2 +-
>  lib/Kconfig.debug                             |   9 +
>  mm/Makefile                                   |   1 +
>  mm/huge_memory.c                              |   4 +-
>  mm/memcontrol.c                               |  23 +
>  mm/shrinker_debug.c                           | 792 ++++++++++++++++++
>  mm/vmscan.c                                   |  66 +-
>  mm/workingset.c                               |   2 +-
>  mm/zsmalloc.c                                 |   2 +-
>  net/sunrpc/auth.c                             |   2 +-
>  42 files changed, 957 insertions(+), 47 deletions(-)
>  create mode 100644 mm/shrinker_debug.c
>
> --
> 2.35.1
>

