linux-mm.kvack.org archive mirror
From: Johannes Weiner <hannes@cmpxchg.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: akpm@linux-foundation.org, kent.overstreet@linux.dev,
	mhocko@suse.com, vbabka@suse.cz, roman.gushchin@linux.dev,
	mgorman@suse.de, dave@stgolabs.net, willy@infradead.org,
	liam.howlett@oracle.com, corbet@lwn.net, void@manifault.com,
	peterz@infradead.org, juri.lelli@redhat.com,
	catalin.marinas@arm.com, will@kernel.org, arnd@arndb.de,
	tglx@linutronix.de, mingo@redhat.com,
	dave.hansen@linux.intel.com, x86@kernel.org, peterx@redhat.com,
	david@redhat.com, axboe@kernel.dk, mcgrof@kernel.org,
	masahiroy@kernel.org, nathan@kernel.org, dennis@kernel.org,
	tj@kernel.org, muchun.song@linux.dev, rppt@kernel.org,
	paulmck@kernel.org, pasha.tatashin@soleen.com,
	yosryahmed@google.com, yuzhao@google.com, dhowells@redhat.com,
	hughd@google.com, andreyknvl@gmail.com, keescook@chromium.org,
	ndesaulniers@google.com, vvvvvv@google.com,
	gregkh@linuxfoundation.org, ebiggers@google.com,
	ytcoode@gmail.com, vincent.guittot@linaro.org,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, bristot@redhat.com, vschneid@redhat.com,
	cl@linux.com, penberg@kernel.org, iamjoonsoo.kim@lge.com,
	42.hyeyoo@gmail.com, glider@google.com, elver@google.com,
	dvyukov@google.com, shakeelb@google.com,
	songmuchun@bytedance.com, jbaron@akamai.com, rientjes@google.com,
	minchan@google.com, kaleshsingh@google.com,
	kernel-team@android.com, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, iommu@lists.linux.dev,
	linux-arch@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, linux-modules@vger.kernel.org,
	kasan-dev@googlegroups.com, cgroups@vger.kernel.org
Subject: Re: [PATCH v3 00/35] Memory allocation profiling
Date: Wed, 14 Feb 2024 01:20:20 -0500
Message-ID: <20240214062020.GA989328@cmpxchg.org>
In-Reply-To: <20240212213922.783301-1-surenb@google.com>

I'll do a more thorough code review, but before the discussion gets
too sidetracked, I wanted to add my POV on the overall merit of the
direction that is being proposed here.

I have backported and used this code for debugging production issues
before. Logging into a random host with an unfamiliar workload and
being able to get a reliable, comprehensive list of kernel memory
consumers is one of the coolest things I have seen in a long
time. This is a huge improvement to sysadmin quality of life.

It's also a huge improvement for MM developers. We're the first points
of contact for memory regressions that can be caused by pretty much
any driver or subsystem in the kernel.

I encourage anybody who is undecided on whether this is worth doing to
build a kernel with these patches applied and run it on their own
machine. I think you'll be surprised what you'll find - and how myopic
and uninformative /proc/meminfo feels in comparison to this. Did you
know there is a lot more to modern filesystems than the VFS objects we
are currently tracking? :)

Then imagine what this looks like on a production host running a
complex mix of filesystems, enterprise networking, BPF programs, GPUs
and accelerators, etc.

Backporting the code to a slightly older production kernel wasn't too
difficult. The instrumentation layering is explicit, clean, and fairly
centralized, so resolving minor conflicts around the _noprof renames
and the wrappers was pretty straightforward.

When we talk about maintenance cost, a fair shake would be to weigh it
against the cost and reliability of our current method: evaluating
consumers in the kernel on a case-by-case basis and annotating the
alloc/free sites by hand; then quibbling with the MM community about
whether that consumer is indeed significant enough to warrant an entry
in /proc/meminfo, and what the catchiest name for the stat would be.

I think we can agree that this is vastly less scalable and more
burdensome than central annotations around a handful of mostly static
allocator entry points. Especially considering the rate of change in
the kernel as a whole, and that not everybody will think of the
comprehensive MM picture when writing a random driver. And I think
that's generous - we don't even have the network stack in meminfo.

So I think what we do now isn't working. In the Meta fleet, at any
given time the p50 for unaccounted kernel memory is several gigabytes
per host. The p99 is between 15% and 30% of total memory. That's a
looot of opaque resource usage we have to accept on faith.

For hunting down regressions, all it takes is one untracked consumer
in the kernel to really throw a wrench into things. It's difficult to
find in the noise with tracing, and if it's not growing after an
initial allocation spike, you're pretty much out of luck finding it at
all. Raise your hand if you've written a drgn script to walk pfns and
try to guess consumers from the state of struct page :)

I agree we should discuss how the annotations are implemented on a
technical basis, but my take is that we need something like this.

In a codebase of our size, I don't think the allocator should be
handing out memory without some basic implied tracking of where it's
going. It's a liability for production environments, and it can hide
bad memory management decisions in drivers and other subsystems for a
very long time.

