From: Suren Baghdasaryan <surenb@google.com>
To: David Wang <00107082@163.com>
Cc: kent.overstreet@linux.dev, Hao Ge <hao.ge@linux.dev>,
	akpm@linux-foundation.org,  linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Hao Ge <gehao@kylinos.cn>,
	 Alessio Balsini <balsini@google.com>,
	Pasha Tatashin <tatashin@google.com>,
	 Sourav Panda <souravpanda@google.com>
Subject: Re: [PATCH] tools/mm: Introduce a tool to handle entries in allocinfo
Date: Mon, 13 Jan 2025 13:47:50 -0800
Message-ID: <CAJuCfpGOLz-GodPq4xz+R7Gf7tDHPEHopRFFNgBRazoFkteKCA@mail.gmail.com>
In-Reply-To: <48f208b6.32ab.19455c70dbe.Coremail.00107082@163.com>

On Sat, Jan 11, 2025 at 6:32 AM David Wang <00107082@163.com> wrote:
>
> Hi,

Hi David,
Sorry for the delay. I'm not ignoring your input, I'm just a bit busy
and didn't have time to properly reply to your questions.

>
> I have been using this feature for a long while, and I believe the memory allocation profiling feature
> is quite powerful.
>
> But I have been wondering how to use this data, specifically:
> how can anomalies be detected, and what patterns should be defined as anomalies?
>
> So far, I have tools that collect this data (via Prometheus) and do basic analysis, i.e. top-k, group-by or rate.
> These analyses help me understand my system, but I cannot tell whether it is abnormal or not.
>
> And sometimes I just read through /proc/allocinfo, trying to pick up something.
> (Sometimes I get lucky; actually only once, when I found the underflow problem weeks ago.)
>
> A tool would be more helpful if it could identify anomalies, and we could add more patterns as we go along.

You are absolutely correct. Automatic detection of problematic
patterns would be the ultimate goal. We are analyzing the data we
collect and trying to come up with strategies for identifying such
patterns. A simple and obvious pattern for a leak would be constant
growth, but there might be others, like a sawtooth pattern or spikes,
which could point to opportunities to optimize usage by employing
object pools/caches. Categorizing allocations into hierarchical groups
and measuring per-group consumption is another potentially useful
technique we are considering. All this is at quite an early stage, so
ideas and suggestions from people using this API would be very valuable.
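
To make the patterns above concrete, here is a rough Python sketch, not
something we have implemented. It labels a series of per-call-site byte
counts as growth, sawtooth or spike, and rolls sizes up by source
directory as one possible form of hierarchical grouping. The function
names (classify, rollup) and all thresholds are assumptions chosen for
illustration; real data would need tolerance for noise in the growth
check.

```python
from collections import defaultdict
from statistics import median


def classify(samples, noise_ratio=0.25, drop_fraction=0.5, spike_factor=4.0):
    """Label byte counts sampled over time for one call site.

    All thresholds are illustrative assumptions, not tuned values.
    """
    if len(samples) < 3:
        return "unknown"
    span = max(samples) - min(samples)
    if span <= noise_ratio * max(samples):
        return "steady"      # variation is small relative to the peak
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    if all(d >= 0 for d in deltas):
        return "growth"      # never shrinks: candidate leak
    big_drops = sum(1 for d in deltas if -d >= drop_fraction * span)
    if big_drops >= 2:
        return "sawtooth"    # repeated build-up followed by a sharp release
    if max(samples) >= spike_factor * max(median(samples), 1):
        return "spike"       # peak far above the typical level
    return "other"


def rollup(bytes_by_file, depth=1):
    """Sum bytes per group, where a group is the first `depth` path components."""
    totals = defaultdict(int)
    for path, size in bytes_by_file.items():
        group = "/".join(path.split("/")[:depth])
        totals[group] += size
    return dict(totals)
```

For example, classify([100, 200, 300, 400, 500]) returns "growth", and
rollup() with depth=1 would merge every kernel/... file into a single
"kernel" group.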

>
> A pattern may be hard to define, especially when it involves context. For example,
> I happened to notice the following strange things recently:
>
>          896       14 kernel/sched/topology.c:2275 func:__sdt_alloc 1025
>          896       14 kernel/sched/topology.c:2266 func:__sdt_alloc 1025
>           96        6 kernel/sched/topology.c:2259 func:__sdt_alloc 1025
>        12288       24 kernel/sched/topology.c:2252 func:__sdt_alloc 1025    <----- B
>            0        0 kernel/sched/topology.c:2242 func:__sdt_alloc 210
>            0        0 kernel/sched/topology.c:2238 func:__sdt_alloc 210
>            0        0 kernel/sched/topology.c:2234 func:__sdt_alloc 210
>            0        0 kernel/sched/topology.c:2230 func:__sdt_alloc 210     <----- A
> Code A
> 2230                 sdd->sd = alloc_percpu(struct sched_domain *);
> 2231                 if (!sdd->sd)
> 2232                         return -ENOMEM;
> 2233
>
> Code B
> 2246                 for_each_cpu(j, cpu_map) {
>                              ...
>
> 2251
> 2252                         sd = kzalloc_node(sizeof(struct sched_domain) + cpumask_size(),
> 2253                                         GFP_KERNEL, cpu_to_node(j));
> 2254                         if (!sd)
> 2255                                 return -ENOMEM;
> 2256
> 2257                         *per_cpu_ptr(sdd->sd, j) = sd;
>
>
> The address of the memory allocated by 'Code B' is stored in the memory allocated by 'Code A', yet the allocation counter
> for 'Code A' is *0* while the one for 'Code B' is not. Something odd happens here: either it is expected and some ownership
> change happened somewhere, or it is a leak, or it is an accounting problem.
>
> If a tool could help identify this kind of pattern, that would be great!

Hmm. I don't see an easy way to identify such code dependencies from
allocinfo data alone. I think that would involve some sophisticated
code analysis tooling.
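
A much cruder check that needs only allocinfo is to flag functions in
which some call sites show zero live allocations while others do not.
The Python sketch below is a heuristic, not dependency analysis, and it
cannot tell expected behaviour from a leak or an accounting problem. It
assumes each line looks like "<bytes> <calls> <file>:<line> func:<name>",
optionally with a "[module]" field, and ignores any trailing columns;
parse_allocinfo and mixed_functions are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed layout: "<bytes> <calls> <file>:<line> [module] func:<name>".
# Anything after the function name is ignored; other lines are skipped.
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\S+):(\d+)\s+(?:\[\S+\]\s+)?func:(\S+)"
)


def parse_allocinfo(text):
    """Yield (bytes, calls, file, line, func) for each line that parses."""
    for raw in text.splitlines():
        m = LINE_RE.match(raw)
        if m:
            size, calls, path, lineno, func = m.groups()
            yield int(size), int(calls), path, int(lineno), func


def mixed_functions(text):
    """Map (file, func) to (zero_lines, live_lines) when both lists are non-empty."""
    sites = defaultdict(lambda: ([], []))
    for _size, calls, path, lineno, func in parse_allocinfo(text):
        zero, live = sites[(path, func)]
        (live if calls else zero).append(lineno)
    return {
        key: (sorted(zero), sorted(live))
        for key, (zero, live) in sites.items()
        if zero and live
    }


# The entries quoted in this thread, without the A/B annotations.
SAMPLE = """\
         896       14 kernel/sched/topology.c:2275 func:__sdt_alloc 1025
         896       14 kernel/sched/topology.c:2266 func:__sdt_alloc 1025
          96        6 kernel/sched/topology.c:2259 func:__sdt_alloc 1025
       12288       24 kernel/sched/topology.c:2252 func:__sdt_alloc 1025
           0        0 kernel/sched/topology.c:2242 func:__sdt_alloc 210
           0        0 kernel/sched/topology.c:2238 func:__sdt_alloc 210
           0        0 kernel/sched/topology.c:2234 func:__sdt_alloc 210
           0        0 kernel/sched/topology.c:2230 func:__sdt_alloc 210
"""

flagged = mixed_functions(SAMPLE)
```

On the sample, __sdt_alloc is flagged with lines 2230-2242 as the zero
sites and 2252-2275 as the live ones. Whether that is a real problem
still has to be decided by reading the code.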

>
>
> Any suggestions about how to proceed with the memory problem in kernel/sched/topology.c mentioned
> above? Or is it a problem at all?

From your follow-up email, it looks like you already found the answer :)
Thanks,
Suren.

>
>
> Thanks
> David
>
>
>
>
> At 2025-01-07 05:11:47, "Suren Baghdasaryan" <surenb@google.com> wrote:
> >On Mon, Jan 6, 2025 at 3:22 AM Hao Ge <hao.ge@linux.dev> wrote:
> >>
> >> From: Hao Ge <gehao@kylinos.cn>
> >>
> >> Some users always say that the information provided by /proc/allocinfo
> >> is too extensive or bulky.
> >>
> >
> >CC'ing Alessio along with Pasha and Sourav who were interested in such a tool.
> >
> >Hi Hao,
> >Thanks for the tool! Actually, Alessio just developed a tool called
> >alloctop [1] (similar to slabtop) which I think will do what you want
> >and more. It supports sorting, filtering, continuous update, etc. It's
> >written in Rust and we are planning to upstream it once we finish
> >testing and evaluating it on Android. Please take a look and see if it
> >fits your use case. Please also note that this tool was implemented
> >just last week, so it is hot off the press and might have some
> >early bugs.
> >Thanks,
> >Suren.
> >
> >[1] https://cs.android.com/android/platform/superproject/main/+/main:system/memory/libmeminfo/tools/alloctop/src/
> >
> >>
>



