From: "JP Kobryn (Meta)" <inwardvessel@gmail.com>
To: Vlastimil Babka <vbabka@suse.cz>, linux-mm@kvack.org
Cc: apopple@nvidia.com, akpm@linux-foundation.org,
axelrasmussen@google.com, byungchul@sk.com,
cgroups@vger.kernel.org, david@kernel.org, eperezma@redhat.com,
gourry@gourry.net, jasowang@redhat.com, hannes@cmpxchg.org,
joshua.hahnjy@gmail.com, Liam.Howlett@oracle.com,
linux-kernel@vger.kernel.org, lorenzo.stoakes@oracle.com,
matthew.brost@intel.com, mst@redhat.com, mhocko@suse.com,
rppt@kernel.org, muchun.song@linux.dev,
zhengqi.arch@bytedance.com, rakie.kim@sk.com,
roman.gushchin@linux.dev, shakeel.butt@linux.dev,
surenb@google.com, virtualization@lists.linux.dev,
weixugc@google.com, xuanzhuo@linux.alibaba.com,
ying.huang@linux.alibaba.com, yuanchu@google.com, ziy@nvidia.com,
kernel-team@meta.com
Subject: Re: [PATCH 1/2] mm/mempolicy: track page allocations per mempolicy
Date: Fri, 13 Feb 2026 11:56:15 -0800
Message-ID: <fd56ae2c-64ac-46bd-bcb2-503df995a6a1@gmail.com>
In-Reply-To: <d52066f1-c83e-4406-adca-5a403adb4f44@suse.cz>
On 2/13/26 12:54 AM, Vlastimil Babka wrote:
> On 2/12/26 22:25, JP Kobryn wrote:
>> On 2/12/26 7:24 AM, Vlastimil Babka wrote:
>>> On 2/12/26 05:51, JP Kobryn wrote:
>>>> It would be useful to see a breakdown of allocations to understand which
>>>> NUMA policies are driving them. For example, when investigating memory
>>>> pressure, having policy-specific counts could show that allocations were
>>>> bound to the affected node (via MPOL_BIND).
>>>>
>>>> Add per-policy page allocation counters as new node stat items. These
>>>> counters can provide correlation between a mempolicy and pressure on a
>>>> given node.
>>>>
>>>> Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
>>>> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
>>>
>>> Are the numa_{hit,miss,etc.} counters insufficient? Could they be extended
>>> in a way that would capture any missing important details? A counter per
>>> policy type seems exhaustive, but then on one hand it might not be important
>>> to distinguish between some of them, and on the other hand it doesn't track
>>> the nodemask anyway.
>>
>> The two patches of the series should complement each other. When
>> investigating memory pressure, we could identify the affected nodes
>> (patch 2). Then we can cross-reference the policy-specific stats to find
>> any correlation (this patch).
>>
>> I think extending numa_* counters would call for more permutations to
>> account for the numa stat per policy. I think distinguishing between
>> MPOL_DEFAULT and MPOL_BIND is meaningful, for example. Am I
>
> Are there other useful examples or would it be enough to add e.g. a
> numa_bind counter to the numa_hit/miss/etc?
Aside from bind, it's worth emphasizing that default-policy tracking
would show whether the local node is the source of pressure. In the
interleave case, we could see whether loads are actually being balanced
across nodes and, in the weighted case, whether they are being
distributed according to the configured weights.
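To make the shape concrete, the mechanism is roughly the following
(schematic only; the item names here are illustrative rather than the
literal ones in the patch, and pgalloc_stat_for() is a hypothetical
helper mapping a mempolicy mode to its stat item):

	/*
	 * Schematic sketch, not the literal patch: one node stat item
	 * per mempolicy mode (names illustrative).
	 */
	enum node_stat_item {
		/* ... existing items ... */
		PGALLOC_MPOL_DEFAULT,		/* no explicit policy */
		PGALLOC_MPOL_BIND,		/* MPOL_BIND */
		PGALLOC_MPOL_INTERLEAVE,	/* MPOL_INTERLEAVE */
		PGALLOC_MPOL_WEIGHTED_INTERLEAVE,
		PGALLOC_MPOL_PREFERRED,		/* MPOL_PREFERRED */
		NR_VM_NODE_STAT_ITEMS
	};

	/*
	 * On a successful policy-driven allocation, bump the counter on
	 * the node the pages actually landed on.
	 */
	static void count_mpol_alloc(struct page *page, struct mempolicy *pol,
				     unsigned int order)
	{
		mod_node_page_state(page_pgdat(page),
				    pgalloc_stat_for(pol->mode),
				    1 << order);
	}

Since the counter is bumped against the page's node, cross-referencing
it with the per-node reclaim stats from patch 2 is a direct lookup.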
As for extending the numa stats instead, I looked into this some more,
and I'm not sure they're a good fit. They record whether the allocator
succeeded at placement rather than which policy drove the allocation.
Thoughts?
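For context, the numa counters are bumped in zone_statistics() in
mm/page_alloc.c based purely on where the page landed relative to the
preferred zone, roughly (paraphrased):

	/* placement outcome only; the driving policy is not visible here */
	if (zone_to_nid(z) == zone_to_nid(preferred_zone))
		__count_numa_events(z, NUMA_HIT, nr_account);
	else {
		__count_numa_events(z, NUMA_MISS, nr_account);
		__count_numa_events(preferred_zone, NUMA_FOREIGN, nr_account);
	}

So a MPOL_BIND allocation that lands on its bound node and a default
local allocation both count as NUMA_HIT and are indistinguishable there.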
> What I'm trying to say is that the level of detail you are trying to add to
> the always-on counters seems more suitable for tracepoints. The counters
> should be limited to what's known to be useful and not "everything we are
> able to track and possibly could need one day".
In a triage scenario, having the stats already collected up to the time
of the reported issue is what matters. We use a tool called below[0],
which periodically samples the system and lets us view the historical
state leading up to an issue. If we only started attaching tracepoints
at the time of the incident, it would be too late.
The triage workflow would look like this:
1) Pressure/OOMs reported while system-wide memory is free.
2) Check per-node pgscan/pgsteal stats (provided by patch 2) to narrow
down node(s) under pressure.
3) Check per-policy allocation counters (this patch) on that node to
find what policy was driving it.
[0] https://github.com/facebookincubator/below
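Concretely, step 3 would just be a read of the per-node vmstat file,
e.g. (a self-contained sketch; the "pgalloc_mpol_" counter-name prefix
is hypothetical until the patch settles):

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	/*
	 * Print the per-policy allocation counters for one node by
	 * scanning its vmstat file. Counter names are hypothetical.
	 */
	int main(int argc, char **argv)
	{
		char path[64], line[256];
		int nid = argc > 1 ? atoi(argv[1]) : 0;
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/node/node%d/vmstat", nid);
		f = fopen(path, "r");
		if (!f) {
			perror(path);
			return 1;
		}
		while (fgets(line, sizeof(line), f))
			if (!strncmp(line, "pgalloc_mpol_", 13))
				fputs(line, stdout);
		fclose(f);
		return 0;
	}

A sampling tool like below would collect the same file on its normal
cadence, so the history is there before the incident is reported.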