From: Casey Chen
Date: Tue, 8 Jul 2025 15:53:58 -0700
Subject: Re: [PATCH] alloc_tag: add per-NUMA node stats
To: David Rientjes
Cc: Kent Overstreet, Andrew Morton, surenb@google.com, corbet@lwn.net,
    dennis@kernel.org, tj@kernel.org, cl@gentwo.org, Vlastimil Babka,
    mhocko@suse.com, jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com,
    roman.gushchin@linux.dev, harry.yoo@oracle.com, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    yzhong@purestorage.com, Sourav Panda
References: <20250610233053.973796-1-cachen@purestorage.com>
    <3c9b5773-83ed-4f13-11a8-fcc162c8c483@google.com>
In-Reply-To: <3c9b5773-83ed-4f13-11a8-fcc162c8c483@google.com>

On Tue, Jul 8, 2025 at 2:52 PM David Rientjes <rientjes@google.com> wrote:
> On Wed, 18 Jun 2025, Kent Overstreet wrote:
>
> > On Tue, Jun 10, 2025 at 05:30:53PM -0600, Casey Chen wrote:
> > > Add support for tracking per-NUMA node statistics in /proc/allocinfo.
> > > Previously, each alloc_tag had a single set of counters (bytes and
> > > calls), aggregated across all CPUs. With this change, each CPU can
> > > maintain separate counters for each NUMA node, allowing finer-grained
> > > memory allocation profiling.
> > >
> > > This feature is controlled by the new
> > > CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS option:
> > >
> > > * When enabled (=y), the output includes per-node statistics following
> > >   the total bytes/calls:
> > >
> > > <size> <calls> <tag info>
> > > ...
> > > 315456       9858     mm/dmapool.c:338 func:pool_alloc_page
> > >         nid0     94912        2966
> > >         nid1     220544       6892
> > > 7680         60       mm/dmapool.c:254 func:dma_pool_create
> > >         nid0     4224         33
> > >         nid1     3456         27
> >
> > I just received a report of memory reclaim issues where it seems DMA32
> > is stuffed full.
> >
> > So naturally, instrumenting to see what's consuming DMA32 is going to be
> > the first thing to do, which made me think of your patchset.
> >
> > I wonder if we should think about something a bit more general, so it's
> > easy to break out accounting different ways depending on what we want to
> > debug.
> >
>
> Right, per-node memory attribution, or per zone, is very useful.
>
> Casey, what's the latest status of your patch?  Using alloc_tag for
> attributing memory overheads has been exceedingly useful for Google Cloud
> and adding better insight it for per-node breakdown would be even better.
>
> Our use case is quite simple: we sell guest memory to the customer as
> persistent hugetlb and keep some memory on the host for ourselves (VMM,
> host userspace, host kernel).  We track every page of that overhead memory
> because memory pressure here can cause all sorts of issues like userspace
> unresponsiveness.  We also want to sell as much guest memory as possible
> to avoid stranding cpus.
>
> To do that, per-node breakdown of memory allocations would be a tremendous
> help.  We have memory that is asymmetric for NUMA, even for memory that
> has affinity to the NIC.  Being able to inspect the origins of memory for
> a specific NUMA node that is under memory pressure where other NUMA nodes
> are not under memory pressure would be excellent.
>
> Adding Sourav Panda as well as he may have additional thoughts on this.
>

Thanks for your interest. I have been busy with other team projects and
will get back to this soon. I will address the comments and send out a new
version for review.
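For anyone who wants to experiment with the per-node breakdown quoted above, here is a minimal sketch of a parser for it. It assumes the output keeps the shape shown in the patch description (a tag line followed by indented `nidN <bytes> <calls>` lines), which may well change in the next revision. The names `parse_allocinfo`, `bytes_on_node`, and `SAMPLE` are made up for illustration and are not part of the patch.

```python
import re

# Assumed line shapes, taken from the sample in the patch description:
#   "<bytes> <calls> <file>:<line> func:<name>"   one per allocation tag
#   "        nid<N> <bytes> <calls>"              one per NUMA node, after its tag
# The optional "[module]" field is an assumption about tags that live in modules.
TAG_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\S+:\d+)\s+(?:\[\S+\]\s+)?func:(\S+)\s*$")
NODE_RE = re.compile(r"^\s*nid(\d+)\s+(\d+)\s+(\d+)\s*$")


def parse_allocinfo(text):
    """Return a list of tag dicts, each carrying a {nid: (bytes, calls)} map."""
    tags = []
    for line in text.splitlines():
        m = TAG_RE.match(line)
        if m:
            tags.append({
                "bytes": int(m.group(1)),
                "calls": int(m.group(2)),
                "location": m.group(3),
                "func": m.group(4),
                "nodes": {},
            })
            continue
        m = NODE_RE.match(line)
        if m and tags:
            # A node line belongs to the most recent tag line.
            tags[-1]["nodes"][int(m.group(1))] = (int(m.group(2)), int(m.group(3)))
        # Anything else (headers, "...") is ignored.
    return tags


def bytes_on_node(tags, nid):
    """Total bytes attributed to one NUMA node across all tags."""
    return sum(t["nodes"].get(nid, (0, 0))[0] for t in tags)


# Sample copied from the patch description quoted above.
SAMPLE = """\
315456       9858     mm/dmapool.c:338 func:pool_alloc_page
        nid0     94912        2966
        nid1     220544       6892
7680         60       mm/dmapool.c:254 func:dma_pool_create
        nid0     4224         33
        nid1     3456         27
"""

if __name__ == "__main__":
    parsed = parse_allocinfo(SAMPLE)
    for nid in (0, 1):
        print(f"nid{nid}: {bytes_on_node(parsed, nid)} bytes")
```

Summing per node like this is roughly the "which tags are consuming a node under pressure" view discussed in the thread; sorting the tags by their bytes on a given node would be the natural next step.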