Date: Mon, 2 Jun 2025 19:35:20 -0400
From: Kent Overstreet
To: Steven Rostedt
Cc: Casey Chen, linux-mm@kvack.org, surenb@google.com, yzhong@purestorage.com, Peter Zijlstra, Ingo Molnar, Namhyung Kim, Masami Hiramatsu, Arnaldo Carvalho de Melo, Ian Rogers
Subject: Re: [PATCH 0/1] alloc_tag: add per-numa node stats
References: <20250530003944.2929392-1-cachen@purestorage.com> <5iiwnofmnx565g3xv3zdt35b7qkuwylzedkidnav72t24asswj@omjgyjnauulg> <20250602180826.3a0aafc0@gandalf.local.home>
In-Reply-To: <20250602180826.3a0aafc0@gandalf.local.home>

On Mon, Jun 02, 2025 at 06:08:26PM -0400, Steven Rostedt wrote:
> On Mon, 2 Jun 2025 17:52:49 -0400
> Kent Overstreet wrote:
>
> > +cc Steven, Peter, Ingo
> >
> > On Mon, Jun 02, 2025 at 01:48:43PM -0700, Casey Chen wrote:
> > > On Fri, May 30, 2025 at 5:05 PM Kent Overstreet
> > > wrote:
> > > >
> > > > On Fri, May 30, 2025 at 02:45:57PM -0700, Casey Chen wrote:
> > > > > On Thu, May 29, 2025 at 6:11 PM Kent Overstreet
> > > > > wrote:
> > > > > >
> > > > > > On Thu, May 29, 2025 at 06:39:43PM -0600, Casey Chen wrote:
> > > > > > > The patch is based on 4aab42ee1e4e ("mm/zblock: make active_list rcu_list")
> > > > > > > from branch mm-new of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > > > > > >
> > > > > > > The patch adds per-NUMA alloc_tag stats. Bytes/calls in total and per-NUMA
> > > > > > > nodes are displayed in a single row for each alloc_tag in /proc/allocinfo.
> > > > > > > Also percpu allocation is marked and its stats are stored on NUMA node 0.
> > > > > > > For example, the resulting file looks like below.
> > > > > > >
> > > > > > > percpu y total 8588 2147 numa0 8588 2147 numa1 0 0 kernel/irq/irqdesc.c:425 func:alloc_desc
> > > > > > > percpu n total 447232 1747 numa0 269568 1053 numa1 177664 694 lib/maple_tree.c:165 func:mt_alloc_bulk
> > > > > > > percpu n total 83200 325 numa0 30976 121 numa1 52224 204 lib/maple_tree.c:160 func:mt_alloc_one
> > > > > > > ...
> > > > > > > percpu n total 364800 5700 numa0 109440 1710 numa1 255360 3990 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1410 [mlx5_core] func:mlx5_alloc_cmd_msg
> > > > > > > percpu n total 1249280 39040 numa0 374784 11712 numa1 874496 27328 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1376 [mlx5_core] func:alloc_cmd_box
> > > > > >
> > > > > > Err, what is 'percpu y/n'?
> > > > > >
> > > > >
> > > > > Mark percpu allocation with 'percpu y/n' because for percpu allocation
> > > > > stats, 'bytes' is per-cpu, we have to multiply it by the number of
> > > > > CPUs to get the total bytes. Mark it so we know the exact amount of
> > > > > memory used. Any /proc/allocinfo parser can understand it and make
> > > > > correct calculations.
> > > >
> > > > Ok, just wanted to be sure it wasn't something else. Let's shorten that
> > > > though, a single character should suffice (we already have a header that
> > > > can explain what it is) - if you're growing the width we don't want to
> > > > overflow.
> > > >
> > >
> > > Does it have a header?
> > > >
> > > > >
> > > > > >
> > > > > > > To save memory, we dynamically allocate per-NUMA node stats counters once the
> > > > > > > system boots up and knows how many NUMA nodes are available. percpu allocators
> > > > > > > are used for memory allocation, hence the increase to PERCPU_DYNAMIC_RESERVE.
> > > > > > >
> > > > > > > For in-kernel alloc_tags, pcpu_alloc_noprof() is called so the memory for
> > > > > > > these counters is not accounted in profiling stats.
> > > > > > >
> > > > > > > For loadable modules, __alloc_percpu_gfp() is called and memory is accounted.
> > > > > >
> > > > > > Intriguing, but I'd make it a kconfig option, AFAIK this would mainly be
> > > > > > of interest to people looking at optimizing allocations to make sure
> > > > > > they're on the right numa node?
> > > > >
> > > > > Yes, to help us know if there is a NUMA imbalance issue and make some
> > > > > optimizations. I can make it a kconfig. Does anybody else have any
> > > > > opinion about this feature? Thanks!
> > > >
> > > > I would like to see some other opinions from potential users, have you
> > > > been circulating it?
> > >
> > > We have been using it internally for a while. I don't know who the
> > > potential users are and how to reach them so I am sharing it here to
> > > collect opinions from others.
> >
> > I'd ask the tracing and profiling people for their thoughts, and anyone
> > working on tooling that might consume this.
> >
> > I'm wondering if there might be some way of feeding more info into perf,
> > since profiling cache misses is a big thing that it does.
> >
> > It might be a long shot, since we're just accounting usage, or it might
> > spark some useful ideas.
> >
> > Can you share a bit about how you're using this internally?
>
> I'm guessing this is to show where in the kernel functions are using memory?

Exactly.

Now that we've got a mapping from an address to the source location that
owns it, I'm wondering if there's anything else we can do with it.

> I added to the Cc people who tend to use perf for analysis rather than just
> having those that maintain the kernel side of perf.

Perfect, thanks.
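
For anyone writing tooling against the format quoted above, here is a rough sketch of a parser for one row of the proposed output. It assumes the layout shown in the sample rows ("percpu y/n", "total <bytes> <calls>", then one "numaN <bytes> <calls>" group per node, then the source location), which could still change, for instance if the percpu marker is shortened to a single character. The names TagStats, parse_line, effective_bytes and nr_cpus are made up for illustration and are not part of the patch.

```python
import re
from typing import Dict, NamedTuple, Optional, Tuple


class TagStats(NamedTuple):
    """One parsed row of the proposed per-NUMA /proc/allocinfo output."""
    percpu: bool
    total_bytes: int
    total_calls: int
    per_node: Dict[int, Tuple[int, int]]  # node id -> (bytes, calls)
    location: str                         # e.g. "file.c:123 func:name"


# Assumed row layout, taken from the sample rows in this thread.
LINE_RE = re.compile(
    r"^percpu\s+(?P<percpu>[yn])\s+total\s+(?P<bytes>\d+)\s+(?P<calls>\d+)\s+(?P<rest>.*)$"
)
NODE_RE = re.compile(r"numa(\d+)\s+(\d+)\s+(\d+)\s+")


def parse_line(line: str) -> Optional[TagStats]:
    """Parse one row; return None if it does not match the assumed layout."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rest = m.group("rest")
    per_node: Dict[int, Tuple[int, int]] = {}
    pos = 0
    # Consume consecutive "numaN <bytes> <calls>" groups from the front.
    while True:
        n = NODE_RE.match(rest, pos)
        if n is None:
            break
        per_node[int(n.group(1))] = (int(n.group(2)), int(n.group(3)))
        pos = n.end()
    return TagStats(
        percpu=(m.group("percpu") == "y"),
        total_bytes=int(m.group("bytes")),
        total_calls=int(m.group("calls")),
        per_node=per_node,
        location=rest[pos:],
    )


def effective_bytes(stats: TagStats, nr_cpus: int) -> int:
    """For percpu rows 'bytes' is per-CPU, so scale by the CPU count."""
    return stats.total_bytes * nr_cpus if stats.percpu else stats.total_bytes
```

A consumer would read /proc/allocinfo line by line, skip rows where parse_line() returns None (headers or an older format), and pass the CPU count it cares about (for example os.cpu_count()) to effective_bytes(). Whether to scale by possible or online CPUs is not settled by anything in this thread.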