Date: Mon, 2 Jun 2025 17:52:49 -0400
From: Kent Overstreet
To: Casey Chen
Cc: linux-mm@kvack.org, surenb@google.com, yzhong@purestorage.com,
    Steven Rostedt, Peter Zijlstra, Ingo Molnar
Subject: Re: [PATCH 0/1] alloc_tag: add per-numa node stats
References: <20250530003944.2929392-1-cachen@purestorage.com>
    <5iiwnofmnx565g3xv3zdt35b7qkuwylzedkidnav72t24asswj@omjgyjnauulg>

+cc Steven, Peter, Ingo

On Mon, Jun 02, 2025 at 01:48:43PM -0700, Casey Chen wrote:
> On Fri, May 30, 2025 at 5:05 PM Kent Overstreet wrote:
> >
> > On Fri, May 30, 2025 at 02:45:57PM -0700, Casey Chen wrote:
> > > On Thu, May 29, 2025 at 6:11 PM Kent Overstreet wrote:
> > > >
> > > > On Thu, May 29, 2025 at 06:39:43PM -0600, Casey Chen wrote:
> > > > > The patch is based on 4aab42ee1e4e ("mm/zblock: make active_list rcu_list")
> > > > > from branch mm-new of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > > > >
> > > > > The patch adds per-NUMA alloc_tag stats. Bytes/calls in total and per-NUMA
> > > > > nodes are displayed in a single row for each alloc_tag in /proc/allocinfo.
> > > > > Also percpu allocation is marked and its stats are stored on NUMA node 0.
> > > > > For example, the resulting file looks like below.
> > > > >
> > > > > percpu y total 8588 2147 numa0 8588 2147 numa1 0 0 kernel/irq/irqdesc.c:425 func:alloc_desc
> > > > > percpu n total 447232 1747 numa0 269568 1053 numa1 177664 694 lib/maple_tree.c:165 func:mt_alloc_bulk
> > > > > percpu n total 83200 325 numa0 30976 121 numa1 52224 204 lib/maple_tree.c:160 func:mt_alloc_one
> > > > > ...
> > > > > percpu n total 364800 5700 numa0 109440 1710 numa1 255360 3990 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1410 [mlx5_core] func:mlx5_alloc_cmd_msg
> > > > > percpu n total 1249280 39040 numa0 374784 11712 numa1 874496 27328 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1376 [mlx5_core] func:alloc_cmd_box
> > > >
> > > > Err, what is 'percpu y/n'?
> > > >
> > >
> > > Mark percpu allocation with 'percpu y/n' because for percpu allocation
> > > stats, 'bytes' is per-cpu, we have to multiply it by the number of
> > > CPUs to get the total bytes. Mark it so we know the exact amount of
> > > memory used. Any /proc/allocinfo parser can understand it and make
> > > correct calculations.
> >
> > Ok, just wanted to be sure it wasn't something else. Let's shorten that
> > though, a single character should suffice (we already have a header that
> > can explain what it is) - if you're growing the width we don't want to
> > overflow.
> >
>
> Does it have a header?
>
> > >
> > > > >
> > > > > To save memory, we dynamically allocate per-NUMA node stats counters once the
> > > > > system boots up and knows how many NUMA nodes are available. percpu allocators
> > > > > are used for memory allocation, hence the increase to PERCPU_DYNAMIC_RESERVE.
> > > > >
> > > > > For in-kernel alloc_tags, pcpu_alloc_noprof() is called so the memory for
> > > > > these counters is not accounted in profiling stats.
> > > > >
> > > > > For loadable modules, __alloc_percpu_gfp() is called and memory is accounted.
> > > >
> > > > Intriguing, but I'd make it a kconfig option, AFAIK this would mainly be
> > > > of interest to people looking at optimizing allocations to make sure
> > > > they're on the right numa node?
> > >
> > > Yes, to help us know if there is a NUMA imbalance issue and make some
> > > optimizations. I can make it a kconfig. Does anybody else have any
> > > opinion about this feature? Thanks!
> >
> > I would like to see some other opinions from potential users, have you
> > been circulating it?
>
> We have been using it internally for a while. I don't know who the
> potential users are and how to reach them so I am sharing it here to
> collect opinions from others.

I'd ask the tracing and profiling people for their thoughts, and anyone
working on tooling that might consume this.

I'm wondering if there might be some way of feeding more info into perf,
since profiling cache misses is a big thing that it does.

It might be a long shot, since we're just accounting usage, or it might
spark some useful ideas.

Can you share a bit about how you're using this internally?
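
For anyone working on tooling that might consume this, here is a rough
Python sketch of parsing the row format quoted above and applying the
"multiply percpu bytes by the number of CPUs" rule from the thread. It
assumes the layout exactly as shown in the example rows, which is only a
proposal and may change (the 'percpu y/n' marker may be shortened). The
names AllocRow, parse_row and effective_bytes are made up for
illustration, and which CPU count to use (possible vs. online) is not
stated in the thread, so the caller passes it in.

```python
import re
from typing import Dict, NamedTuple, Tuple


class AllocRow(NamedTuple):
    """One row of the proposed per-NUMA /proc/allocinfo layout (hypothetical type)."""
    percpu: bool
    total: Tuple[int, int]                # (bytes, calls)
    per_node: Dict[int, Tuple[int, int]]  # node id -> (bytes, calls)
    location: str                         # file:line, optional [module], func:name


_NODE_RE = re.compile(r"numa(\d+)$")


def parse_row(line: str) -> AllocRow:
    """Parse a row shaped like the examples quoted in this thread."""
    tokens = line.split()
    if len(tokens) < 5 or tokens[0] != "percpu" or tokens[2] != "total":
        raise ValueError(f"unexpected row: {line!r}")
    percpu = tokens[1] == "y"
    total = (int(tokens[3]), int(tokens[4]))
    per_node: Dict[int, Tuple[int, int]] = {}
    i = 5
    # Consume "numaN <bytes> <calls>" triples until the location starts.
    while i + 2 < len(tokens):
        match = _NODE_RE.match(tokens[i])
        if match is None:
            break
        per_node[int(match.group(1))] = (int(tokens[i + 1]), int(tokens[i + 2]))
        i += 3
    return AllocRow(percpu, total, per_node, " ".join(tokens[i:]))


def effective_bytes(row: AllocRow, nr_cpus: int) -> int:
    """Total bytes for a row; percpu rows are scaled by nr_cpus (an assumption)."""
    return row.total[0] * nr_cpus if row.percpu else row.total[0]
```

With the first example row, parse_row() reports a percpu allocation whose
bytes all sit on node 0, and effective_bytes(row, nr_cpus) scales the
8588 bytes by whatever CPU count the caller supplies.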