Date: Mon, 2 Jun 2025 18:08:26 -0400
From: Steven Rostedt
To: Kent Overstreet
Cc: Casey Chen, linux-mm@kvack.org, surenb@google.com, yzhong@purestorage.com, Peter Zijlstra, Ingo Molnar, Namhyung Kim, Masami Hiramatsu, Arnaldo Carvalho de Melo, Ian Rogers
Subject: Re: [PATCH 0/1] alloc_tag: add per-numa node stats
Message-ID: <20250602180826.3a0aafc0@gandalf.local.home>
References: <20250530003944.2929392-1-cachen@purestorage.com> <5iiwnofmnx565g3xv3zdt35b7qkuwylzedkidnav72t24asswj@omjgyjnauulg>

On Mon, 2 Jun 2025 17:52:49 -0400
Kent Overstreet wrote:

> +cc Steven, Peter, Ingo
>
> On Mon, Jun 02, 2025 at 01:48:43PM -0700, Casey Chen wrote:
> > On Fri, May 30, 2025 at 5:05 PM Kent Overstreet wrote:
> > >
> > > On Fri, May 30, 2025 at 02:45:57PM -0700, Casey Chen wrote:
> > > > On Thu, May 29, 2025 at 6:11 PM Kent Overstreet wrote:
> > > > >
> > > > > On Thu, May 29, 2025 at 06:39:43PM -0600, Casey Chen wrote:
> > > > > > The patch is based on 4aab42ee1e4e ("mm/zblock: make active_list rcu_list")
> > > > > > from branch mm-new of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > > > > >
> > > > > > The patch adds per-NUMA alloc_tag stats. Bytes/calls in total and per-NUMA
> > > > > > nodes are displayed in a single row for each alloc_tag in /proc/allocinfo.
> > > > > > Also percpu allocation is marked and its stats are stored on NUMA node 0.
> > > > > > For example, the resulting file looks like below.
> > > > > >
> > > > > > percpu y total    8588  2147 numa0    8588  2147 numa1       0     0 kernel/irq/irqdesc.c:425 func:alloc_desc
> > > > > > percpu n total  447232  1747 numa0  269568  1053 numa1  177664   694 lib/maple_tree.c:165 func:mt_alloc_bulk
> > > > > > percpu n total   83200   325 numa0   30976   121 numa1   52224   204 lib/maple_tree.c:160 func:mt_alloc_one
> > > > > > ...
> > > > > > percpu n total  364800  5700 numa0  109440  1710 numa1  255360  3990 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1410 [mlx5_core] func:mlx5_alloc_cmd_msg
> > > > > > percpu n total 1249280 39040 numa0  374784 11712 numa1  874496 27328 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1376 [mlx5_core] func:alloc_cmd_box
> > > > >
> > > > > Err, what is 'percpu y/n'?
> > > > >
> > > >
> > > > Mark percpu allocation with 'percpu y/n' because for percpu allocation
> > > > stats, 'bytes' is per-cpu, we have to multiply it by the number of
> > > > CPUs to get the total bytes. Mark it so we know the exact amount of
> > > > memory used. Any /proc/allocinfo parser can understand it and make
> > > > correct calculations.
> > >
> > > Ok, just wanted to be sure it wasn't something else. Let's shorten that
> > > though, a single character should suffice (we already have a header that
> > > can explain what it is) - if you're growing the width we don't want to
> > > overflow.
> > >
> >
> > Does it have a header?
> >
> > > >
> > > > > >
> > > > > > To save memory, we dynamically allocate per-NUMA node stats counter once the
> > > > > > system boots up and knows how many NUMA nodes are available. percpu allocators
> > > > > > are used for memory allocation hence increase PERCPU_DYNAMIC_RESERVE.
> > > > > >
> > > > > > For in-kernel alloc_tags, pcpu_alloc_noprof() is called so the memory for
> > > > > > these counters are not accounted in profiling stats.
> > > > > >
> > > > > > For loadable modules, __alloc_percpu_gfp() is called and memory is accounted.
> > > > >
> > > > > Intriguing, but I'd make it a kconfig option, AFAIK this would mainly be
> > > > > of interest to people looking at optimizing allocations to make sure
> > > > > they're on the right numa node?
> > > >
> > > > Yes, to help us know if there is a NUMA imbalance issue and make some
> > > > optimizations. I can make it a kconfig. Does anybody else have any
> > > > opinion about this feature? Thanks!
> > >
> > > I would like to see some other opinions from potential users, have you
> > > been circulating it?
> >
> > We have been using it internally for a while. I don't know who the
> > potential users are and how to reach them so I am sharing it here to
> > collect opinions from others.
>
> I'd ask the tracing and profiling people for their thoughts, and anyone
> working on tooling that might consume this.
>
> I'm wondering if there might be some way of feeding more info into perf,
> since profiling cache misses is a big thing that it does.
>
> It might be a long shot, since we're just accounting usage, or it might
> spark some useful ideas.
>
> Can you share a bit about how you're using this internally?

I'm guessing this is to show where in the kernel functions are using
memory?

I added to the Cc people who tend to use perf for analysis, rather than
just those who maintain the kernel side of perf.

-- Steve
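
For illustration only, here is a minimal sketch of how a userspace parser
might consume the per-NUMA row layout quoted above and apply the "multiply
by the number of CPUs" rule for percpu tags. The function name
parse_allocinfo_row, the returned dictionary keys, and the nr_cpus parameter
are hypothetical. The field layout is inferred from the example rows in the
thread, is only a proposal, and may change (for instance if the percpu
marker is shortened to a single character).

```python
def parse_allocinfo_row(line, nr_cpus):
    """Parse one row of the proposed per-NUMA /proc/allocinfo layout.

    Assumed layout (inferred from the example rows, not a stable ABI):
    percpu <y|n> total <bytes> <calls> numa0 <bytes> <calls> ... <location>
    """
    fields = line.split()
    if len(fields) < 5 or fields[0] != "percpu" or fields[2] != "total":
        raise ValueError(f"unexpected row layout: {line!r}")

    is_percpu = fields[1] == "y"
    total_bytes, total_calls = int(fields[3]), int(fields[4])

    # Consume "numaN <bytes> <calls>" triples until the source location starts.
    nodes = {}
    i = 5
    while (i + 2 < len(fields)
           and fields[i].startswith("numa")
           and fields[i][4:].isdigit()):
        nodes[int(fields[i][4:])] = (int(fields[i + 1]), int(fields[i + 2]))
        i += 3

    # Per the thread, 'bytes' for a percpu tag is the per-CPU size, so the
    # real footprint is bytes * number of CPUs. Which CPU count applies
    # (possible vs. online) is not stated there, so the caller supplies it.
    footprint = total_bytes * nr_cpus if is_percpu else total_bytes

    return {
        "percpu": is_percpu,
        "bytes": total_bytes,
        "calls": total_calls,
        "footprint_bytes": footprint,
        "nodes": nodes,  # node id -> (bytes, calls), as reported
        "location": " ".join(fields[i:]),
    }


row = ("percpu y total 8588 2147 numa0 8588 2147 numa1 0 0 "
       "kernel/irq/irqdesc.c:425 func:alloc_desc")
info = parse_allocinfo_row(row, nr_cpus=4)
print(info["footprint_bytes"], info["nodes"], info["location"])
```

The per-node figures are returned unscaled, since the thread only says that
percpu stats are stored on NUMA node 0 and does not describe how they would
be spread across nodes.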