From: Ian Rogers
Date: Mon, 2 Jun 2025 23:46:27 -0700
Subject: Re: [PATCH 0/1] alloc_tag: add per-numa node stats
To: Kent Overstreet
Cc: Steven Rostedt, Casey Chen, linux-mm@kvack.org, surenb@google.com, yzhong@purestorage.com, Peter Zijlstra, Ingo Molnar, Namhyung Kim, Masami Hiramatsu, Arnaldo Carvalho de Melo
References: <20250530003944.2929392-1-cachen@purestorage.com> <5iiwnofmnx565g3xv3zdt35b7qkuwylzedkidnav72t24asswj@omjgyjnauulg> <20250602180826.3a0aafc0@gandalf.local.home>

On Mon, Jun 2, 2025 at 4:35 PM Kent Overstreet wrote:
>
> On Mon, Jun 02, 2025 at 06:08:26PM -0400, Steven Rostedt wrote:
> > On Mon, 2 Jun 2025 17:52:49 -0400
> > Kent Overstreet wrote:
> >
> > > +cc Steven, Peter, Ingo
> > >
> > > On Mon, Jun 02, 2025 at 01:48:43PM -0700, Casey Chen wrote:
> > > > On Fri, May 30, 2025 at 5:05 PM Kent Overstreet
> > > > wrote:
> > > > >
> > > > > On Fri, May 30, 2025 at 02:45:57PM -0700, Casey Chen wrote:
> > > > > > On Thu, May 29, 2025 at 6:11 PM Kent Overstreet
> > > > > > wrote:
> > > > > > >
> > > > > > > On Thu, May 29, 2025 at 06:39:43PM -0600, Casey Chen wrote:
> > > > > > > > The patch is based on 4aab42ee1e4e ("mm/zblock: make active_list rcu_list")
> > > > > > > > from branch mm-new of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > > > > > > >
> > > > > > > > The patch adds per-NUMA alloc_tag stats. Bytes/calls in total and per-NUMA
> > > > > > > > nodes are displayed in a single row for each alloc_tag in /proc/allocinfo.
> > > > > > > > Also, percpu allocations are marked and their stats are stored on NUMA node 0.
> > > > > > > > For example, the resulting file looks like below.
> > > > > > > >
> > > > > > > > percpu y total 8588 2147 numa0 8588 2147 numa1 0 0 kernel/irq/irqdesc.c:425 func:alloc_desc
> > > > > > > > percpu n total 447232 1747 numa0 269568 1053 numa1 177664 694 lib/maple_tree.c:165 func:mt_alloc_bulk
> > > > > > > > percpu n total 83200 325 numa0 30976 121 numa1 52224 204 lib/maple_tree.c:160 func:mt_alloc_one
> > > > > > > > ...
> > > > > > > > percpu n total 364800 5700 numa0 109440 1710 numa1 255360 3990 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1410 [mlx5_core] func:mlx5_alloc_cmd_msg
> > > > > > > > percpu n total 1249280 39040 numa0 374784 11712 numa1 874496 27328 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1376 [mlx5_core] func:alloc_cmd_box
> > > > > > >
> > > > > > > Err, what is 'percpu y/n'?
> > > > > > >
> > > > > >
> > > > > > Mark percpu allocation with 'percpu y/n' because for percpu allocation
> > > > > > stats, 'bytes' is per-cpu, we have to multiply it by the number of
> > > > > > CPUs to get the total bytes. Mark it so we know the exact amount of
> > > > > > memory used. Any /proc/allocinfo parser can understand it and make
> > > > > > correct calculations.
> > > > >
> > > > > Ok, just wanted to be sure it wasn't something else. Let's shorten that
> > > > > though, a single character should suffice (we already have a header that
> > > > > can explain what it is) - if you're growing the width we don't want to
> > > > > overflow.
> > > > >
> > > >
> > > > Does it have a header?
> > > >
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > To save memory, we dynamically allocate per-NUMA node stats counters once the
> > > > > > > > system boots up and knows how many NUMA nodes are available. percpu allocators
> > > > > > > > are used for the memory allocation, hence the increase of PERCPU_DYNAMIC_RESERVE.
> > > > > > > >
> > > > > > > > For in-kernel alloc_tags, pcpu_alloc_noprof() is called so the memory for
> > > > > > > > these counters is not accounted in profiling stats.
> > > > > > > >
> > > > > > > > For loadable modules, __alloc_percpu_gfp() is called and memory is accounted.
> > > > > > >
> > > > > > > Intriguing, but I'd make it a kconfig option, AFAIK this would mainly be
> > > > > > > of interest to people looking at optimizing allocations to make sure
> > > > > > > they're on the right numa node?
> > > > > >
> > > > > > Yes, to help us know if there is a NUMA imbalance issue and make some
> > > > > > optimizations. I can make it a kconfig. Does anybody else have any
> > > > > > opinion about this feature? Thanks!
> > > > >
> > > > > I would like to see some other opinions from potential users, have you
> > > > > been circulating it?
> > > >
> > > > We have been using it internally for a while. I don't know who the
> > > > potential users are and how to reach them so I am sharing it here to
> > > > collect opinions from others.
> > >
> > > I'd ask the tracing and profiling people for their thoughts, and anyone
> > > working on tooling that might consume this.
> > >
> > > I'm wondering if there might be some way of feeding more info into perf,
> > > since profiling cache misses is a big thing that it does.
> > >
> > > It might be a long shot, since we're just accounting usage, or it might
> > > spark some useful ideas.
> > >
> > > Can you share a bit about how you're using this internally?
> >
> > I'm guessing this is to show where in the kernel functions are using memory?
>
> Exactly
>
> Now that we've got a mapping from address to source location that owns
> it, I'm wondering if there's anything else we can do with it.
>
> > I added to the Cc people who tend to use perf for analysis rather than just having
> > those that maintain the kernel side of perf.
>
> Perfect, thanks

This looks nice! In the perf tool we already do some /proc processing
and map the data into what looks like a perf event:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/tool_pmu.c?h=perf-tools-next#n261
```
$ perf stat -e user_time,system_time true

 Performance counter stats for 'true':

           350,000      user_time
         2,054,000      system_time

       0.002222811 seconds time elapsed

       0.000350000 seconds user
       0.002054000 seconds sys
```
There's no reason we can't do the same for memory information; the
(unmerged) patch series I sent adding DRM information contains it:
https://lore.kernel.org/lkml/20250403202439.57791-4-irogers@google.com/

Perf supports per-NUMA node aggregation and even has patches to make
it more accurate on Intel for sub-NUMA systems:
https://lore.kernel.org/lkml/20250515181417.491401-1-irogers@google.com/

It may be that there are advantages to making the perf-tool-only events
kernel events in the longer term (support for sampling being a key one),
but making changes in the tool is fast and convenient. It can be nice to
do things like dump counts every second:
```
$ perf stat -e temp_cpu,fan1 -I 1000
#           time             counts unit events
     1.001152826              34.00 'C   temp_cpu
     1.001152826              2,570 rpm  fan1
     2.008358661              34.00 'C   temp_cpu
     2.008358661              2,572 rpm  fan1
     3.015209566              34.00 'C   temp_cpu
     3.015209566              2,570 rpm  fan1
...
```

Thanks,
Ian
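As a rough illustration of Casey's point that a /proc/allocinfo parser can make the correct calculation, here is a minimal Python sketch that parses one row of the per-NUMA layout shown in the quoted patch description and scales percpu rows by a CPU count. It assumes only the format proposed in this (unmerged) patch; the function names are hypothetical, and the caller supplies the CPU count. Percpu areas are set up for each possible CPU, so /sys/devices/system/cpu/possible is a plausible source for that number, but treat that choice as an assumption.

```python
def parse_allocinfo_line(line):
    """Parse one row of the per-NUMA /proc/allocinfo layout proposed in the
    patch (hypothetical until merged):

        percpu <y|n> total <bytes> <calls> numa0 <bytes> <calls> ... <tag info>
    """
    tokens = line.split()
    if len(tokens) < 5 or tokens[0] != "percpu" or tokens[2] != "total":
        raise ValueError(f"unrecognized allocinfo row: {line!r}")
    row = {
        "percpu": tokens[1] == "y",
        "bytes": int(tokens[3]),
        "calls": int(tokens[4]),
        "nodes": {},  # node id -> (bytes, calls)
    }
    i = 5
    # Consume "numaN <bytes> <calls>" triples; whatever follows is the tag
    # info (source location, optional [module], func:...).
    while (i + 2 < len(tokens) and tokens[i].startswith("numa")
           and tokens[i][4:].isdigit()):
        row["nodes"][int(tokens[i][4:])] = (int(tokens[i + 1]), int(tokens[i + 2]))
        i += 3
    row["location"] = " ".join(tokens[i:])
    return row


def effective_bytes(row, nr_cpus):
    """For percpu rows 'bytes' is per CPU, so scale it by the CPU count the
    caller supplies; other rows are returned unchanged."""
    return row["bytes"] * nr_cpus if row["percpu"] else row["bytes"]


def node_share(row):
    """Fraction of the tag's bytes attributed to each NUMA node. Not meaningful
    for percpu rows, whose stats are all stored on node 0 under the patch."""
    total = sum(b for b, _ in row["nodes"].values())
    return {node: (b / total if total else 0.0)
            for node, (b, _) in row["nodes"].items()}


if __name__ == "__main__":
    sample = ("percpu n total 447232 1747 numa0 269568 1053 numa1 177664 694 "
              "lib/maple_tree.c:165 func:mt_alloc_bulk")
    r = parse_allocinfo_line(sample)
    print(r["location"], effective_bytes(r, nr_cpus=8), node_share(r))
```

Rows that do not match this layout (a header line, for example) raise ValueError, so a caller reading the whole file would typically skip them rather than abort.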
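Ian's idea of dumping counts every second can be sketched outside perf as a simple sampling loop that prints rows in roughly the `perf stat -I` column layout quoted in the email. This is not perf's implementation: `sample_interval` and `format_sample` are hypothetical names, and the clock and sleep functions are injectable only so the loop can be tested without real waiting.

```python
import time


def sample_interval(read_value, interval_s, count,
                    clock=time.monotonic, sleep=time.sleep):
    """Call read_value() every interval_s seconds, count times, returning
    (elapsed_seconds, value) pairs measured from the start of sampling."""
    start = clock()
    samples = []
    for _ in range(count):
        sleep(interval_s)
        samples.append((clock() - start, read_value()))
    return samples


def format_sample(elapsed, value, unit, name):
    """Render one sample roughly in the column layout of `perf stat -I`:
    elapsed time, value with thousands separators, unit, event name."""
    return f"{elapsed:16.9f} {value:>18,} {unit:<4} {name}"


if __name__ == "__main__":
    # Placeholder gauge (the monotonic clock in ns) just so the loop has
    # something to read; a real tool would parse a /proc file here.
    for t, v in sample_interval(time.monotonic_ns, 0.2, 3):
        print(format_sample(t, v, "ns", "monotonic_ns"))
```

For instance, `read_value` could return the summed numa0 bytes from /proc/allocinfo if the proposed per-NUMA format were merged; that reader is deliberately left out of this sketch.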