From: Suren Baghdasaryan <surenb@google.com>
Date: Tue, 10 Jun 2025 08:56:31 -0700
Subject: Re: [PATCH 0/1] alloc_tag: add per-numa node stats
To: Casey Chen
Cc: Kent Overstreet, linux-mm@kvack.org, yzhong@purestorage.com
References: <20250530003944.2929392-1-cachen@purestorage.com>
 <5iiwnofmnx565g3xv3zdt35b7qkuwylzedkidnav72t24asswj@omjgyjnauulg>

On Mon, Jun 9, 2025 at 5:22 PM Casey Chen wrote:
>
> On Wed, Jun 4, 2025 at 8:22 AM Suren Baghdasaryan wrote:
> >
> > On Tue, Jun 3, 2025 at 5:55 PM Casey Chen wrote:
> > >
> > > On Tue, Jun 3, 2025 at 8:01 AM Suren Baghdasaryan wrote:
> > > >
> > > > On Mon, Jun 2, 2025 at 2:32 PM Suren Baghdasaryan wrote:
> > > > >
> > > > > On Mon, Jun 2, 2025 at 1:48 PM Casey Chen wrote:
> > > > > >
> > > > > > On Fri, May 30, 2025 at 5:05 PM Kent Overstreet wrote:
> > > > > > >
> > > > > > > On Fri, May 30, 2025 at 02:45:57PM -0700, Casey Chen wrote:
> > > > > > > > On Thu, May 29, 2025 at 6:11 PM Kent Overstreet wrote:
> > > > > > > > >
> > > > > > > > > On Thu, May 29, 2025 at 06:39:43PM -0600, Casey Chen wrote:
> > > > > > > > > > The patch is based on 4aab42ee1e4e ("mm/zblock: make active_list rcu_list")
> > > > > > > > > > from the mm-new branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.
> > > > > > > > > >
> > > > > > > > > > The patch adds per-NUMA alloc_tag stats. Bytes/calls, in total and per NUMA
> > > > > > > > > > node, are displayed in a single row for each alloc_tag in /proc/allocinfo.
> > > > > > > > > > Percpu allocations are also marked, and their stats are stored on NUMA
> > > > > > > > > > node 0. For example, the resulting file looks like this:
> > > > > > > > > >
> > > > > > > > > > percpu y total     8588  2147 numa0   8588  2147 numa1      0     0 kernel/irq/irqdesc.c:425 func:alloc_desc
> > > > > > > > > > percpu n total   447232  1747 numa0 269568  1053 numa1 177664   694 lib/maple_tree.c:165 func:mt_alloc_bulk
> > > > > > > > > > percpu n total    83200   325 numa0  30976   121 numa1  52224   204 lib/maple_tree.c:160 func:mt_alloc_one
> > > > > > > > > > ...
> > > > > > > > > > percpu n total   364800  5700 numa0 109440  1710 numa1 255360  3990 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1410 [mlx5_core] func:mlx5_alloc_cmd_msg
> > > > > > > > > > percpu n total  1249280 39040 numa0 374784 11712 numa1 874496 27328 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1376 [mlx5_core] func:alloc_cmd_box
> > > > > > > > >
> > > > > > > > > Err, what is 'percpu y/n'?
> > > > > > > >
> > > > > > > > Percpu allocations are marked with 'percpu y/n' because for a percpu
> > > > > > > > allocation the 'bytes' value is per-CPU; we have to multiply it by the
> > > > > > > > number of CPUs to get the total bytes. Marking them lets us know the
> > > > > > > > exact amount of memory used, and any /proc/allocinfo parser can
> > > > > > > > understand the marker and make the correct calculation.
> > > > > > >
> > > > > > > Ok, just wanted to be sure it wasn't something else. Let's shorten that
> > > > > > > though; a single character should suffice (we already have a header that
> > > > > > > can explain what it is) - if you're growing the width we don't want to
> > > > > > > overflow.
> > > > > >
> > > > > > Does it have a header?
> > > > >
> > > > > Yes. See print_allocinfo_header().
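
For anyone consuming the counters, the marker boils down to the scaling
step below. A minimal sketch in C of the parser-side calculation -- the
helper name is made up for illustration, it is not from the patch:

#include <linux/cpumask.h>
#include <linux/types.h>

/*
 * Percpu allocations report their per-CPU size in 'bytes', so the true
 * footprint is bytes * #CPUs; all other allocations already report the
 * total. (Illustrative helper, not part of the patch.)
 */
static inline u64 tag_total_bytes(u64 counted_bytes, bool is_percpu)
{
        return is_percpu ? counted_bytes * num_possible_cpus()
                         : counted_bytes;
}

Kent's single-character flag plus an explanatory header line would carry
the same information without growing the row width.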
> > > >
> > > > I was thinking that instead of changing the /proc/allocinfo format to
> > > > contain both total and per-node information, we could keep it as is
> > > > (containing only totals) while exposing per-node information inside new
> > > > /sys/devices/system/node/node<N>/allocinfo files. That seems cleaner
> > > > to me.
> > >
> > > The output of /sys/devices/system/node/node<N>/allocinfo is strictly
> > > limited to a single PAGE_SIZE, so it cannot display stats for all tags.
> >
> > Ugh, that's a pity. Another option would be to add an "nid" column like
> > this when this config is specified:
> >
> > nid  bytes calls
> > 0     8588  2147 kernel/irq/irqdesc.c:425 func:alloc_desc
> > 1        0     0 kernel/irq/irqdesc.c:425 func:alloc_desc
> > ...
> >
> > It bloats the file size but looks more structured to me.
>
> How about this format?
>
> With CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS=y, /proc/allocinfo looks like:
> allocinfo - version: 1.0
>
>       0     0 init/main.c:1310 func:do_initcalls
> 0     0     0
> 1     0     0

If we go that way then why not:

allocinfo - version: 2.0
 776704  1517 kernel/workqueue.c:4301 func:alloc_unbound_pwq
 nid0 348672   681
 nid1 428032   836
   6144     6 kernel/workqueue.c:4133 func:get_unbound_pool
 nid0   4096     4
 nid1   2048     2
...

If CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS=n, the file format will not
change.

> ...
>  776704  1517 kernel/workqueue.c:4301 func:alloc_unbound_pwq
> 0 348672   681
> 1 428032   836
>    6144     6 kernel/workqueue.c:4133 func:get_unbound_pool
> 0   4096     4
> 1   2048     2
>
> With CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS=n, /proc/allocinfo stays
> the same as before:
> allocinfo - version: 1.0
>
>       0     0 init/main.c:1310 func:do_initcalls
>       0     0 init/do_mounts.c:350 func:mount_nodev_root
>       0     0 init/do_mounts.c:187 func:mount_root_generic
> ...
>
> > > > I'm also not a fan of "percpu y" tags, as that requires the reader to
> > > > know how many CPUs were in the system to make the calculation (you
> > > > might get the allocinfo content from a system you have no access to
> > > > and no additional information). Maybe we can have "per-cpu bytes" and
> > > > "total bytes" columns instead? For per-cpu allocations these will be
> > > > different; for all other allocations these two columns will contain
> > > > the same number.
> > >
> > > I plan to remove 'percpu y/n' from this patch and implement it later.
> > >
> > > > > > > > > > To save memory, we dynamically allocate the per-NUMA-node stats
> > > > > > > > > > counters once the system boots up and knows how many NUMA nodes
> > > > > > > > > > are available. Percpu allocators are used for this allocation,
> > > > > > > > > > hence the increase of PERCPU_DYNAMIC_RESERVE.
> > > > > > > > > >
> > > > > > > > > > For in-kernel alloc_tags, pcpu_alloc_noprof() is called, so the
> > > > > > > > > > memory for these counters is not accounted in the profiling stats.
> > > > > > > > > >
> > > > > > > > > > For loadable modules, __alloc_percpu_gfp() is called and the
> > > > > > > > > > memory is accounted.
> > > > > > > > >
> > > > > > > > > Intriguing, but I'd make it a kconfig option. AFAIK this would
> > > > > > > > > mainly be of interest to people looking at optimizing allocations
> > > > > > > > > to make sure they're on the right numa node?
> > > > > > > >
> > > > > > > > Yes, to help us know if there is a NUMA imbalance issue and make
> > > > > > > > some optimizations. I can make it a kconfig. Does anybody else
> > > > > > > > have any opinion about this feature? Thanks!
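
For reference, the boot-time sizing being described is roughly the
following -- a sketch only, with placeholder names, a single global
instead of the per-tag wiring, and ignoring the noprof distinction the
patch draws for in-kernel tags:

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/nodemask.h>
#include <linux/percpu.h>
#include <linux/types.h>

/* One {bytes, calls} pair per NUMA node, replicated per-CPU. */
struct tag_node_counters {
        u64 bytes;
        u64 calls;
};

static struct tag_node_counters __percpu *numa_counters;

/* Sized at boot, once nr_node_ids is known. */
static int __init numa_counters_init(void)
{
        numa_counters = __alloc_percpu(nr_node_ids * sizeof(*numa_counters),
                                       __alignof__(*numa_counters));
        return numa_counters ? 0 : -ENOMEM;
}
early_initcall(numa_counters_init);

In the patch each alloc_tag gets its own such block from the percpu
allocator, which is why PERCPU_DYNAMIC_RESERVE has to grow.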
> > > > > > >
> > > > > > > I would like to see some other opinions from potential users; have
> > > > > > > you been circulating it?
> > > > > >
> > > > > > We have been using it internally for a while. I don't know who the
> > > > > > potential users are or how to reach them, so I am sharing it here to
> > > > > > collect opinions from others.
> > > > >
> > > > > Should definitely have a separate Kconfig option. Have you measured
> > > > > the memory and performance overhead of this change?
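
To make the Kconfig direction concrete, gating along these lines would
keep the =n output byte-for-byte identical to today's version 1.0
format. Purely illustrative: tag_node_counters() is a placeholder for
however the patch reads back one node's counters, not a real API:

#include <linux/alloc_tag.h>
#include <linux/nodemask.h>
#include <linux/seq_buf.h>

#ifdef CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS
/* Emit one "nid<N> <bytes> <calls>" line under the tag's total line. */
static void print_node_stats(struct seq_buf *out, struct alloc_tag *tag)
{
        int nid;

        for (nid = 0; nid < nr_node_ids; nid++) {
                /* tag_node_counters(): hypothetical per-node read-back */
                struct alloc_tag_counters c = tag_node_counters(tag, nid);

                seq_buf_printf(out, " nid%-3d %12llu %8llu\n",
                               nid, c.bytes, c.calls);
        }
}
#else
/* Option off: the version 1.0 format is emitted unchanged. */
static inline void print_node_stats(struct seq_buf *out,
                                    struct alloc_tag *tag) { }
#endif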