From: Casey Chen <cachen@purestorage.com>
Date: Mon, 9 Jun 2025 17:21:50 -0700
Subject: Re: [PATCH 0/1] alloc_tag: add per-numa node stats
To: Suren Baghdasaryan
Cc: Kent Overstreet, linux-mm@kvack.org, yzhong@purestorage.com
References: <20250530003944.2929392-1-cachen@purestorage.com> <5iiwnofmnx565g3xv3zdt35b7qkuwylzedkidnav72t24asswj@omjgyjnauulg>

On Wed, Jun 4, 2025 at 8:22 AM Suren Baghdasaryan wrote:
>
> On Tue, Jun 3, 2025 at 5:55 PM Casey Chen wrote:
> >
> > On Tue, Jun 3, 2025 at 8:01 AM Suren Baghdasaryan wrote:
> > >
> > > On Mon, Jun 2, 2025 at 2:32 PM Suren Baghdasaryan wrote:
> > > >
> > > > On Mon, Jun 2, 2025 at 1:48 PM Casey Chen wrote:
> > > > >
> > > > > On Fri, May 30, 2025 at 5:05 PM Kent Overstreet wrote:
> > > > > >
> > > > > > On Fri, May 30, 2025 at 02:45:57PM -0700, Casey Chen wrote:
> > > > > > > On Thu, May 29, 2025 at 6:11 PM Kent Overstreet wrote:
> > > > > > > >
> > > > > > > > On Thu, May 29, 2025 at 06:39:43PM -0600, Casey Chen wrote:
> > > > > > > > > The patch is based on 4aab42ee1e4e ("mm/zblock: make active_list rcu_list")
> > > > > > > > > from branch mm-new of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > > > > > > > >
> > > > > > > > > The patch adds per-NUMA alloc_tag stats. Bytes/calls in total and per-NUMA
> > > > > > > > > node are displayed in a single row for each alloc_tag in /proc/allocinfo.
> > > > > > > > > Also percpu allocation is marked and its stats are stored on NUMA node 0.
> > > > > > > > > For example, the resulting file looks like below.
> > > > > > > > >
> > > > > > > > > percpu y total    8588  2147 numa0   8588  2147 numa1      0     0 kernel/irq/irqdesc.c:425 func:alloc_desc
> > > > > > > > > percpu n total  447232  1747 numa0 269568  1053 numa1 177664   694 lib/maple_tree.c:165 func:mt_alloc_bulk
> > > > > > > > > percpu n total   83200   325 numa0  30976   121 numa1  52224   204 lib/maple_tree.c:160 func:mt_alloc_one
> > > > > > > > > ...
> > > > > > > > > percpu n total  364800  5700 numa0 109440  1710 numa1 255360  3990 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1410 [mlx5_core] func:mlx5_alloc_cmd_msg
> > > > > > > > > percpu n total 1249280 39040 numa0 374784 11712 numa1 874496 27328 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1376 [mlx5_core] func:alloc_cmd_box
> > > > > > > >
> > > > > > > > Err, what is 'percpu y/n'?
> > > > > > >
> > > > > > > Percpu allocations are marked with 'percpu y/n' because for percpu
> > > > > > > allocation stats, 'bytes' is per-CPU, so we have to multiply it by the
> > > > > > > number of CPUs to get the total bytes. Marking it lets us know the exact
> > > > > > > amount of memory used, and any /proc/allocinfo parser can understand it
> > > > > > > and make correct calculations.
> > > > > >
> > > > > > Ok, just wanted to be sure it wasn't something else. Let's shorten that
> > > > > > though, a single character should suffice (we already have a header that
> > > > > > can explain what it is) - if you're growing the width we don't want to
> > > > > > overflow.
> > > > >
> > > > > Does it have a header?
> > > >
> > > > Yes. See print_allocinfo_header().
> > >
> > > I was thinking if instead of changing /proc/allocinfo format to
> > > contain both total and per-node information we can keep it as is
> > > (containing only totals) while exposing per-node information inside
> > > new /sys/devices/system/node/node<N>/allocinfo files. That seems
> > > cleaner to me.
> >
> > The output of /sys/devices/system/node/node<N>/allocinfo is
> > strictly limited to a single PAGE_SIZE and it cannot display stats for
> > all tags.
>
> Ugh, that's a pity. Another option would be to add a "nid" column like
> this when this config is specified:
>
> nid    bytes   calls
>   0     8588    2147  kernel/irq/irqdesc.c:425 func:alloc_desc
>   1        0       0  kernel/irq/irqdesc.c:425 func:alloc_desc
> ...
>
> It bloats the file size but looks more structured to me.

How about this format?

With CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS=y, /proc/allocinfo looks like:

allocinfo - version: 1.0
        0        0 init/main.c:1310 func:do_initcalls
  0          0        0
  1          0        0
...
   776704     1517 kernel/workqueue.c:4301 func:alloc_unbound_pwq
  0     348672      681
  1     428032      836
     6144        6 kernel/workqueue.c:4133 func:get_unbound_pool
  0       4096        4
  1       2048        2

With CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS=n, /proc/allocinfo stays the same as before:

allocinfo - version: 1.0
        0        0 init/main.c:1310 func:do_initcalls
        0        0 init/do_mounts.c:350 func:mount_nodev_root
        0        0 init/do_mounts.c:187 func:mount_root_generic
...

> >
> > > I'm also not a fan of "percpu y" tags as that requires the reader to
> > > know how many CPUs were in the system to make the calculation (you
> > > might get the allocinfo content from a system you have no access to
> > > and no additional information). Maybe we can have "per-cpu bytes" and
> > > "total bytes" columns instead? For per-cpu allocations these will be
> > > different, for all other allocations these two columns will contain
> > > the same number.
> >
> > I plan to remove 'percpu y/n' from this patch and implement it later.
> >
> > > > > > > > > To save memory, we dynamically allocate per-NUMA node stats counters once the
> > > > > > > > > system boots up and knows how many NUMA nodes are available. percpu allocators
> > > > > > > > > are used for memory allocation, hence the increase of PERCPU_DYNAMIC_RESERVE.
> > > > > > > > >
> > > > > > > > > For in-kernel alloc_tags, pcpu_alloc_noprof() is called so the memory for
> > > > > > > > > these counters is not accounted in profiling stats.
> > > > > > > > >
> > > > > > > > > For loadable modules, __alloc_percpu_gfp() is called and memory is accounted.
> > > > > > > >
> > > > > > > > Intriguing, but I'd make it a kconfig option, AFAIK this would mainly be
> > > > > > > > of interest to people looking at optimizing allocations to make sure
> > > > > > > > they're on the right numa node?
> > > > > > >
> > > > > > > Yes, to help us know if there is a NUMA imbalance issue and make some
> > > > > > > optimizations. I can make it a kconfig. Does anybody else have any
> > > > > > > opinion about this feature? Thanks!
> > > > > >
> > > > > > I would like to see some other opinions from potential users, have you
> > > > > > been circulating it?
> > > > >
> > > > > We have been using it internally for a while. I don't know who the
> > > > > potential users are and how to reach them so I am sharing it here to
> > > > > collect opinions from others.
> > > >
> > > > Should definitely have a separate Kconfig option. Have you measured
> > > > the memory and performance overhead of this change?
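
For reference, below is a minimal userspace sketch of how a reader could consume the per-NUMA layout proposed above. It assumes the exact columns shown in the example output in this mail (a total row of "bytes calls location func:name" followed by per-node rows of "nid bytes calls"); this is not a finalized interface, and the parse_allocinfo() helper name is purely illustrative.

#!/usr/bin/env python3
"""Sketch of a /proc/allocinfo reader for the per-NUMA layout discussed
in this thread.  Column layout is assumed from the example above."""

from collections import defaultdict


def parse_allocinfo(path="/proc/allocinfo"):
    """Return {tag: {"total": (bytes, calls), "numa": {nid: (bytes, calls)}}}."""
    stats = {}
    current = None
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields or not fields[0].isdigit():
                continue  # skip the "allocinfo - version" header and blank lines
            if len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
                # Per-node row: "<nid> <bytes> <calls>" (assumed layout).
                if current is not None:
                    nid, nbytes, ncalls = (int(s) for s in fields)
                    stats[current]["numa"][nid] = (nbytes, ncalls)
                continue
            # Total row: "<bytes> <calls> <file:line> [module] func:<name>".
            nbytes, ncalls = int(fields[0]), int(fields[1])
            current = " ".join(fields[2:])  # tag = source location + func name
            stats[current] = {"total": (nbytes, ncalls), "numa": {}}
    return stats


if __name__ == "__main__":
    # Example use: sum allocated bytes per NUMA node to spot imbalance.
    per_node_bytes = defaultdict(int)
    for tag, s in parse_allocinfo().items():
        for nid, (nbytes, _) in s["numa"].items():
            per_node_bytes[nid] += nbytes
    for nid in sorted(per_node_bytes):
        print(f"node{nid}: {per_node_bytes[nid]} bytes")

With CONFIG_MEM_ALLOC_PROFILING_PER_NUMA_STATS=n there are simply no per-node rows, so the same loop degrades gracefully to totals only.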