From: Casey Chen
Date: Tue, 3 Jun 2025 13:00:50 -0700
Subject: Re: [PATCH 0/1] alloc_tag: add per-numa node stats
To: Suren Baghdasaryan
Cc: Kent Overstreet, linux-mm@kvack.org, yzhong@purestorage.com
References: <20250530003944.2929392-1-cachen@purestorage.com> <5iiwnofmnx565g3xv3zdt35b7qkuwylzedkidnav72t24asswj@omjgyjnauulg>

On Mon, Jun 2, 2025 at 2:32 PM Suren Baghdasaryan wrote:
>
> On Mon, Jun 2, 2025 at 1:48 PM Casey Chen wrote:
> >
> > On Fri, May 30, 2025 at 5:05 PM Kent Overstreet
> > wrote:
> > >
> > > On Fri, May 30, 2025 at 02:45:57PM -0700, Casey Chen wrote:
> > > > On Thu, May 29, 2025 at 6:11 PM Kent Overstreet
> > > > wrote:
> > > > >
> > > > > On Thu, May 29, 2025 at 06:39:43PM -0600, Casey Chen wrote:
> > > > > >
> > > > > > The patch is based on 4aab42ee1e4e ("mm/zblock: make active_list rcu_list")
> > > > > > from branch mm-new of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
> > > > > >
> > > > > > The patch adds per-NUMA alloc_tag stats. Bytes/calls in total and per-NUMA
> > > > > > nodes are displayed in a single row for each alloc_tag in /proc/allocinfo.
> > > > > > Also percpu allocation is marked and its stats are stored on NUMA node 0.
> > > > > > For example, the resulting file looks like below.
> > > > > >
> > > > > > percpu y total    8588  2147 numa0   8588  2147 numa1      0     0 kernel/irq/irqdesc.c:425 func:alloc_desc
> > > > > > percpu n total  447232  1747 numa0 269568  1053 numa1 177664   694 lib/maple_tree.c:165 func:mt_alloc_bulk
> > > > > > percpu n total   83200   325 numa0  30976   121 numa1  52224   204 lib/maple_tree.c:160 func:mt_alloc_one
> > > > > > ...
> > > > > > percpu n total  364800  5700 numa0 109440  1710 numa1 255360  3990 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1410 [mlx5_core] func:mlx5_alloc_cmd_msg
> > > > > > percpu n total 1249280 39040 numa0 374784 11712 numa1 874496 27328 drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1376 [mlx5_core] func:alloc_cmd_box
> > > > >
> > > > > Err, what is 'percpu y/n'?
> > > > >
> > > >
> > > > Mark percpu allocation with 'percpu y/n' because for percpu allocation
> > > > stats, 'bytes' is per-cpu, so we have to multiply it by the number of
> > > > CPUs to get the total bytes. Mark it so we know the exact amount of
> > > > memory used. Any /proc/allocinfo parser can understand it and make
> > > > correct calculations.
> > >
> > > Ok, just wanted to be sure it wasn't something else. Let's shorten that
> > > though, a single character should suffice (we already have a header that
> > > can explain what it is) - if you're growing the width we don't want to
> > > overflow.
> >
> > Does it have a header?
>
> Yes. See print_allocinfo_header().
> >
> > >
> > > >
> > > > >
> > > > > > To save memory, we dynamically allocate per-NUMA node stats counters once the
> > > > > > system boots up and knows how many NUMA nodes are available. percpu allocators
> > > > > > are used for memory allocation, hence the increase of PERCPU_DYNAMIC_RESERVE.
> > > > > >
> > > > > > For in-kernel alloc_tags, pcpu_alloc_noprof() is called so the memory for
> > > > > > these counters is not accounted in profiling stats.
> > > > > >
> > > > > > For loadable modules, __alloc_percpu_gfp() is called and memory is accounted.
> > > > >
> > > > > Intriguing, but I'd make it a kconfig option, AFAIK this would mainly be
> > > > > of interest to people looking at optimizing allocations to make sure
> > > > > they're on the right numa node?
> > > >
> > > > Yes, to help us know if there is a NUMA imbalance issue and make some
> > > > optimizations. I can make it a kconfig. Does anybody else have any
> > > > opinion about this feature? Thanks!
> > >
> > > I would like to see some other opinions from potential users, have you
> > > been circulating it?
> >
> > We have been using it internally for a while. I don't know who the
> > potential users are and how to reach them so I am sharing it here to
> > collect opinions from others.
>
> Should definitely have a separate Kconfig option. Have you measured
> the memory and performance overhead of this change?

I can make it a Kconfig option, say CONFIG_MEM_ALLOC_PER_NUMA_STATS=y/n,
which controls the number of counters per CPU.
If CONFIG_MEM_ALLOC_PER_NUMA_STATS=y, num_counter_percpu =
num_possible_nodes(); otherwise num_counter_percpu = 1.

There is some memory cost:

  Additional memory used = Number of additional NUMA nodes * Number of CPUs
                           * Number of tags * Size of each counter

For example, on one of my testbeds, additional memory used = 1 (two nodes
in total) * 112 (number of CPUs) * 4540 (number of tags) * 16 (size of
each counter in bytes) = 8,135,680 bytes, ~8 MiB.
This testbed has a total of 755 GiB of memory.
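
The estimate above can be sketched in user space as follows. This is only an illustration of the arithmetic, not kernel code: the Python function names counters_per_cpu() and extra_counter_bytes() and their parameters are made up for this example, and the 16-byte counter size is simply the value used in the testbed figures.

```python
def counters_per_cpu(per_numa_stats: bool, num_possible_nodes: int) -> int:
    """Counters kept per CPU for each tag.

    Mirrors the proposed option: one counter per possible NUMA node when
    per-NUMA stats are enabled, otherwise a single counter.
    (Hypothetical helper, named for this sketch only.)
    """
    if num_possible_nodes < 1:
        raise ValueError("need at least one NUMA node")
    return num_possible_nodes if per_numa_stats else 1


def extra_counter_bytes(num_nodes: int, num_cpus: int, num_tags: int,
                        counter_size: int = 16) -> int:
    """Additional memory compared with keeping a single counter per CPU.

    additional nodes * CPUs * tags * size of each counter (in bytes)
    """
    additional_nodes = counters_per_cpu(True, num_nodes) - 1
    return additional_nodes * num_cpus * num_tags * counter_size


if __name__ == "__main__":
    # Testbed figures from the message: 2 nodes, 112 CPUs, 4540 tags.
    extra = extra_counter_bytes(num_nodes=2, num_cpus=112, num_tags=4540)
    print(f"{extra} bytes = {extra / 2**20:.2f} MiB")
```

With the option disabled, or on a single-node machine, the extra cost under this model is zero, since only one counter per CPU is kept for each tag.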