From mboxrd@z Thu Jan 1 00:00:00 1970
From: Caleb Sander Mateos <csander@purestorage.com>
Date: Fri, 2 May 2025 09:48:17 -0700
Subject: Re: [PATCH v6 0/3] nvme/pci: PRP list DMA pool partitioning
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Andrew Morton
Cc: Kanchan Joshi, linux-nvme@lists.infradead.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
In-Reply-To: <20250426020636.34355-1-csander@purestorage.com>
References: <20250426020636.34355-1-csander@purestorage.com>
Content-Type: text/plain; charset="UTF-8"

Hi all,

It seems like there is consensus on this series and all patches have
multiple reviews. Would it be possible to queue it up for 6.16? The
NVMe tree seems like it would make sense, though maybe the dmapool
patch needs to go through the mm tree?

Thanks,
Caleb

On Fri, Apr 25, 2025 at 7:07 PM Caleb Sander Mateos
<csander@purestorage.com> wrote:
>
> NVMe commands with over 8 KB of discontiguous data allocate PRP list
> pages from the per-nvme_device dma_pool prp_page_pool or prp_small_pool.
> Each call to dma_pool_alloc() and dma_pool_free() takes the per-dma_pool
> spinlock. These device-global spinlocks are a significant source of
> contention when many CPUs are submitting to the same NVMe devices. On a
> workload issuing 32 KB reads from 16 CPUs (8 hypertwin pairs) across 2
> NUMA nodes to 23 NVMe devices, we observed 2.4% of CPU time spent in
> _raw_spin_lock_irqsave called from dma_pool_alloc and dma_pool_free.
>
> Ideally, the dma_pools would be per-hctx to minimize contention. But
> that could impose considerable resource costs in a system with many NVMe
> devices and CPUs.
>
> As a compromise, allocate per-NUMA-node PRP list DMA pools. Map each
> nvme_queue to the set of DMA pools corresponding to its device and its
> hctx's NUMA node. This reduces the _raw_spin_lock_irqsave overhead by
> about half, to 1.2%. Preventing the sharing of PRP list pages across
> NUMA nodes also makes them cheaper to initialize.
>
> Allocating the dmapool structs on the desired NUMA node further reduces
> the time spent in dma_pool_alloc from 0.87% to 0.50%.
>
> Caleb Sander Mateos (2):
>   nvme/pci: factor out nvme_init_hctx() helper
>   nvme/pci: make PRP list DMA pools per-NUMA-node
>
> Keith Busch (1):
>   dmapool: add NUMA affinity support
>
>  drivers/nvme/host/pci.c | 171 +++++++++++++++++++++++-----------------
>  include/linux/dmapool.h |  17 +++-
>  mm/dmapool.c            |  16 ++--
>  3 files changed, 121 insertions(+), 83 deletions(-)
>
> v6:
> - Clarify description of when PRP list pages are allocated (Christoph)
> - Add Reviewed-by tags
>
> v5:
> - Allocate dmapool structs on desired NUMA node (Keith)
> - Add Reviewed-by tags
>
> v4:
> - Drop the numa_node < nr_node_ids check (Kanchan)
> - Add Reviewed-by tags
>
> v3: simplify nvme_release_prp_pools() (Keith)
>
> v2:
> - Initialize admin nvme_queue's nvme_prp_dma_pools (Kanchan)
> - Shrink nvme_dev's prp_pools array from MAX_NUMNODES to nr_node_ids (Kanchan)
>
> --
> 2.45.2
>
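
For readers following along in the archive: the cover letter implies a
dmapool interface extended with a NUMA node parameter, plus a per-node
pool lookup in the NVMe driver keyed by the hctx's node. Below is a
minimal sketch of what that shape could look like. It assumes a
dma_pool_create_node() entry point and a per-node dev->prp_pools array
(the array is mentioned in the v2 changelog); the function name, exact
signatures, struct names, and pool sizes here are illustrative
assumptions, not taken from the patches themselves.

#include <linux/device.h>
#include <linux/dmapool.h>
#include <linux/err.h>
#include <linux/numa.h>

/*
 * Sketch: a node-aware variant of dma_pool_create(). The existing
 * dma_pool_create() could keep its current behavior by passing
 * NUMA_NO_NODE, so no existing callers need to change.
 */
struct dma_pool *dma_pool_create_node(const char *name, struct device *dev,
				      size_t size, size_t align,
				      size_t boundary, int node);

static inline struct dma_pool *dma_pool_create(const char *name,
		struct device *dev, size_t size, size_t align, size_t boundary)
{
	return dma_pool_create_node(name, dev, size, align, boundary,
				    NUMA_NO_NODE);
}

/*
 * Hypothetical per-node pool lookup on the NVMe side: each nvme_queue
 * resolves its hctx's NUMA node to a pair of PRP list pools, created
 * lazily the first time a queue on that node needs them. Struct and
 * field names are illustrative; NVME_CTRL_PAGE_SIZE is the driver's
 * existing controller page size constant.
 */
struct nvme_prp_dma_pools {
	struct dma_pool *large;	/* one controller page per PRP list */
	struct dma_pool *small;	/* smaller blocks for short PRP lists */
};

static struct nvme_prp_dma_pools *
nvme_setup_prp_pools(struct nvme_dev *dev, unsigned int node)
{
	struct nvme_prp_dma_pools *prp_pools = &dev->prp_pools[node];

	if (prp_pools->large)
		return prp_pools;	/* already created for this node */

	/* Allocate the pool metadata and its pages on the target node. */
	prp_pools->large = dma_pool_create_node("prp list page", dev->dev,
						NVME_CTRL_PAGE_SIZE,
						NVME_CTRL_PAGE_SIZE, 0, node);
	prp_pools->small = dma_pool_create_node("prp list 256", dev->dev,
						256, 256, 0, node);
	if (!prp_pools->large || !prp_pools->small)
		return ERR_PTR(-ENOMEM);
	return prp_pools;
}

The per-node granularity is the compromise the cover letter describes:
per-hctx pools would eliminate lock contention entirely but multiply
the pool footprint by the queue count on every device, while a single
device-global pool keeps the contended spinlock. One pool pair per
(device, node) bounds the footprint at nr_node_ids entries per device
while still keeping allocations node-local.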