From: Caleb Sander Mateos <csander@purestorage.com>
To: Keith Busch <kbusch@kernel.org>, Jens Axboe <axboe@kernel.dk>,
Christoph Hellwig <hch@lst.de>, Sagi Grimberg <sagi@grimberg.me>,
Andrew Morton <akpm@linux-foundation.org>
Cc: Kanchan Joshi <joshi.k@samsung.com>,
linux-nvme@lists.infradead.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org,
Caleb Sander Mateos <csander@purestorage.com>
Subject: [PATCH v6 0/3] nvme/pci: PRP list DMA pool partitioning
Date: Fri, 25 Apr 2025 20:06:33 -0600
Message-ID: <20250426020636.34355-1-csander@purestorage.com>

NVMe commands with over 8 KB of discontiguous data allocate PRP list
pages from the per-nvme_device dma_pool prp_page_pool or prp_small_pool.
Each call to dma_pool_alloc() and dma_pool_free() takes the per-dma_pool
spinlock. These device-global spinlocks are a significant source of
contention when many CPUs are submitting to the same NVMe devices. On a
workload issuing 32 KB reads from 16 CPUs (8 hyperthread pairs) across 2
NUMA nodes to 23 NVMe devices, we observed 2.4% of CPU time spent in
_raw_spin_lock_irqsave called from dma_pool_alloc and dma_pool_free.
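
For reference, the allocation path in question looks roughly like the
sketch below (simplified from drivers/nvme/host/pci.c; the function
signature and error handling are abbreviated, and the small-pool
selection is shown as a plain flag). Every PRP list page for every
queue comes from one of the two device-global pools, so all submitting
CPUs funnel through the same pool spinlocks:

/*
 * Pre-series layout (sketch): one pair of PRP list pools per
 * controller, shared by all queues and all CPUs.
 */
struct nvme_dev {
        struct dma_pool *prp_page_pool;   /* NVME_CTRL_PAGE_SIZE pages */
        struct dma_pool *prp_small_pool;  /* 256-byte pages */
        /* ... */
};

static blk_status_t nvme_pci_setup_prps(struct nvme_dev *dev,
                                        struct request *req, bool small)
{
        struct dma_pool *pool = small ? dev->prp_small_pool
                                      : dev->prp_page_pool;
        dma_addr_t prp_dma;
        __le64 *prp_list;

        /* dma_pool_alloc() takes the pool spinlock with IRQs disabled;
         * this is where _raw_spin_lock_irqsave shows up in profiles. */
        prp_list = dma_pool_alloc(pool, GFP_ATOMIC, &prp_dma);
        if (!prp_list)
                return BLK_STS_RESOURCE;
        /* ... fill in the PRP entries ... */
        return BLK_STS_OK;
}
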
Ideally, the dma_pools would be per-hctx to minimize contention. But
that could impose considerable resource costs in a system with many NVMe
devices and CPUs.

As a compromise, allocate per-NUMA-node PRP list DMA pools. Map each
nvme_queue to the set of DMA pools corresponding to its device and its
hctx's NUMA node. This reduces the _raw_spin_lock_irqsave overhead by
about half, to 1.2%. Preventing the sharing of PRP list pages across
NUMA nodes also makes them cheaper to initialize.
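
Concretely, the partitioning looks roughly like this sketch (the
nvme_prp_dma_pools and prp_pools names appear in the changelog below;
nvme_setup_prp_pools() and the other details are illustrative, not
necessarily the exact code in the patches):

/* One pair of PRP list pools per NUMA node. */
struct nvme_prp_dma_pools {
        struct dma_pool *large;  /* NVME_CTRL_PAGE_SIZE PRP list pages */
        struct dma_pool *small;  /* 256-byte PRP list pages */
};

struct nvme_dev {
        /* nr_node_ids entries, created lazily per node */
        struct nvme_prp_dma_pools *prp_pools;
        /* ... */
};

/* Each hctx resolves its queue's pools once, from its own NUMA node,
 * so submissions from different nodes no longer contend on a single
 * device-global lock. */
static int nvme_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
                          unsigned int hctx_idx)
{
        struct nvme_dev *dev = data;
        struct nvme_queue *nvmeq = &dev->queues[hctx_idx + 1];
        struct nvme_prp_dma_pools *pools;

        pools = nvme_setup_prp_pools(dev, hctx->numa_node);
        if (IS_ERR(pools))
                return PTR_ERR(pools);

        nvmeq->prp_pools = pools;
        hctx->driver_data = nvmeq;
        return 0;
}
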
Allocating the dmapool structs on the desired NUMA node further reduces
the time spent in dma_pool_alloc from 0.87% to 0.50%.
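
That last improvement builds on the dmapool change in patch 1/3.
Assuming it adds a node-aware variant of dma_pool_create() (consistent
with the include/linux/dmapool.h hunk in the diffstat; the exact name
below is an assumption), creating a node-local pool would look
something like:

/* Sketch: create a PRP list pool whose struct dma_pool and its
 * bookkeeping are allocated on the given NUMA node. The name, size,
 * align, and boundary arguments mirror the driver's existing
 * "prp list page" pool. */
struct dma_pool *pool;

pool = dma_pool_create_node("prp list page", dev->dev,
                            NVME_CTRL_PAGE_SIZE, NVME_CTRL_PAGE_SIZE,
                            0, numa_node);
if (!pool)
        return -ENOMEM;

/* Existing dma_pool_create() callers are unchanged: they can pass
 * NUMA_NO_NODE through the same path. */
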
Caleb Sander Mateos (2):
  nvme/pci: factor out nvme_init_hctx() helper
  nvme/pci: make PRP list DMA pools per-NUMA-node

Keith Busch (1):
  dmapool: add NUMA affinity support

 drivers/nvme/host/pci.c | 171 +++++++++++++++++++++++-----------------
 include/linux/dmapool.h |  17 +++-
 mm/dmapool.c            |  16 ++--
 3 files changed, 121 insertions(+), 83 deletions(-)

v6:
- Clarify description of when PRP list pages are allocated (Christoph)
- Add Reviewed-by tags

v5:
- Allocate dmapool structs on desired NUMA node (Keith)
- Add Reviewed-by tags

v4:
- Drop the numa_node < nr_node_ids check (Kanchan)
- Add Reviewed-by tags

v3: simplify nvme_release_prp_pools() (Keith)

v2:
- Initialize admin nvme_queue's nvme_prp_dma_pools (Kanchan)
- Shrink nvme_dev's prp_pools array from MAX_NUMNODES to nr_node_ids (Kanchan)

--
2.45.2