From: Caleb Sander Mateos
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg, Andrew Morton
Cc: Kanchan Joshi, linux-nvme@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Caleb Sander Mateos
Subject: [PATCH v6 0/3] nvme/pci: PRP list DMA pool partitioning
Date: Fri, 25 Apr 2025 20:06:33 -0600
Message-ID: <20250426020636.34355-1-csander@purestorage.com>
X-Mailer: git-send-email 2.45.2
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

NVMe commands with over 8 KB of discontiguous data allocate PRP list
pages from the per-nvme_device dma_pool prp_page_pool or prp_small_pool.
Each call to dma_pool_alloc() and dma_pool_free() takes the per-dma_pool
spinlock. These device-global spinlocks are a significant source of
contention when many CPUs are submitting to the same NVMe devices. On a
workload issuing 32 KB reads from 16 CPUs (8 hypertwin pairs) across 2
NUMA nodes to 23 NVMe devices, we observed 2.4% of CPU time spent in
_raw_spin_lock_irqsave called from dma_pool_alloc and dma_pool_free.

Ideally, the dma_pools would be per-hctx to minimize contention. But
that could impose considerable resource costs in a system with many NVMe
devices and CPUs.

As a compromise, allocate per-NUMA-node PRP list DMA pools. Map each
nvme_queue to the set of DMA pools corresponding to its device and its
hctx's NUMA node. This reduces the _raw_spin_lock_irqsave overhead by
about half, to 1.2%. Preventing the sharing of PRP list pages across
NUMA nodes also makes them cheaper to initialize.

Allocating the dmapool structs on the desired NUMA node further reduces
the time spent in dma_pool_alloc from 0.87% to 0.50%.

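For readers who want the shape of the queue-to-pool mapping without
opening the diff, here is a minimal userspace sketch. It is not the code
from this series: every toy_* name is hypothetical, an atomic_flag
spinlock stands in for the per-dma_pool spinlock, and a counter stands
in for the actual PRP list allocation. It only illustrates that a queue
bound to one node never touches the lock of another node's pools.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dma_pool: a spinlock plus a counter. */
struct toy_pool {
	atomic_flag lock;	/* models the per-dma_pool spinlock */
	unsigned long allocs;	/* allocations served by this pool */
};

/* One large/small pool pair per NUMA node (assumed layout). */
struct toy_prp_pools {
	struct toy_pool large;
	struct toy_pool small;
};

struct toy_dev {
	unsigned int nr_nodes;
	struct toy_prp_pools *pools;	/* nr_nodes entries */
};

struct toy_queue {
	struct toy_prp_pools *pools;	/* chosen once, at queue init */
};

/* Allocate and initialize one pool pair per node; returns 0 on success. */
int toy_dev_init(struct toy_dev *dev, unsigned int nr_nodes)
{
	dev->pools = calloc(nr_nodes, sizeof(*dev->pools));
	if (!dev->pools)
		return -1;
	dev->nr_nodes = nr_nodes;
	for (unsigned int i = 0; i < nr_nodes; i++) {
		atomic_flag_clear(&dev->pools[i].large.lock);
		atomic_flag_clear(&dev->pools[i].small.lock);
	}
	return 0;
}

void toy_dev_destroy(struct toy_dev *dev)
{
	free(dev->pools);
	dev->pools = NULL;
	dev->nr_nodes = 0;
}

/* Bind a queue to the pool pair of the node its hctx lives on. */
void toy_queue_init(struct toy_queue *q, struct toy_dev *dev,
		    unsigned int node)
{
	assert(node < dev->nr_nodes);
	q->pools = &dev->pools[node];
}

/* Model one PRP list allocation: only the queue's own node is locked. */
void toy_alloc_prp_list(struct toy_queue *q, int use_small)
{
	struct toy_pool *pool = use_small ? &q->pools->small
					  : &q->pools->large;

	while (atomic_flag_test_and_set_explicit(&pool->lock,
						 memory_order_acquire))
		;	/* spin until the lock is free */
	pool->allocs++;
	atomic_flag_clear_explicit(&pool->lock, memory_order_release);
}
```

With two nodes, queues initialized with node 0 and node 1 update
disjoint counters under disjoint locks, which is the property the
per-NUMA-node partitioning relies on to reduce contention.
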
Caleb Sander Mateos (2):
  nvme/pci: factor out nvme_init_hctx() helper
  nvme/pci: make PRP list DMA pools per-NUMA-node

Keith Busch (1):
  dmapool: add NUMA affinity support

 drivers/nvme/host/pci.c | 171 +++++++++++++++++++++++-----------------
 include/linux/dmapool.h |  17 +++-
 mm/dmapool.c            |  16 ++--
 3 files changed, 121 insertions(+), 83 deletions(-)

v6:
- Clarify description of when PRP list pages are allocated (Christoph)
- Add Reviewed-by tags

v5:
- Allocate dmapool structs on desired NUMA node (Keith)
- Add Reviewed-by tags

v4:
- Drop the numa_node < nr_node_ids check (Kanchan)
- Add Reviewed-by tags

v3: simplify nvme_release_prp_pools() (Keith)

v2:
- Initialize admin nvme_queue's nvme_prp_dma_pools (Kanchan)
- Shrink nvme_dev's prp_pools array from MAX_NUMNODES to nr_node_ids
  (Kanchan)

-- 
2.45.2