From: Caleb Sander Mateos
To: Keith Busch, Jens Axboe, Christoph Hellwig, Sagi Grimberg,
    Andrew Morton
Cc: Kanchan Joshi, linux-nvme@lists.infradead.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Caleb Sander Mateos
Subject: [PATCH v5 0/3] nvme/pci: PRP list DMA pool partitioning
Date: Tue, 22 Apr 2025 16:09:49 -0600
Message-ID: <20250422220952.2111584-1-csander@purestorage.com>
X-Mailer: git-send-email 2.45.2

NVMe commands with more than 4 KB of data allocate PRP list pages from
the per-nvme_device dma_pool prp_page_pool or prp_small_pool. Each call
to dma_pool_alloc() and dma_pool_free() takes the per-dma_pool spinlock.
These device-global spinlocks are a significant source of contention
when many CPUs are submitting to the same NVMe devices. On a workload
issuing 32 KB reads from 16 CPUs (8 hypertwin pairs) across 2 NUMA nodes
to 23 NVMe devices, we observed 2.4% of CPU time spent in
_raw_spin_lock_irqsave called from dma_pool_alloc and dma_pool_free.

Ideally, the dma_pools would be per-hctx to minimize contention. But
that could impose considerable resource costs in a system with many NVMe
devices and CPUs.

As a compromise, allocate per-NUMA-node PRP list DMA pools. Map each
nvme_queue to the set of DMA pools corresponding to its device and its
hctx's NUMA node. This reduces the _raw_spin_lock_irqsave overhead by
about half, to 1.2%. Preventing the sharing of PRP list pages across
NUMA nodes also makes them cheaper to initialize. Allocating the dmapool
structs on the desired NUMA node further reduces the time spent in
dma_pool_alloc from 0.87% to 0.50%.

Caleb Sander Mateos (2):
  nvme/pci: factor out nvme_init_hctx() helper
  nvme/pci: make PRP list DMA pools per-NUMA-node

Keith Busch (1):
  dmapool: add NUMA affinity support

 drivers/nvme/host/pci.c | 171 +++++++++++++++++++++++-----------------
 include/linux/dmapool.h |  17 +++-
 mm/dmapool.c            |  16 ++--
 3 files changed, 121 insertions(+), 83 deletions(-)

v5:
- Allocate dmapool structs on desired NUMA node (Keith)
- Add Reviewed-by tags

v4:
- Drop the numa_node < nr_node_ids check (Kanchan)
- Add Reviewed-by tags

v3: simplify nvme_release_prp_pools() (Keith)

v2:
- Initialize admin nvme_queue's nvme_prp_dma_pools (Kanchan)
- Shrink nvme_dev's prp_pools array from MAX_NUMNODES to nr_node_ids
  (Kanchan)

-- 
2.45.2
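For readers who want the shape of the partitioning without opening the
patches, here is a rough userspace sketch of the idea described above: one
set of PRP list pools per NUMA node, with each queue bound once to the set
for its node so that queues on different nodes never touch the same lock.
This is not the driver code. Every sketch_* name is hypothetical, the
atomic_flag spinlock and malloc() merely stand in for the dma_pool spinlock
and DMA allocation, and the 256-byte and 4096-byte block sizes are
illustrative assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dma_pool: a lock plus an allocation counter. */
struct sketch_pool {
	atomic_flag lock;        /* plays the role of the per-dma_pool spinlock */
	size_t block_size;
	unsigned long nr_allocs; /* protected by lock */
};

/* One large and one small pool per node (sizes are assumptions). */
struct sketch_prp_pools {
	struct sketch_pool large;
	struct sketch_pool small;
};

struct sketch_dev {
	int nr_nodes;
	struct sketch_prp_pools *pools; /* one entry per NUMA node */
};

struct sketch_queue {
	struct sketch_prp_pools *pools; /* chosen once, from the queue's node */
};

static void sketch_pool_init(struct sketch_pool *p, size_t block_size)
{
	atomic_flag_clear(&p->lock);
	p->block_size = block_size;
	p->nr_allocs = 0;
}

/* Create one set of pools per node; returns 0 on success, -1 on failure. */
int sketch_dev_init(struct sketch_dev *dev, int nr_nodes)
{
	dev->pools = calloc((size_t)nr_nodes, sizeof(*dev->pools));
	if (!dev->pools)
		return -1;
	dev->nr_nodes = nr_nodes;
	for (int i = 0; i < nr_nodes; i++) {
		sketch_pool_init(&dev->pools[i].large, 4096);
		sketch_pool_init(&dev->pools[i].small, 256);
	}
	return 0;
}

void sketch_dev_destroy(struct sketch_dev *dev)
{
	free(dev->pools);
	dev->pools = NULL;
	dev->nr_nodes = 0;
}

/* Bind a queue to the pools of its NUMA node (bounds check is sketch-only). */
void sketch_queue_init(struct sketch_queue *q, struct sketch_dev *dev,
		       int numa_node)
{
	assert(numa_node >= 0 && numa_node < dev->nr_nodes);
	q->pools = &dev->pools[numa_node];
}

/* Allocate one block; only the lock of this queue's node-local pool is taken. */
void *sketch_prp_alloc(struct sketch_queue *q, size_t size)
{
	struct sketch_pool *p = size <= q->pools->small.block_size ?
				&q->pools->small : &q->pools->large;

	if (size > p->block_size)
		return NULL;

	while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
		; /* spin */
	p->nr_allocs++;
	atomic_flag_clear_explicit(&p->lock, memory_order_release);

	return malloc(p->block_size);
}
```

In this sketch, two queues initialized with different node numbers resolve
to different sketch_prp_pools entries, so their allocations contend on
disjoint locks. Queues on the same node still share a lock, which is the
compromise the cover letter describes relative to per-hctx pools.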