From: Jens Axboe
Date: Wed, 23 Apr 2025 07:21:56 -0600
Subject: Re: [PATCH v5 0/3] nvme/pci: PRP list DMA pool partitioning
To: Caleb Sander Mateos, Keith Busch, Christoph Hellwig, Sagi Grimberg, Andrew Morton
Cc: Kanchan Joshi, linux-nvme@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Message-ID: <09bde11c-a3f3-4c5a-91ed-74bfd2e0f61d@kernel.dk>
References: <20250422220952.2111584-1-csander@purestorage.com>
In-Reply-To:
 <20250422220952.2111584-1-csander@purestorage.com>

On 4/22/25 4:09 PM, Caleb Sander Mateos wrote:
> NVMe commands with more than 4 KB of data allocate PRP list pages from
> the per-nvme_device dma_pool prp_page_pool or prp_small_pool. Each call
> to dma_pool_alloc() and dma_pool_free() takes the per-dma_pool spinlock.
> These device-global spinlocks are a significant source of contention
> when many CPUs are submitting to the same NVMe devices. On a workload
> issuing 32 KB reads from 16 CPUs (8 hypertwin pairs) across 2 NUMA nodes
> to 23 NVMe devices, we observed 2.4% of CPU time spent in
> _raw_spin_lock_irqsave called from dma_pool_alloc and dma_pool_free.
>
> Ideally, the dma_pools would be per-hctx to minimize
> contention. But that could impose considerable resource costs in a
> system with many NVMe devices and CPUs.
>
> As a compromise, allocate per-NUMA-node PRP list DMA pools. Map each
> nvme_queue to the set of DMA pools corresponding to its device and its
> hctx's NUMA node. This reduces the _raw_spin_lock_irqsave overhead by
> about half, to 1.2%. Preventing the sharing of PRP list pages across
> NUMA nodes also makes them cheaper to initialize.
>
> Allocating the dmapool structs on the desired NUMA node further reduces
> the time spent in dma_pool_alloc from 0.87% to 0.50%.

Looks good to me:

Reviewed-by: Jens Axboe

--
Jens Axboe