Date: Thu, 20 Mar 2025 09:58:47 -0600
From: Keith Busch
To: Bart Van Assche
Cc: Christoph Hellwig, Luis Chamberlain, linux-fsdevel@vger.kernel.org,
    linux-mm@kvack.org, linux-block@vger.kernel.org,
    lsf-pc@lists.linux-foundation.org, david@fromorbit.com, leon@kernel.org,
    sagi@grimberg.me, axboe@kernel.dk, joro@8bytes.org, brauner@kernel.org,
    hare@suse.de, willy@infradead.org, djwong@kernel.org,
    john.g.garry@oracle.com, ritesh.list@gmail.com, p.raghav@samsung.com,
    gost.dev@samsung.com, da.gomez@samsung.com
Subject: Re: [LSF/MM/BPF TOPIC] breaking the 512 KiB IO boundary on x86_64

On Thu, Mar 20, 2025 at 08:37:05AM -0700, Bart Van Assche wrote:
> On 3/20/25 7:18 AM, Christoph Hellwig wrote:
> > On Thu, Mar 20, 2025 at 04:41:11AM -0700, Luis Chamberlain wrote:
> > > We've been constrained to a max single 512 KiB IO for a while now on x86_64.
> >
> > No, we absolutely haven't. I'm regularly seeing multi-MB I/O on both
> > SCSI and NVMe setup.
>
> Is NVME_MAX_KB_SZ the current maximum I/O size for PCIe NVMe
> controllers? From drivers/nvme/host/pci.c:

Yes, this is the driver's limit. The device's limit may be lower or
higher.

I allocate out of hugetlbfs to reliably send direct IO at this size
(8 MiB), because the nvme driver's segment count is limited to 128. The
driver doesn't impose a segment size limit, though. If each segment is
only 4k (a common occurrence), that caps an I/O at 128 x 4 KiB = 512 KiB,
so I guess that's where Luis is getting the 512K limit?

> /*
>  * These can be higher, but we need to ensure that any command doesn't
>  * require an sg allocation that needs more than a page of data.
>  */
> #define NVME_MAX_KB_SZ 8192
> #define NVME_MAX_SEGS 128
> #define NVME_MAX_META_SEGS 15
> #define NVME_MAX_NR_ALLOCATIONS 5
>
> > > This is due to the number of DMA segments and the segment size.
> >
> > In nvme the max_segment_size is UINT_MAX, and for most SCSI HBAs it is
> > fairly large as well.
>
> I have a question for NVMe device manufacturers. It is known since a
> long time that submitting large I/Os with the NVMe SGL format requires
> less CPU time compared to the NVMe PRP format. Is this sufficient to
> motivate NVMe device manufacturers to implement the SGL format? All SCSI
> controllers I know of, including UFS controllers, support something that
> is much closer to the NVMe SGL format rather than the NVMe PRP format.

SGL support does seem less common than you'd think. It is more efficient
when you have physically contiguous pages, or when an IOMMU has mapped
discontiguous pages into a DMA-contiguous IOVA. If you have neither, PRP
is a little more efficient in memory and CPU usage. But in the context
of large folios, yeah, SGL is the better option.