Date: Fri, 22 Dec 2023 08:10:54 -0700
From: Keith Busch
To: Viacheslav Dubeyko
Cc: Bart Van Assche, Hannes Reinecke, lsf-pc@lists.linuxfoundation.org, linux-mm@kvack.org, linux-block@vger.kernel.org, linux-scsi@vger.kernel.org, "linux-nvme@lists.infradead.org"
Subject: Re: [LSF/MM/BPF TOPIC] Large block for I/O

On Fri, Dec 22, 2023 at 11:23:26AM +0300, Viacheslav Dubeyko wrote:
>
> On Dec 21, 2023, at 11:33 PM, Bart Van Assche wrote:
> > I'm interested in this topic. But I'm wondering whether the disadvantages of
> > large blocks will be covered? Some NAND storage vendors are less than
> > enthusiast about increasing the logical block size beyond 4 KiB because it
> > increases the size of many writes to the device and hence increases write
> > amplification.
> >
>
> I am also interested in this discussion. Every SSD manufacturer carefully hides
> the details of architecture and FTL's behavior. I believe that switching on bigger
> logical size (like 8KB, 16KB, etc) could be even better for SSD's internal mapping
> scheme and erase blocks management. I assume that it could require significant
> reworking the firmware and, potentially, ASIC logic. This could be the main pain
> for SSD manufactures. Frankly speaking, I don't see the direct relation between
> increasing logical block size and increasing write amplification. If you have 16KB
> logical block size on SSD side and file system will continue to use 4KB logical
> block size, then, yes, I can see the problem. But if file system manages the space
> in 16KB logical blocks and carefully issue the I/O requests of proper size, then
> everything should be good. Again, FTL is simply trying to write logical blocks into
> erase block. And we have, for example, 8MB erase block, then mapping and writing
> 16KB logical blocks looks like more beneficial operation compared with 4KB logical
> block.

If the host really wants to write in small granularities, then larger
block sizes just shift the write amplification from the device to the
host, which seems worse than letting the device deal with it.

I've done some early profiling on my fleet and there are definitely
applications that overwhelmingly prefer larger writes. Those should be
great candidates to use these kinds of logical block formats.
They're already flash-friendly, but aligning filesystems and memory
management to the same granularity is a nice plus.

Other applications, though, still need 4k writes. Turning those into RMW
on the host to modify 4k in the middle of a 16k block is obviously a bad
fit.

Anyway, your mileage may vary. This example BPF program provides an okay
starting point for examining disk usage to see if large logical block
sizes are a good fit for your application:

  https://github.com/iovisor/bpftrace/blob/master/tools/bitesize.bt

> So, I see more troubles on file systems side to support bigger logical
> size. For example, we discussed the 8KB folio size support recently.
> Matthew already shared the patch for supporting 8KB folio size, but
> everything should be carefully tested. Also, I experienced the issue
> with read ahead logic. For example, if I format my file system volume
> with 32KB logical block, then read ahead logic returns to me 16KB
> folios that was slightly surprising to me. So, I assume we can find a
> lot of potential issues on file systems side for bigger logical size
> from the point of view of efficiency of metadata and user data
> operations. Also, high-loaded systems could have fragmented memory
> that could make the memory allocation more tricky operation. I mean
> here that it could be not easy to allocate one big folio.
> Log-structured file systems can easily aligned write I/O requests for
> bigger logical size. But in-place update file systems can increase
> write amplification for bigger logical size because of necessity to
> flush bigger portion of data for small modification. However, FTL can
> use delta-encoding and smart logic of compaction several logical
> blocks into one NAND flash page. And, by the way, NAND flash page
> usually is bigger than 4KB.
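
To put a rough number on the host-side cost of small in-place updates,
here is a minimal sketch in Python. It assumes the simplest possible
model: every write is rounded out to whole logical blocks and nothing is
coalesced in the page cache. The function names and the (offset, length)
trace format are made up for illustration, and the read half of the RMW
cycle is not counted.

```python
def aligned_span(offset: int, length: int, block_size: int) -> int:
    """Bytes the host must issue when [offset, offset + length) is
    rounded out to whole logical blocks of block_size bytes."""
    start = (offset // block_size) * block_size
    end = -(-(offset + length) // block_size) * block_size  # round up
    return end - start


def host_write_amplification(writes, block_size: int) -> float:
    """Ratio of bytes issued to the device over bytes the application
    asked to write. writes is an iterable of (offset, length) pairs,
    a hypothetical trace format assumed for this sketch."""
    writes = list(writes)
    requested = sum(length for _, length in writes)
    issued = sum(aligned_span(off, length, block_size)
                 for off, length in writes)
    return issued / requested


# A 4k update in the middle of a 16k logical block costs a full block.
small_update = [(4096, 4096)]
print(host_write_amplification(small_update, 16 * 1024))  # 4.0

# An aligned 16k write costs nothing extra.
aligned_write = [(0, 16 * 1024)]
print(host_write_amplification(aligned_write, 16 * 1024))  # 1.0
```

Feeding it a real write trace for a 4k-heavy workload versus a
large-write workload gives a first-order estimate of how much extra
data the host would push with a 16k logical block size.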