From: Viacheslav Dubeyko <slava@dubeyko.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: Keith Busch <kbusch@kernel.org>,
Bart Van Assche <bvanassche@acm.org>,
Hannes Reinecke <hare@suse.de>,
lsf-pc@lists.linuxfoundation.org, linux-mm@kvack.org,
linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: Re: [LSF/MM/BPF TOPIC] Large block for I/O
Date: Mon, 25 Dec 2023 11:55:23 +0300 [thread overview]
Message-ID: <FE53ACBB-1787-4EA0-93D9-1147E43A5F57@dubeyko.com> (raw)
In-Reply-To: <ZYWz8K98YUGf/VZp@casper.infradead.org>
> On Dec 22, 2023, at 7:06 PM, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Dec 22, 2023 at 08:10:54AM -0700, Keith Busch wrote:
>> If the host really wants to write in small granularities, then larger
>> block sizes just shifts the write amplification from the device to the
>> host, which seems worse than letting the device deal with it.
>
> Maybe? I'm never sure about that. See, if the drive is actually
> managing the flash in 16kB chunks internally, then the drive has to do a
> RMW which is increased latency over the host just doing a 16kB write,
> which can go straight to flash. Assuming the host has the whole 16kB in
> memory (likely?) Of course, if you're PCIe bandwidth limited, then a
> 4kB write looks more attractive, but generally I think drives tend to
> be IOPS limited not bandwidth limited today?
>
Fundamentally, if a storage device exposes a 16K physical sector size, then
I am not sure we can issue 4K write requests at all. It means we would have to
read the 16K LBA into the page cache or the application's buffer before any
write operation. So I see a potential RMW inside the storage device only if the
device is able to accept 4K I/O requests even though the physical sector is 16K.
But is that a real-life use case?
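To make the RMW cost concrete, here is a minimal, purely illustrative Python sketch. The sector and request sizes, and the bytes-moved model, are assumptions for illustration, not measurements of any real device. It compares a sub-sector host write, which forces a device-side read-modify-write, with a full-sector write that can go straight to flash:

```python
# Illustrative model only: count the bytes the device must move to/from
# flash for one host write, assuming (hypothetically) that the device
# manages flash in 16K chunks internally.

PHYS_SECTOR = 16 * 1024   # assumed internal flash granularity
IO_4K = 4 * 1024
IO_16K = 16 * 1024

def flash_bytes_moved(write_size: int, phys_sector: int = PHYS_SECTOR) -> int:
    """Bytes transferred to/from flash inside the device for one host write."""
    if write_size >= phys_sector:
        # The write covers whole internal sectors: no read-modify-write.
        return write_size
    # Sub-sector write: the device reads the full sector, merges the new
    # data in, and writes the full sector back (one read + one write).
    return phys_sector + phys_sector

# A 4K host write triggers a 16K read plus a 16K write inside the device,
# while a 16K host write moves only the 16K being written.
assert flash_bytes_moved(IO_4K) == 32 * 1024
assert flash_bytes_moved(IO_16K) == 16 * 1024
```

The extra internal read is also where the added latency Matthew mentions comes from: it sits on the critical path of the sub-sector write.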
I am not sure about the attractiveness of 4K write operations. Usually, a file
system provides a way to configure its internal logical block size and metadata
granularities, so it is possible to align the internal metadata and user-data
granularities to 16K, for example. And if we are talking about metadata
structures (for example, the inode table, block mapping, etc.), that data is
frequently updated, so a 16K block will most probably contain several updated
4K pieces. As a result, we have to flush all of this updated metadata anyway,
despite any PCIe bandwidth limitation (even if we have one). Also, I assume that
sending one 16K I/O request could be more beneficial than sending several 4K
I/O requests. Of course, real life is more complicated.
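The point about flushing several dirty 4K pieces of one 16K metadata block can be sketched with another illustrative Python model. The assumption here (that each request costs one "IOPS credit" regardless of size, i.e. the drive is IOPS-limited rather than bandwidth-limited) is hypothetical, borrowed from the discussion above:

```python
# Illustrative model: on an IOPS-limited device, each request costs one
# IOPS credit regardless of its size, so compare flushing a 16K metadata
# block that contains several dirty 4K pieces as one write vs. per-piece.

BLOCK = 16 * 1024
PIECE = 4 * 1024

def requests_needed(dirty_pieces: int, as_single_block: bool) -> int:
    """I/O requests required to flush `dirty_pieces` dirty 4K pieces."""
    if as_single_block:
        return 1              # one 16K write covers every dirty piece
    return dirty_pieces       # one 4K write per dirty piece

# With three of the four 4K pieces dirty, one 16K write replaces three
# 4K writes; only with a single dirty piece do the strategies tie in
# request count (the 16K write still moves more bytes, of course).
assert requests_needed(3, as_single_block=True) == 1
assert requests_needed(3, as_single_block=False) == 3
```

Under a bandwidth-limited device the trade-off flips for mostly-clean blocks, which is part of why real life is more complicated.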
Thanks,
Slava.