From: Luis Chamberlain <mcgrof@kernel.org>
To: Matthew Wilcox <willy@infradead.org>
Cc: "Keith Busch" <kbusch@kernel.org>,
"Theodore Ts'o" <tytso@mit.edu>,
"Pankaj Raghav" <p.raghav@samsung.com>,
"Daniel Gomez" <da.gomez@samsung.com>,
"Javier González" <javier.gonz@samsung.com>,
lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, linux-block@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
Date: Fri, 3 Mar 2023 13:45:48 -0800
Message-ID: <ZAJqjM6qLrraFrrn@bombadil.infradead.org>
In-Reply-To: <ZAFuSSZ5vZN7/UAa@casper.infradead.org>
On Fri, Mar 03, 2023 at 03:49:29AM +0000, Matthew Wilcox wrote:
> On Thu, Mar 02, 2023 at 06:58:58PM -0700, Keith Busch wrote:
> > That said, I was hoping you were going to suggest supporting 16k logical block
> > sizes. Not a problem on some arch's, but still problematic when PAGE_SIZE is
> > 4k. :)
>
> I was hoping Luis was going to propose a session on LBA size > PAGE_SIZE.
> Funnily, while the pressure is coming from the storage vendors, I don't
> think there's any work to be done in the storage layers. It's purely
> a FS+MM problem.
You'd hope most of it is left to FS + MM, but I'm not sure that's quite
it yet. Initial experimentation shows that NVMe devices enabled with
physical & logical block sizes > PAGE_SIZE get brought down to 512 bytes.
That seems odd to say the least. Would changing this be an issue now?
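For anyone wanting to reproduce that observation, here is a minimal sketch
of checking what the kernel actually ended up with for a device. It only
reads the standard queue/logical_block_size and queue/physical_block_size
sysfs attributes. The helper names, the 4 KiB PAGE_SIZE constant and the
sysfs_root parameter are assumptions made for illustration; this is not
something kdevops or the block layer ships.

```python
from pathlib import Path

# Assumption: a 4 KiB PAGE_SIZE system, the case discussed in this thread.
PAGE_SIZE = 4096


def block_sizes(dev, sysfs_root="/sys/block"):
    """Return (logical, physical) block size in bytes as the kernel reports it.

    sysfs_root is a hypothetical parameter so the helper can be pointed at
    a fake tree for testing; on a real system the default is what you want.
    """
    queue = Path(sysfs_root) / dev / "queue"
    logical = int((queue / "logical_block_size").read_text())
    physical = int((queue / "physical_block_size").read_text())
    return logical, physical


def exceeds_page_size(size, page_size=PAGE_SIZE):
    """True if a block size in bytes is larger than the page size."""
    return size > page_size
```

Calling block_sizes("nvme0n1") on a device formatted with a 16k LBA format
and getting a logical size of 512 back would match the behaviour described
above; the device name is just an example.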
I'm gathering there is general interest in this topic though. So one
thing we *could* do is review the lay of the land and break down what
we all think could likely be done or is needed. At the very least we
can come out of it knowing the unknowns together.
I started to think about some of these things a while ago, and with the
help of Willy I tried to break down some of the items I gathered from him
into community OKRs (a super informal itemization of goals and the sub tasks
which would complete such goals) and started trying to take a stab at them
with our team, but obviously I think it would be great if we all just
divide & conquer here. So maybe reviewing these and extending them
as a community would be good:
https://kernelnewbies.org/KernelProjects/large-block-size
I've recently become interested in tmpfs, so I will be taking a stab at
higher order page size support there to see what blows up.
The other stuff like general IOMAP conversion is pretty well known, and
I think we already have a proposed session on that. But there are also
even smaller fish to fry: *just* doing a baseline with some
filesystems with 4 KiB block size seems in order.
Hearing filesystem developers' thoughts on support for larger block
sizes in light of lower order PAGE_SIZE would be good, given one of the
odd situations some distributions / teams find themselves in is trying
to support larger block sizes but with difficult access to higher
PAGE_SIZE systems. Are there ways to simplify this / help us in general?
Without that it's a bit hard to muck around with some of this in terms of
long-term support. This also got me thinking about ways to try to replicate
larger IO virtual devices a bit better too. While paying a cloud
provider to test this is one nice option, it'd be great if I could just do
this in house with some hacks too. For virtio-blk-pci at least, for instance,
I wondered whether using just the host page cache suffices, or would a 4k
page cache on the host significantly modify the results of, say, a 16k
emulated IO controller? How do we most effectively virtualize 16k controllers
in-house?
To help with experimenting with large IO and NVMe / virtio-blk-pci I
recently added support to kdevops for instantiating tons of large IO devices
[0]; with it, it should be easy to reproduce odd issues we may come up
with. For instance, it should be possible to subsequently extend the
kdevops fstests or blktests automation support with just a few Kconfig files
to use some of these large IO devices to see what blows up.
If we are going to have this session I'd like to encourage & invite Pankaj and
Daniel who have been doing great work on reviewing all this too and can give
some feedback on some of their own findings!
[0] https://github.com/linux-kdevops/kdevops/commit/af33568445111cc114653264f6dbc8684f3b10e8
Luis